Closed g-simmons closed 1 year ago
hi @g-simmons, this looks like a performance issue on the dataframe strategy generation function. I suspect it has something to do with this: https://github.com/pandera-dev/pandera/blob/master/pandera/strategies.py#L1104-L1112
for col_name, col_dtype in col_dtypes.items():
    if col_dtype in {"object", "str"} or col_dtype.startswith(
        "string"
    ):
        # pylint: disable=cell-var-from-loop,undefined-loop-variable
        strategy = strategy.map(
            lambda df: df.assign(**{col_name: df[col_name].map(str)})
        )
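A side note on the `cell-var-from-loop` warning silenced in the snippet above: a lambda created in a loop looks up the loop variable when it is called, not when it is defined. The sketch below shows that general Python behaviour with plain dicts standing in for DataFrame rows. The helper names (`build_late`, `build_early`, `apply_all`) are hypothetical and nothing here touches pandera or hypothesis, so whether this contributes to the reported error is not established.

```python
def build_late(columns):
    """Lambdas close over the loop variable itself (late binding)."""
    funcs = []
    for col_name in columns:
        # `col_name` is looked up when the lambda runs, i.e. after the loop ended.
        funcs.append(lambda r: {**r, col_name: str(r[col_name])})
    return funcs


def build_early(columns):
    """A default argument freezes the current column name per iteration."""
    funcs = []
    for col_name in columns:
        funcs.append(lambda r, name=col_name: {**r, name: str(r[name])})
    return funcs


def apply_all(funcs, r):
    """Apply each function in turn, like a chain of mapped transformations."""
    for f in funcs:
        r = f(r)
    return r


row = {"a": 1, "b": 2, "c": 3}
late_result = apply_all(build_late(row), row)    # only the last column is converted
early_result = apply_all(build_early(row), row)  # every column is converted
```

Collecting the column names first and using them in one lambda sidesteps this, because the list is complete before the lambda ever runs.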
It would be better to collect the string columns first and then convert all of them in a single strategy.map call:
col_names = []
for col_name, col_dtype in col_dtypes.items():
    if col_dtype in {"object", "str"} or col_dtype.startswith(
        "string"
    ):
        col_names.append(col_name)
strategy = strategy.map(
    lambda df: df.assign(**{col_name: df[col_name].map(str) for col_name in col_names})
)
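To make the difference between the two versions concrete, here is a minimal sketch that counts how many map layers each approach stacks up. `ToyStrategy` is a hypothetical stand-in written for this illustration, not the hypothesis or pandera API, and plain dicts replace DataFrames. It only illustrates the suspected cause; it does not confirm that layer depth is what triggers the error.

```python
class ToyStrategy:
    """Hypothetical stand-in for a strategy: wraps a zero-argument producer."""

    def __init__(self, produce, depth=0):
        self._produce = produce
        self.depth = depth  # number of stacked map layers

    def map(self, fn):
        # Each call wraps the previous strategy in one more layer.
        return ToyStrategy(lambda: fn(self._produce()), self.depth + 1)

    def example(self):
        return self._produce()


def is_string_dtype(dtype):
    # Same dtype check as in the snippets above.
    return dtype in {"object", "str"} or dtype.startswith("string")


col_dtypes = {"a": "object", "b": "int64", "c": "string[python]", "d": "str"}
base = ToyStrategy(lambda: {"a": 1, "b": 2, "c": 3, "d": 4})

# Per-column approach: one extra map layer for every string column.
per_column = base
for col_name, col_dtype in col_dtypes.items():
    if is_string_dtype(col_dtype):
        per_column = per_column.map(
            # `name=col_name` binds the current column for this layer.
            lambda row, name=col_name: {**row, name: str(row[name])}
        )

# Collected approach: gather the names first, then add a single map layer.
col_names = [n for n, d in col_dtypes.items() if is_string_dtype(d)]
collected = base.map(
    lambda row: {**row, **{name: str(row[name]) for name in col_names}}
)
```

Both strategies produce the same converted row, but `per_column.depth` grows with the number of string columns while `collected.depth` stays at 1.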
I don't have the bandwidth to tackle this right now, but please feel free to make a PR for this! (also adding the "help wanted" tag)
@cosmicBboy Great, thanks for the input! I also probably don't have bandwidth to work on it now but will come back later if I do. Thanks!
this was fixed by https://github.com/unionai-oss/pandera/pull/989
Describe the bug
If a SchemaModel contains more than 38 fields, SchemaModel.to_schema().example() throws an error:
Code Sample, a copy-pastable example
field39 and field40 are commented out
Expected behavior
Don't throw an error, generate an example for the SchemaModel.
Desktop (please complete the following information):