Hi, below is a minimal example of my question (or possibly a bug). Is it expected that, after adding a `dataframe_check` to `SchemaWithDFCheck`, the schema no longer generates data that satisfies the `field` column checks? The example is still somewhat non-deterministic, so please let me know if there is a better way than `@seed(10)` to get reproducible results.
I do get the following warning when running this example, but I would still have expected the generated data to be valid:
UserWarning: Dataframe check doesn't have a defined strategy. Falling back to filtering drawn values based on the check definition. This can considerably slow down data-generation.
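To make my expectation concrete, here is a toy model of the fallback the warning describes. It is plain Python rather than pandera, and the helpers `draw_field_column`, `draw_frame_with_filter` and `non_empty` are hypothetical; this is not pandera's actual implementation. The point is that if a frame-level check is enforced only by filtering frames drawn from the column strategies, the filter can discard draws but can never introduce a value that breaks `gt=0`.

```python
import random
from typing import Callable, Dict, List

# Toy stand-in for a dataframe: column name -> list of values.
Frame = Dict[str, List[float]]


def draw_field_column(rng: random.Random, size: int) -> List[float]:
    # Column-level "strategy": only strictly positive floats, mirroring Field(gt=0).
    return [rng.uniform(1e-9, 100.0) for _ in range(size)]


def draw_frame_with_filter(
    rng: random.Random,
    size: int,
    frame_check: Callable[[Frame], bool],
    max_tries: int = 100,
) -> Frame:
    # Fallback for a frame-level check without its own strategy:
    # draw whole frames from the column strategy and reject the ones
    # that fail the check. Rejection only discards draws, so every
    # surviving frame still satisfies the column-level constraint.
    for _ in range(max_tries):
        frame = {"field": draw_field_column(rng, size)}
        if frame_check(frame):
            return frame
    raise RuntimeError("no drawn frame satisfied the frame-level check")


def non_empty(frame: Frame) -> bool:
    # Frame-level check analogous to SchemaWithDFCheck.non_empty.
    return len(frame["field"]) > 0
```

With a fixed `random.Random(10)` the toy generator is fully reproducible, and every value it returns is still `> 0`.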
Versions:
python==3.9
pandas==1.2.5
pandera==0.6.4
import pandera as pa
import pandas as pd
from pandera.typing import Series
from hypothesis import seed


class Schema(pa.SchemaModel):
    field: Series[float] = pa.Field(gt=0)


class SchemaWithDFCheck(Schema):
    @pa.dataframe_check
    def non_empty(cls, df: pd.DataFrame) -> bool:
        return not df.empty


@seed(10)
def test():
    print(Schema.example(size=1))
    '''
    >>> field
    0 4.940656e-324
    '''
    print(SchemaWithDFCheck.example(size=1))
    '''
    >>> field
    0 0.0
    '''


if __name__ == '__main__':
    test()