Closed philiporlando closed 1 month ago
See the docs here: https://pandera.readthedocs.io/en/latest/polars.html#error-reporting
This is intended behavior: LazyFrame validation will only do schema-level checks (so as not to materialize the data in a lazy method chain). Currently, pandera assumes that all custom checks operate on data. You can force data-level checks by explicitly setting export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA.
Is this a duplicate of #1565?
This is super helpful and makes total sense. Thanks for the feedback.
I don't think so. The error that I'm experiencing in #1565 is specific to pl.DataFrame.
Gotcha, yeah looks like a bug, looking.
@philiporlando would it make sense to add some logging at validation time to explicitly say what types of checks are being run? If so, would it make sense as logging.info, debug, or something else?
I'm in favor of this! At the very least, I think it would be helpful to communicate which data-level checks are ignored whenever a LazyFrame is validated instead of a DataFrame. It might even make sense to log a warning here?
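To make the logging idea concrete, here is a rough sketch using only the standard library. It is not pandera code: the function name `log_validation_plan`, the logger name, and the check name `has_length_20` are all hypothetical, chosen just to show info-level messages for what runs and warning-level messages for what gets skipped.

```python
import logging

# Hypothetical logger name; pandera's own logger may be named differently.
logger = logging.getLogger("validation_sketch")


def log_validation_plan(frame_is_lazy: bool, data_check_names: list[str]) -> list[str]:
    """Log which kinds of checks will run and return the names of skipped checks.

    This is an illustration of the proposal, not pandera's implementation.
    """
    if not frame_is_lazy:
        # Eager DataFrame: both schema-level and data-level checks run.
        logger.info("Running schema-level and data-level checks: %s", data_check_names)
        return []

    # LazyFrame: only schema-level checks run, so every data-level check is skipped.
    logger.info("Running schema-level checks only (LazyFrame input)")
    for name in data_check_names:
        logger.warning("Skipping data-level check %r for LazyFrame input", name)
    return list(data_check_names)


skipped_lazy = log_validation_plan(True, ["has_length_20"])
skipped_eager = log_validation_plan(False, ["has_length_20"])
```

Using a warning for skipped checks means the message shows up with Python's default logging configuration, whereas info or debug messages would stay hidden unless the user opts in.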
Thank you for looking into it!
Code Sample, a copy-pastable example
I've created a custom check function that should never return True based on my sample data. However, pandera does not raise an error when validating the fruit column. This may be related to #1565.
Converting from LazyFrame to DataFrame before performing the schema validation appears to raise the expected error.
Expected behavior
I would expect to see a schema validation error raised with the LazyFrame here since none of the fruit values have a string length of 20 characters.
Desktop (please complete the following information):
pandera==0.19.0b1