unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Dagster polars dataframe_check #1780

Closed ShootingStarD closed 1 month ago

ShootingStarD commented 1 month ago

Question about pandera

Hi, how can I write a dataframe-level check on a DataFrameModel with the polars backend? With pandas I would do roughly the following, where the check has to return a boolean Series that validates each row:

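import pandera as pa
from pandera.typing import Series

# `Date` is assumed to be imported already; the post does not show where it comes from.
class HolidaysDataSchema(pa.DataFrameModel):
    start: Date = pa.Field(
        description="start date of the holiday",
        coerce=True,
    )
    end: Date = pa.Field(
        description="end date of the event",
        coerce=True,
    )

    @pa.dataframe_check
    def end_date_after_start_date(cls, df) -> Series[bool]:
        # Row-wise boolean Series: True where `start` is on or before `end`
        return df[cls.start] <= df[cls.end]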

However, with the polars backend, the check receives a PolarsData object with a lazyframe attribute instead of a DataFrame, so I am a bit lost.

cosmicBboy commented 1 month ago

Check out the docs section on polars custom checks: https://pandera.readthedocs.io/en/stable/polars.html#dataframe-level-checks

ShootingStarD commented 1 month ago

Yes, sorry, I didn't look closely enough before posting this.

Thanks for all the great work