unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Incorrect validation passes pandera=0.19.0b3 #1606

Closed: obiii closed this issue 3 weeks ago

obiii commented 3 weeks ago

Describe the bug

pandera: 0.19.0b3
Python: 3.11
polars: 0.20.23

We are using DataFrameModel to perform some data validation. The validation does not work as expected:

import polars as pl
import pandera.polars as pa
from datetime import date

class CaseSchema(pa.DataFrameModel):
    case_id: str = pa.Field(nullable=False, unique=True)
    gdwh_portfolio_id: str = pa.Field(nullable=False, unique=True)

lf = pl.LazyFrame({
    "case_id": ["case1", "case1", None],
    "gdwh_portfolio_id": ["portfolio1", "portfolio2", "portfolio3"]
})

CaseSchema.validate(lf).collect()

The same happens with CaseSchema.validate(lf) alone, without .collect(): it returns nothing.

Observed behaviour

It returns nothing, so we assume the data passes validation.

Expected behavior

Validation should fail because case_id is not unique and contains None.
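
To make the expectation concrete, here is a minimal plain-Python sketch of the two checks implied by nullable=False and unique=True, applied to the same data. It does not use pandera or polars, and the helper name check_column is hypothetical; it only shows which failures the reporter expects to see, not how the library implements them.

```python
from collections import Counter


def check_column(values):
    """Return the failures for a column that must be non-null and unique.

    Hypothetical helper for illustration only; not part of pandera.
    """
    failures = []
    # nullable=False: no entry may be None
    if any(v is None for v in values):
        failures.append("contains null values")
    # unique=True: no non-null value may appear more than once
    counts = Counter(v for v in values if v is not None)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    if duplicates:
        failures.append(f"duplicate values: {duplicates}")
    return failures


# Same data as in the report above
data = {
    "case_id": ["case1", "case1", None],
    "gdwh_portfolio_id": ["portfolio1", "portfolio2", "portfolio3"],
}

report = {name: check_column(values) for name, values in data.items()}
for name, failures in report.items():
    print(name, "->", failures or "ok")
```

Under these assumptions, case_id fails both checks and gdwh_portfolio_id passes, which is the outcome the report expects from validation.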

Additional context

We are utilizing polars.

cosmicBboy commented 2 weeks ago

Using the wrong import. See https://pandera.readthedocs.io/en/latest/polars.html