Schemas in Tuples are not typechecked

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

MIT License


from typing import Tuple

import pandas as pd
import pandera as pa
from pandera.typing import Index, DataFrame, Series

class InputSchema(pa.SchemaModel):
    year: Series[int] = pa.Field(gt=2000, coerce=True)
    month: Series[int] = pa.Field(ge=1, le=12, coerce=True)
    day: Series[int] = pa.Field(ge=0, le=365, coerce=True)

class OutputSchema(InputSchema):
    revenue: Series[float]

@pa.check_types
def transform(df: DataFrame[InputSchema]) -> Tuple[DataFrame[OutputSchema], int]:
    return df.assign(revenue_WRONG=100.0), 1

df = pd.DataFrame({
    "year": ["2001", "2002", "2003"],
    "month": ["3", "6", "12"],
    "day": ["200", "156", "365"],
})
transform(df)

Hi @cgebbe, this is related to #820. I'd say this is sort of a bug, but more of an enhancement/feature request (the fact that only single dataframe outputs are supported should be better documented...)

The workaround for this would be to use check_output, which is a little more flexible:

@pa.check_types
@pa.check_output(OutputSchema.to_schema(), 0)  # validate the 0th index of the output, assumes __getitem__ support
def transform(df: DataFrame[InputSchema]) -> Tuple[DataFrame[OutputSchema], int]:  # you can keep this for annotation's sake
    return df.assign(revenue_WRONG=100.0), 1

In any case, would welcome a PR to support this use case with @pa.check_types type annotations!
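For anyone curious what such support might involve, here is a minimal stdlib-only sketch of the general idea: read the `Tuple[...]` return annotation, then validate each returned element against its annotated type. This is not pandera's implementation or API. `check_tuple_return`, `RevenueTable`, and `validate_revenue_table` are hypothetical names made up for illustration, with `RevenueTable` standing in for `DataFrame[OutputSchema]`.

```python
import functools
from typing import Tuple, get_args, get_origin


def check_tuple_return(validators):
    """Hypothetical decorator (not part of pandera).

    `validators` maps an annotated element type to a callable that
    raises if the corresponding returned element is invalid.
    """
    def decorator(fn):
        return_type = fn.__annotations__.get("return")
        # For Tuple[A, B], get_origin(...) is `tuple` and get_args(...) is (A, B).
        if get_origin(return_type) is tuple:
            element_types = get_args(return_type)
        else:
            element_types = ()

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            out = fn(*args, **kwargs)
            # Look up a validator for each annotated position and apply it
            # to the element at that index of the returned tuple.
            for index, element_type in enumerate(element_types):
                validator = validators.get(element_type)
                if validator is not None:
                    validator(out[index])
            return out

        return wrapper
    return decorator


class RevenueTable(dict):
    """Hypothetical stand-in for DataFrame[OutputSchema]."""


def validate_revenue_table(table):
    # Mimics a schema check: the 'revenue' column must be present.
    if "revenue" not in table:
        raise ValueError("column 'revenue' not in output")


@check_tuple_return({RevenueTable: validate_revenue_table})
def transform(data: dict) -> Tuple[RevenueTable, int]:
    # Same mistake as in the report: the column is misnamed.
    return RevenueTable(data, revenue_WRONG=100.0), 1
```

With this sketch, calling `transform({"year": 2001})` raises `ValueError` because the first tuple element fails its check, while the `int` element is left alone since no validator is registered for it. A real implementation would also need to decide how to treat variable-length annotations such as `Tuple[X, ...]`, which this sketch simply skips.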


Schemas in Tuples are not typechecked #824