unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Schemas in Tuples are not typechecked #824

Open cgebbe opened 2 years ago

cgebbe commented 2 years ago

When I run the script below, I expect an error saying that "revenue_WRONG" does not conform to OutputSchema. However, no error is raised with pandera 0.10.1 and Python 3.8.10.

from typing import Tuple
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

class InputSchema(pa.SchemaModel):
    year: Series[int] = pa.Field(gt=2000, coerce=True)
    month: Series[int] = pa.Field(ge=1, le=12, coerce=True)
    day: Series[int] = pa.Field(ge=0, le=365, coerce=True)

class OutputSchema(InputSchema):
    revenue: Series[float]

@pa.check_types
def transform(df: DataFrame[InputSchema]) -> Tuple[DataFrame[OutputSchema], int]:
    return df.assign(revenue_WRONG=100.0), 1

df = pd.DataFrame({
    "year": ["2001", "2002", "2003"],
    "month": ["3", "6", "12"],
    "day": ["200", "156", "365"],
})

transform(df)  # expected a schema error here, but the call returns normally

cosmicBboy commented 2 years ago

Hi @cgebbe, this is related to #820. I'd say this is sort of a bug, but more of an enhancement/feature request (the fact that only single-dataframe outputs are supported should be better documented...)

The workaround for this would be to use check_output, which is a little more flexible:

@pa.check_types
@pa.check_output(OutputSchema.to_schema(), 0)  # validate index 0 of the output; assumes the return value supports __getitem__
def transform(df: DataFrame[InputSchema]) -> Tuple[DataFrame[OutputSchema], int]:  # you can keep this for annotation's sake
    return df.assign(revenue_WRONG=100.0), 1

In any case, would welcome a PR to support this use case with @pa.check_types type annotations!
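
As a rough starting point for such a PR, the per-position annotations of a `Tuple[...]` return type can be recovered with the standard `typing` helpers. The sketch below is not pandera's implementation; `element_annotations` is a hypothetical helper name, and a real change would still need to validate each `DataFrame[...]` element against its schema.

```python
from typing import Tuple, get_args, get_origin, get_type_hints


def element_annotations(func):
    """Return the per-position annotations of a fixed-length Tuple[...] return type.

    Hypothetical helper: returns an empty tuple when the return annotation
    is missing, is not a tuple, or is variable-length (Tuple[X, ...]).
    """
    ret = get_type_hints(func).get("return")
    if get_origin(ret) is not tuple:
        return ()
    args = get_args(ret)
    if len(args) == 2 and args[1] is Ellipsis:
        return ()
    return args


def example() -> Tuple[dict, int]:
    return {}, 1


annotations = element_annotations(example)
print(annotations)
```

A decorator could then pair each annotation with the element at the same position in the returned tuple and validate only the ones that carry a schema.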