unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

`check_types` decorator does not validate dataframe(s) in returned sequence #1794

Open stainbank opened 4 weeks ago

stainbank commented 4 weeks ago

Describe the bug

check_types does not attempt to validate a dataframe-containing sequence (e.g. tuple) returned by the decorated function. This limitation is undocumented, and so the silent approval of dataframes that should fail validation is surprising and insidious.

Code Sample, a copy-pastable example

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame

class Schema(pa.DataFrameModel):
    foo: int

@pa.check_types
def set_foo(df: DataFrame[Schema]) -> tuple[DataFrame[Schema], DataFrame[Schema]]:
    foo = df.assign(foo="foo")
    return foo, foo

set_foo(pd.DataFrame({"foo": [1]}))

Expected behavior

A SchemaError (i.e. error in check_types decorator of function 'set_foo': expected series 'foo' to have type int64, got object) should be raised for one or both returned dataframes. Any element in the return value which is not a dataframe should be ignored (assuming return annotation is correct).

Alternatively, this behaviour should be documented.
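
Until the decorator handles sequences, one possible workaround is to validate each returned element explicitly. The following is only a sketch under stated assumptions: `validate_returned_sequence` is a hypothetical helper, not part of pandera, and it takes one validator callable per position of the returned sequence (`None` skips a position).

```python
import functools
from typing import Any, Callable, Optional

Validator = Callable[[Any], Any]


def validate_returned_sequence(*validators: Optional[Validator]):
    """Hypothetical helper (not a pandera API).

    Applies one validator per position of the returned sequence.
    A validator is expected to raise on invalid input; None skips that position.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if len(result) != len(validators):
                raise ValueError(
                    f"expected {len(validators)} returned items, got {len(result)}"
                )
            for item, validator in zip(result, validators):
                if validator is not None:
                    validator(item)  # raises if the item is invalid
            return result

        return wrapper

    return decorator
```

With the example above, this could be applied as `@validate_returned_sequence(Schema.validate, Schema.validate)`, since `Schema.validate` raises a SchemaError when a dataframe does not match the schema.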

Nick-Seinsche commented 1 week ago

Hi @stainbank, this behaviour is documented.

cosmicBboy commented 6 days ago

This is actually a bug. Type annotations like tuple[DataFrame[Schema], DataFrame[Schema]] or list[DataFrame[Schema]] should be supported; it's just not implemented in the check_types decorator. Currently it only checks for Union types:

https://github.com/unionai-oss/pandera/blob/main/pandera/decorators.py#L606
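
To support tuple annotations, the decorator would need to look inside the return annotation and find the per-position types. The sketch below shows only that introspection step with the standard `typing` module. It is not pandera's implementation, and `returned_element_types` is a hypothetical name used for illustration.

```python
import typing


def returned_element_types(func):
    """Return the per-position annotations of a fixed-length tuple return type.

    Returns None when the return annotation is missing, is not a tuple,
    or is a variadic tuple such as tuple[int, ...].
    """
    hints = typing.get_type_hints(func)
    annotation = hints.get("return")
    if typing.get_origin(annotation) is not tuple:
        return None
    args = typing.get_args(annotation)
    if len(args) == 2 and args[1] is Ellipsis:
        return None  # variadic tuple: no fixed positions to map
    return args


def pair() -> tuple[int, str]:
    return 1, "a"


# Each entry lines up with one position of the returned tuple.
element_types = returned_element_types(pair)
```

A decorator could then pair each returned element with its annotation and validate only those positions whose annotation is a dataframe type, ignoring the rest.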