unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Pandera 0.14+ check_types behavior change? #1142

Open kr-hansen opened 1 year ago

kr-hansen commented 1 year ago

When upgrading our library to pandera >= 0.14, the check_types decorator no longer seems to check DataFrames in the same way. This may be by design given some of the changes in 0.14, but it looks like a bug.


Code Sample, a copy-pastable example

from typing import Tuple

import pandera as pa
from pandera.typing import DataFrame, Series

class TestModel(pa.DataFrameModel):
    my_column: Series[int]

@pa.check_types
def func_to_test(df: DataFrame[TestModel], val: int) -> Tuple[type, type]:
    return type(df), type(val)

func_to_test(df="not_a_dataframe", val=5)

Expected behavior

I expected that passing the string "not_a_dataframe" as df would raise a validation error for df. Instead, the function now runs without any validation taking place. My tests saw an error as expected with pandera<0.14, but that is no longer the case after upgrading.

In pandera==0.13.4 the error raised was AttributeError: 'str' object has no attribute 'pandera', which looks like an incidental failure rather than a deliberate validation error. Note that to run the example on 0.13.4 you also have to define TestModel as a pa.SchemaModel instead of a pa.DataFrameModel.


cosmicBboy commented 1 year ago

this is definitely a bug!

It's around here: https://github.com/unionai-oss/pandera/blob/main/pandera/decorators.py#L645-L649

Basically, we also need to raise an error there instead of returning the argument. At that point in the execution path we've already determined whether there's a corresponding SchemaModel for a particular argument.
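A rough, dependency-free sketch of that idea is below. It is not the actual code in decorators.py: check_arg, FakeFrame and FakeSchema are hypothetical stand-ins, and raising TypeError is an assumption about which exception the real fix would use.

```python
from typing import Any, Optional


class FakeFrame:
    """Hypothetical stand-in for a pandas DataFrame (keeps the sketch dependency-free)."""


class FakeSchema:
    """Hypothetical stand-in for the schema resolved from a DataFrame[Model] annotation."""

    def validate(self, obj: FakeFrame) -> FakeFrame:
        # A real schema would check columns and dtypes here.
        return obj


def check_arg(arg_name: str, arg_value: Any, schema: Optional[FakeSchema]) -> Any:
    """Validate a single argument. Illustrative only, not pandera's API."""
    if schema is None:
        # No schema is attached to this argument, so pass it through untouched.
        return arg_value
    if not isinstance(arg_value, FakeFrame):
        # A schema applies but the value is not a frame: raise here
        # instead of silently returning the argument.
        raise TypeError(
            f"argument {arg_name!r} expected a dataframe, "
            f"got {type(arg_value).__name__}"
        )
    return schema.validate(arg_value)
```

With this shape, check_arg("val", 5, None) still passes the plain int through, while check_arg("df", "not_a_dataframe", FakeSchema()) raises rather than returning the string.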

Mind opening up a PR for this one, @kr-hansen?