unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License
3.42k stars, 311 forks

Better integration with mypy for static type checking #1822

Open Idanaviv66 opened 2 months ago

Idanaviv66 commented 2 months ago

Is your feature request related to a problem? Please describe. Today, using pandera does not give static type checking of column contents. For example, when a column is declared as str and I then assign an int to it, mypy doesn't raise an error. Here is example code:

import pandas as pd
from pandera import DataFrameModel, Field, check_types
from pandera.typing import DataFrame

class Fruit(DataFrameModel):
    color: str
    weight: float = Field(gt=0)

@check_types(lazy=True)
def make_red(fruit_df: DataFrame[Fruit]) -> DataFrame[Fruit]:
    fruit_df = DataFrame[Fruit](
        fruit_df.copy()
    )  # needed because .copy() returns a plain DataFrame, not DataFrame[Fruit]
    fruit_df[Fruit.color] = 1  # mypy should raise an error here: color is declared as str
    return fruit_df

df = pd.DataFrame(
    {
        "color": ["red", "blue"],
        "weight": [1.3, 1.4],
    }
)
validated_df = DataFrame[Fruit](df)
make_red(validated_df)

Here the make_red function changes the values in the color column from str to int.

Describe the solution you'd like I'd like to be able to use mypy to detect this kind of problem: assigning a value of a different type to a column that is declared as str, or as any other type.

cosmicBboy commented 2 months ago

Hi @Idanaviv66, I'm all for this, but unfortunately it isn't really possible today. See this docs page for more of an explanation:

An important caveat to static type-linting with pandera dataframe types is that, since pandas dataframes are mutable objects, there’s no way for mypy to know whether a mutated instance of a DataFrameModel-typed dataframe has the correct contents. Fortunately, we can simply rely on the check_types() decorator to verify that the output dataframe is valid.

If you can figure out a way to do this, please feel free to make a PR!