Mypy - SchemaModel.validate does not return a DataFrame

unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library

https://www.union.ai/pandera

MIT License

Mypy - SchemaModel.validate does not return a DataFrame #763

Open adrien-turiot-maxa opened 2 years ago

adrien-turiot-maxa commented 2 years ago

The SchemaModel.validate function returns a DataFrameBase[T], which does not extend pd.DataFrame.

This makes type validations fail whenever a pd.DataFrame is expected. For example:

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    col1: Series[float]
    col2: Series[float]

existing_df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [1.0, 2.0, 3.0]})  # floats, to match Series[float]
result = Schema.validate(existing_df)

result.to_csv("test")        # mypy error: "DataFrameBase[Schema]" has no attribute "to_csv"
pd.concat([result, result])  # mypy error: List item has incompatible type "DataFrameBase[Schema]"

Why does Schema.validate return a DataFrameBase[T] instead of a DataFrame[T]?

The same applies to the SchemaModel.example function.

(pandera version 0.9.0)

lorenzo-w commented 1 year ago

Facing the same issue right now. I would like to validate my dataframes right after loading them from CSV and have the proper type annotation from there on. Currently I am using a small custom function that calls SchemaModel.validate and then casts the result to DataFrame[T], but I would expect pandera to return that type already.

cosmicBboy commented 1 year ago

Looking into this... basically need to do the following:

use typing.overload on the following methods:
For the example method, will need to refactor it so that it takes an optional dataframe_type argument that determines the generic type. Not sure if mypy supports this yet; if not, will need to punt on proper typing for example.

This is probably for another PR, but the DataFrameSchema.validate method will likely need to be overloaded as well: https://github.com/unionai-oss/pandera/blob/main/pandera/schemas.py#L441-L450
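To illustrate the typing.overload idea, here is a rough sketch using stand-in classes only. FrameBase, Frame, and Model are hypothetical and merely mimic the roles of DataFrameBase, DataFrame, and SchemaModel; this is not pandera's implementation, and the real signatures may end up looking different:

```python
from typing import Generic, Type, TypeVar, overload

T = TypeVar("T")


class FrameBase(Generic[T]):
    """Stand-in for a backend-agnostic base such as DataFrameBase[T]."""

    def __init__(self, rows: list) -> None:
        self.rows = rows


class Frame(FrameBase[T]):
    """Stand-in for a concrete frame type such as DataFrame[T]."""


F = TypeVar("F", bound=FrameBase)


class Model:
    """Stand-in for a schema model with an example() classmethod."""

    @overload
    @classmethod
    def example(cls, size: int) -> FrameBase: ...

    @overload
    @classmethod
    def example(cls, size: int, dataframe_type: Type[F]) -> F: ...

    @classmethod
    def example(cls, size, dataframe_type=None):
        # Build some placeholder rows, then wrap them in the requested type.
        rows = [{"col1": float(i)} for i in range(size)]
        frame_cls = FrameBase if dataframe_type is None else dataframe_type
        return frame_cls(rows)


default_frame = Model.example(2)         # a type checker infers FrameBase
concrete_frame = Model.example(2, Frame)  # a type checker infers Frame
```

With this pattern the optional dataframe_type argument selects which overload applies, so callers who pass a concrete type get that type back without a cast.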

@lorenzo-w would you be open to making a contribution here?

lorenzo-w commented 1 year ago

@cosmicBboy Wow thanks! That was the swiftest response I've ever had to a public issue. How could I say no then? 🙃 So yes, I'll take a shot at it this weekend and make a PR if it works.

cosmicBboy commented 1 year ago

Great, @lorenzo-w! The issue's been around for a while, so I didn't want it to fall through the cracks again. Let me know if you need any help, and check out the contribution guide to get your dev environment set up.

adzcai commented 2 months ago

Also running into this issue, and I'm happy to help. Just noting that, for now, you can also call DataFrame[Schema](existing_df) to get both validation and type-checking.