Open adrien-turiot-maxa opened 2 years ago
Facing the same issue right now. I would like to validate my dataframes right after loading them from CSV and then have the proper type annotation from there. Currently I am using a small custom function which calls SchemaModel.validate and then casts to DataFrame[T], but I would actually expect pandera to already return that.
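A validate-then-cast helper like the one described above might look roughly like this. This is only a sketch: to stay runnable without pandas or pandera installed, Frame, TypedFrame, BaseSchema, UserSchema and validate_as are all hypothetical stand-ins (for pd.DataFrame, pandera.typing.DataFrame[T], SchemaModel, a user schema, and the custom function), not pandera's actual API.

```python
from typing import Dict, Generic, List, Tuple, Type, TypeVar, cast

Row = Dict[str, object]


class Frame:
    """Hypothetical stand-in for pd.DataFrame: just a list of row dicts."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows


TSchema = TypeVar("TSchema", bound="BaseSchema")


class TypedFrame(Frame, Generic[TSchema]):
    """Hypothetical stand-in for pandera.typing.DataFrame[T]."""


class BaseSchema:
    """Hypothetical stand-in for a SchemaModel."""

    required: Tuple[str, ...] = ()

    @classmethod
    def validate(cls, frame: Frame) -> Frame:
        # The return annotation is the untyped frame, which mirrors the
        # typing gap discussed in this issue.
        for row in frame.rows:
            missing = [col for col in cls.required if col not in row]
            if missing:
                raise ValueError(f"missing columns: {missing}")
        return frame


class UserSchema(BaseSchema):
    required = ("name", "age")


def validate_as(schema: Type[TSchema], frame: Frame) -> "TypedFrame[TSchema]":
    """Validate at runtime, then cast so a type checker sees TypedFrame[schema]."""
    validated = schema.validate(frame)
    # cast() does nothing at runtime; it only changes the static type.
    return cast("TypedFrame[TSchema]", validated)


users = validate_as(UserSchema, Frame([{"name": "Ada", "age": 36}]))
```

The cast is purely static, so the helper is only as trustworthy as the validate call in front of it.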
Looking into this... basically need to do the following: typing.overload on the following methods:
example, will need to refactor it so that it takes an optional dataframe_type argument that determines the generic type... not sure if mypy supports this yet; if not, will need to punt on proper typing for the example method.
Probably for another PR, but will probably also need to overload the DataFrameSchema.validate method: https://github.com/unionai-oss/pandera/blob/main/pandera/schemas.py#L441-L450
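The overload idea could be sketched as follows. This is an illustration under assumptions, not pandera's implementation: PlainFrame, TypedFrame and Schema are hypothetical stand-ins so the snippet runs with the standard library alone, and the exact shape of the dataframe_type parameter is a guess at what is being proposed.

```python
from typing import Generic, Optional, Type, TypeVar, overload


class PlainFrame:
    """Hypothetical stand-in for pd.DataFrame."""

    def __init__(self, n_rows: int) -> None:
        self.n_rows = n_rows


S = TypeVar("S")
F = TypeVar("F", bound=PlainFrame)


class TypedFrame(PlainFrame, Generic[S]):
    """Hypothetical stand-in for pandera.typing.DataFrame[T]."""


class Schema:
    """Hypothetical stand-in for a SchemaModel subclass."""

    # Overload 1: no dataframe_type given, so the plain frame type comes back.
    @overload
    @classmethod
    def example(cls, size: int = ..., dataframe_type: None = ...) -> PlainFrame: ...

    # Overload 2: the requested frame class determines the return type.
    @overload
    @classmethod
    def example(cls, size: int = ..., *, dataframe_type: Type[F]) -> F: ...

    @classmethod
    def example(
        cls,
        size: int = 1,
        dataframe_type: Optional[Type[PlainFrame]] = None,
    ) -> PlainFrame:
        # Single runtime implementation shared by both overloads.
        frame_cls = PlainFrame if dataframe_type is None else dataframe_type
        return frame_cls(size)


plain = Schema.example(size=3)                             # seen as PlainFrame
typed = Schema.example(size=3, dataframe_type=TypedFrame)  # seen as TypedFrame
```

Here the second overload only selects the frame class. Binding the generic parameter to the schema itself is the part that may or may not be expressible for mypy, as noted above.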
@lorenzo-w would you be open to making a contribution here?
@cosmicBboy Wow thanks! That was the swiftest response I've ever had to a public issue. How could I say no then? 🙃 So yes, I'll take a shot at it this weekend and make a PR if it works.
Great @lorenzo-w! The issue's been around for a while, so didn't want it to fall through the cracks again. Let me know if you need any help, and check out the contribution guide to get your dev environment set up.
Also running into this issue and I'm happy to help. Just noting that for now you could also call DataFrame[Schema](existing_df) for validation and type-checking.
The SchemaModel.validate function returns a DataFrameBase[T], which does not extend pd.DataFrame.
This makes type validations fail whenever a pd.DataFrame is expected. For example:
Why does Schema.validate return a DataFrameBase[T] instead of a DataFrame[T]?
This is the same for the SchemaModel.example function.
(pandera version 0.9.0)