unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Bug: mypy complains about datetime.date / pandera.Date type #954

Closed by blais 1 year ago

blais commented 1 year ago

Hi! I tried the following to represent a column of datetime.date objects (with no time):

  expiration_date: pt.Series[datetime.date] = ...

and

  expiration_date: pt.Series[pandas_engine.Date] = ...

Either one raises the error: Value of type variable "GenericDtype" of "Series" cannot be "...". I looked in the tests, but it's still unclear to me how to do this.

Using a DataFrameSchema I was able to set dtype=datetime.date and it appears to work, but what about with SchemaModel? What's the right declaration?

Thank you,

blais commented 1 year ago

Why can't I use this type? https://github.com/unionai-oss/pandera/blob/master/pandera/engines/pandas_engine.py#L838

I was just about to start writing a custom type (I still haven't been able to get this working), and this looks like what I'd do if I followed these instructions: https://pandera.readthedocs.io/en/stable/dtypes.html

Yet I'm unable to use it in a Series. Is it just missing from the GenericDtype list in pandera.typing.common?

blais commented 1 year ago

I just found an ugly workaround: using pandera.typing.Object and a custom check:

@pandera.extensions.register_check_method()
def is_type(pandas_obj, *, dtype):
    return pandas_obj.map(lambda x: isinstance(x, dtype))

class MySchema(pa.SchemaModel):
    expiration_date: Series[pandera.typing.Object] = pa.Field(.... , is_type={"dtype": datetime.date})

I think this is a kludge, but it seems to work. Any insight appreciated.
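
One caveat with an isinstance-based check like the workaround above: datetime.datetime is a subclass of datetime.date, so the check also accepts values that carry a time component (pandas.Timestamp included, since it subclasses datetime.datetime). A minimal sketch using a plain Python list instead of a pandas Series; is_exact_date is a hypothetical helper for illustration, not part of pandera:

```python
import datetime

def is_instance_of_date(value):
    # Same test as the workaround above: also True for datetime.datetime.
    return isinstance(value, datetime.date)

def is_exact_date(value):
    # Hypothetical stricter variant: rejects subclasses such as datetime.datetime.
    return type(value) is datetime.date

values = [datetime.date(2022, 10, 6), datetime.datetime(2022, 10, 6, 12, 30)]

print([is_instance_of_date(v) for v in values])  # [True, True]
print([is_exact_date(v) for v in values])        # [True, False]
```

Whether the stricter variant is preferable depends on whether a column may legitimately hold datetime values.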

cosmicBboy commented 1 year ago

this is weird, looking into this...

cosmicBboy commented 1 year ago

What version of pandera are you using, and can you provide a minimal repro example?

Not able to reproduce this:

import datetime

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    item: Series[str] = pa.Field(isin=["apple", "orange"], coerce=True)
    price: Series[float] = pa.Field(gt=0, coerce=True)
    expiry: Series[datetime.date] = pa.Field(coerce=True)

valid_data = pd.DataFrame.from_records([
    {"item": "apple", "price": 0.5, "expiry": datetime.date.today()},
    {"item": "orange", "price": 0.75, "expiry": datetime.date.today()},
])

print(Schema.validate(valid_data))
#      item  price      expiry
# 0   apple   0.50  2022-10-06
# 1  orange   0.75  2022-10-06

blais commented 1 year ago

Nice! That's what I want. I've been using 0.12.0. I'll try upgrading and report back here. Thank you,

blais commented 1 year ago

Hmm, even in 0.12.0 your example works. I'm going to try to isolate why it's not working in my unittest setup.

blais commented 1 year ago

Oh! I see now what's going on. My environment is running "mypy" to check types and it barfs this error:

File "_schemas.py", line 48, characters 14-14:
error: Value of type variable "GenericDtype" of "Series" cannot be "date"

So I can disable mypy for this section of my code, but I think you may want to add that type to the GenericDtype. Here: https://github.com/unionai-oss/pandera/blob/master/pandera/typing/common.py#L91
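
For context, this kind of error comes from how mypy treats a TypeVar declared with an explicit list of constraints: only the listed types may be substituted. A minimal sketch with stand-in names (AllowedDtype and this Series class are illustrative assumptions, not pandera's actual definitions). The code runs fine at runtime; only the type checker objects, and the type: ignore comment with the type-var error code is one way to silence it for a single line:

```python
import datetime
from typing import Generic, TypeVar

# Stand-in for a value-constrained type variable: only these types are allowed.
AllowedDtype = TypeVar("AllowedDtype", bool, int, float, str)

class Series(Generic[AllowedDtype]):
    """Illustrative generic container, not pandera's Series."""

class Schema:
    price: Series[float]  # accepted by mypy: float is in the constraint list
    # mypy rejects the next line because date is not among the constraints;
    # the trailing comment suppresses only that error code on this one line.
    expiry: Series[datetime.date]  # type: ignore[type-var]

# The constraint list can be inspected at runtime.
print(datetime.date in AllowedDtype.__constraints__)  # False
print(float in AllowedDtype.__constraints__)          # True
```

A per-line ignore keeps the rest of the module type-checked, which is usually less intrusive than excluding the whole file.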

Thank you, cosmicBboy!

cosmicBboy commented 1 year ago

ah, nice!

> So I can disable mypy for this section of my code, but I think you may want to add that type to the GenericDtype. Here: https://github.com/unionai-oss/pandera/blob/master/pandera/typing/common.py#L91

Feel free to make a PR for this! Will rename this issue to reflect the bug.
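
The change suggested in this thread amounts to adding the date type to the constraint list of the type variable. A hedged sketch of the idea with stand-in names (AllowedDtype and this Series class are assumptions for illustration; the real list in pandera/typing/common.py is longer and is not reproduced here):

```python
import datetime
from typing import Generic, TypeVar

# Illustrative constraint list with datetime.date added as an allowed value.
AllowedDtype = TypeVar("AllowedDtype", bool, int, float, str, datetime.date)

class Series(Generic[AllowedDtype]):
    """Illustrative generic container, not pandera's Series."""

class Schema:
    # With date among the constraints, no type: ignore comment should be needed.
    expiry: Series[datetime.date]

print(datetime.date in AllowedDtype.__constraints__)  # True
```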