Why can't I use this type? https://github.com/unionai-oss/pandera/blob/master/pandera/engines/pandas_engine.py#L838
I was just about to set about writing a custom type (I still haven't been able to get this working), and this looks like what I'd be doing if I followed these instructions: https://pandera.readthedocs.io/en/stable/dtypes.html
Yet I'm unable to use it in a Series. Is it just missing from the GenericDtype list in pandera.typing.common?
I just found an ugly workaround: using pandera.typing.Object and a custom check:
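import datetime
import pandera as pa
import pandera.extensions
from pandera.typing import Object, Series
@pandera.extensions.register_check_method(statistics=["dtype"])
def is_type(pandas_obj, *, dtype):
    return pandas_obj.map(lambda x: isinstance(x, dtype))  # True where a value is a `dtype`
class MySchema(pa.SchemaModel):
    # other Field arguments were elided in the original
    expiration_date: Series[Object] = pa.Field(is_type={"dtype": datetime.date})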
I think this is a kludge, but it seems to work. Any insight appreciated.
This is weird, looking into this...
What version of pandera are you using, and can you provide a minimal repro example?
Not able to reproduce this:
import datetime

import pandas as pd
import pandera as pa
from pandera.typing import Series

class Schema(pa.SchemaModel):
    item: Series[str] = pa.Field(isin=["apple", "orange"], coerce=True)
    price: Series[float] = pa.Field(gt=0, coerce=True)
    expiry: Series[datetime.date] = pa.Field(coerce=True)

valid_data = pd.DataFrame.from_records([
    {"item": "apple", "price": 0.5, "expiry": datetime.date.today()},
    {"item": "orange", "price": 0.75, "expiry": datetime.date.today()},
])

print(Schema.validate(valid_data))
#      item  price      expiry
# 0   apple   0.50  2022-10-06
# 1  orange   0.75  2022-10-06
Nice! That's what I want. I've been using 0.12.0. I'll try upgrading and report back here. Thank you!
Hmm, even in 0.12.0 your example works. I'm going to try to isolate why it's not working in my unittest setup.
Oh! I see now what's going on. My environment is running "mypy" to check types and it barfs this error:
File "_schemas.py", line 48, characters 14-14:
error: Value of type variable "GenericDtype" of "Series" cannot be "date"
So I can disable mypy for this section of my code, but I think you may want to add that type to GenericDtype, here: https://github.com/unionai-oss/pandera/blob/master/pandera/typing/common.py#L91
Thank you, cosmicBboy!
ah, nice!
Feel free to make a PR for this! Will rename this issue to reflect the bug.
Hi! I tried a couple of declarations to represent a column of datetime.date objects (with no time), and either one raises the error "Value of type variable "GenericDtype" of "Series" cannot be "..."". I looked in the tests, but it's still unclear to me how to do this.
Using a DataFrameSchema, I was able to set dtype=datetime.date and it appears to work, but what about with SchemaModel? What's the right declaration? Thank you,