Closed khwilson closed 3 years ago
First off, didn't get to say it in the formal report above, but thank you for putting your time into this wonderful library!
To business, it turns out the error turns out to be simpler. You can reproduce the error for any schema with no annotations. For instance, the following code snippet produces a similar error:
import pandera as pa
class ASchema(pa.SchemaModel):
pass
ASchema.to_schema()
# Traceback (most recent call last):
# File "test.py", line 10, in <module>
# ASchema.to_schema()
# File ".../pandera/model.py", line 150, in to_schema
# cls.__config__ = cls._collect_config()
# File ".../pandera/model.py", line 331, in _collect_config
# base_options = _extract_config_options(config)
# File ".../pandera/model.py", line 97, in _extract_config_options
# for name, value in vars(config).items()
# TypeError: vars() argument must have __dict__ attribute
The issue seems to stem from this line: https://github.com/pandera-dev/pandera/blob/master/pandera/model.py#L126
Apparently, at least in Python 3.8.6, if no annotations are specified in the subclass, cls.__annotations__
is (maybe?) the superclass's annotated fields. But if even a single annotation is specified, the __annotations__
dictionary contains only the annotated fields of the subclass.
As an example, if we put a print(cls.__annotations__)
right before the for loop there, we get the following:
import pandera as pa
class ASchema(pa.SchemaModel):
pass
# {'Config': typing.Type[pandera.model.BaseConfig],
# '__checks__': typing.Dict[str, typing.List[pandera.checks.Check]],
# '__config__': typing.Union[typing.Type[pandera.model.BaseConfig], NoneType],
# '__dataframe_checks__': typing.List[pandera.checks.Check],
# '__fields__': typing.Dict[str, typing.Tuple[pandera.typing.AnnotationInfo, pandera.model_components.FieldInfo]],
# '__schema__': typing.Union[pandera.schemas.DataFrameSchema, NoneType]}
But if we add an annotation we get:
import pandera as pa
class ASchema(pa.SchemaModel):
a: pa.typing.Series[int]
# {'a': pandera.typing.Series[int]}
One possible fix would be to use __new__(cls, name, bases, dct)
instead of __init_subclass__
and explicitly check if any(cls.__annotations__ is base.__annotations__ for base in bases)
. But that seems quite brittle.
Thanks @khwilson for the bug report and concise examples.
You are right, the problem comes from __annotations__
returning its super's annotations (i.e SchemaModel
) if the class doesn't have annotations itself. Config
is then misinterpreted as a missing field and overwritten.
I've just pushed a fix. I simply filter out annotations that starts with "_" and "Config", as it is done when collecting regular fields.
Btw, your primary_key_mixin
is very nice :star_struck:
Excellent! One quick comment on your solution, though: it's not common practice but I assume there will be times people have fields that start with _
. Perhaps at least updating the documentation to reflect that?
And I'm happy to do a PR with the primary_key_mixin if you wanted to add it to the main library. I've been getting some decent mileage out of it. :-)
Actually private annotations have always been ignored since SchemaModel
was introduced. I realized now that it was not to specified in the docs. I'll do that !
I think the primary key would be better as a recipe. It seems very specific for pandera itself, but @cosmicBboy might have another opinion. It would be nice if you could post the template in the discussions. I'm sure many people would find it useful.
One thing is that mypy complains: error: Unsupported dynamic base class "primary_key_mixin"
. We can continue in "discussions" if you don't mind :)
thanks @khwilson for pointing this out!
Btw, your primary_key_mixin is very nice 🤩
+1, love how flexible this syntax is turning out to be :)
Thanks @jeffzi just ✅ your PR.
Thanks, y'all! Happy to start a discussion about the mixin!
fixed by #434
Describe the bug When creating a
SchemaModel
, if it only containsdataframe_check
s then theto_schema
function fails when it tries to takevars(config)
.Code Sample, a copy-pastable example
This fails
With traceback (replaced some directory names for privacy):
Expected behavior
The above should run to completion
Desktop (please complete the following information):
Additional context
If you add a single class-level variable with an annotation of its own it succeeds, e.g.:
However, if you do NOT have an annotation, it still fails: