seandstewart / typical

Typical: Fast, simple, & correct data-validation using Python 3 typing.
https://python-typical.org
MIT License

Unpredictable behavior with Union[pd.Series, pd.DataFrame] #166

Closed xbanke closed 3 years ago

xbanke commented 3 years ago

Description

The result of the following script is unpredictable:

import typing
from dataclasses import dataclass
import typic

import pandas as pd

@typic.al
@dataclass
class A:
    data: typing.Dict[str, typing.Union[pd.Series, pd.DataFrame]] = typic.field(default_factory=dict)

@typic.al
@dataclass
class Foo:
    a: A

if __name__ == '__main__':
    print('pass')

Sometimes it passes, but not every time. On other runs it raises unhashable type: 'set', unhashable type: 'dict', or unhashable type: 'list'. It may be caused by typing.Union[pd.Series, pd.DataFrame].

test_typical.py 18 <module>
@dataclass

api.py 385 typed
return _typed(_cls_or_callable) if _cls_or_callable is not None else _typed

api.py 377 _typed
return wrap_cls(obj, delay=delay, strict=strict)  # type: ignore

api.py 341 wrap_cls
wrapped: Type[WrappedObjectT] = cls_wrapper(klass)

api.py 339 cls_wrapper
return _resolve_class(cls_, strict=strict, jsonschema=jsonschema, serde=serde)

api.py 212 _resolve_class
protos = protocols(cls, strict=strict)

resolver.py 794 protocols
annotation, name=name, parameter=param, is_strict=strict, namespace=obj

resolver.py 720 resolve
resolved = self._resolve_from_annotation(anno, namespace=namespace)

resolver.py 533 _resolve_from_annotation
anno, constraints, namespace=namespace

des.py 902 factory
deserializer = self._build_des(annotation, key, namespace)

des.py 772 _build_des
needs_return = handler(self, context)

des.py 464 _build_mapping_des
item_type, flags=annotation.serde.flags, namespace=namespace

resolver.py 720 resolve
resolved = self._resolve_from_annotation(anno, namespace=namespace)

resolver.py 533 _resolve_from_annotation
anno, constraints, namespace=namespace

des.py 893 factory
key = self._get_name(annotation, constr)

des.py 177 _get_name
return get_defname("deserializer", (annotation, constr))

<string> 2 __hash__

<string> 2 __hash__

TypeError:
unhashable type: 'set'
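
The last frames of the traceback show a generated __hash__ (the "<string> 2 __hash__" lines) failing on a mutable value. The sketch below reproduces that failure mode in isolation. CacheKey and try_hash are hypothetical names used only for illustration; they are not typical's internal types, and the sketch assumes only that the key being hashed is a dataclass whose fields can end up holding a set, dict, or list.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for a hashable lookup key; NOT typical's real class.
@dataclass(frozen=True)
class CacheKey:
    annotation: Any
    constraints: Any


def try_hash(constraints: Any) -> str:
    """Hash a CacheKey and report either 'ok' or the TypeError message."""
    try:
        # The generated __hash__ hashes a tuple of the fields, so any
        # unhashable field value makes the whole key unhashable.
        hash(CacheKey(annotation=int, constraints=constraints))
        return "ok"
    except TypeError as exc:
        return str(exc)


if __name__ == "__main__":
    for value in (frozenset({1}), {1}, {"a": 1}, [1]):
        print(type(value).__name__, "->", try_hash(value))
```

Which error appears depends only on the type of the first unhashable field encountered. That matches the report that the message switches between set, dict, and list from run to run.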

seandstewart commented 3 years ago

I was able to isolate the issue and got it working: the heuristic for guessing a "tag" for this Union was sometimes extracting fields with mutable values. FYI, though, a Union of Series and DataFrame cannot tell the two apart when the incoming data is not already one of these types. Such data will always be coerced to the first type in the Union:

>>> Foo.transmute({"a": {"data": {"d": {}}}})
Foo(a=A(data={'d': Series([], dtype: float64)}))
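
The "first type wins" behavior can be sketched without typical or pandas. The coerce_first function below is a hypothetical illustration, not typical's implementation: it returns the value unchanged if it already matches a member of the Union, and otherwise calls each member in order and keeps the first that succeeds. list and tuple stand in for Series and DataFrame here, because both accept the same input, just as pd.Series and pd.DataFrame both accept an empty dict.

```python
from typing import Any, Callable, Sequence


def coerce_first(value: Any, candidates: Sequence[Callable[[Any], Any]]) -> Any:
    """Hypothetical sketch of ordered Union coercion (not typical's code)."""
    # Values that already are one of the member types pass through untouched.
    for candidate in candidates:
        if isinstance(candidate, type) and isinstance(value, candidate):
            return value
    # Otherwise try each member in declaration order; the first one that
    # accepts the value wins, even if a later member would also accept it.
    for candidate in candidates:
        try:
            return candidate(value)
        except (TypeError, ValueError):
            continue
    raise TypeError(f"cannot coerce {value!r} to any Union member")


if __name__ == "__main__":
    print(coerce_first("ab", (list, tuple)))    # list comes first, so a list
    print(coerce_first("ab", (tuple, list)))    # tuple comes first, so a tuple
    print(coerce_first((1, 2), (list, tuple)))  # already a tuple, unchanged
```

Under this scheme the result depends on the order of the Union members whenever more than one of them can be built from the input. One way to avoid relying on that order is to construct the intended pandas object before handing the data to the model.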