python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
791 stars 110 forks source link

structuring with `include_subclasses` does not work when the union type is a direct field of a dataclass. #430

Closed aha79 closed 2 months ago

aha79 commented 11 months ago

Description

structuring with include_subclasses does not work properly when the union type is a direct field of a dataclass. When it is indirectly used in the type hint of a field it works.

What I Did

@dataclass
class A:
    x: int

@dataclass
class Derived(A):
    d: A

def ustrat(a: Any, conv) -> Any:
    return cattrs.strategies.configure_tagged_union(a, conv, tag_name="type")

converter = cattrs.Converter()
cattrs.strategies.include_subclasses(A, converter, union_strategy=ustrat)

dic = [{'x': 9, 'd': {'x': 99, 'd': {'x': 999, 'type': 'A'}, 'type': 'Derived'}, 'type': 'Derived'}]

data_out = converter.structure(dic, List[A] )

# >> data_out  = [Derived(x=9, d=A(x=99))]       # WRONG we expect  '[Derived(x=9, d=Derived(x=99, d=A(x=999)))]'
# Note: The reverse (i.e. unstructuring) works as expected.

Now if we (incorrectly) change A to Optional[A] in Derived it does indeed work. Using Optional[A] instead of A does not harm serialization, as None does never occur. It interferes with static type checking though so it is not a good workaround.

@dataclass
class Derived(A):
    d: Optional[A]      # correct would be 'A', this is a work around

data_out = converter.structure(dic, List[A] )

# >> data_out  = [Derived(x=9, d=Derived(x=99, d=A(x=999)))]       # RIGHT (but at the cost of wrong  Derived.d)
Tinche commented 11 months ago

Interesting. So what I think is happening is the hook for Derived gets created before the include_subclasses hook for A gets installed, hence your behavior.

It's somewhat of a chicken and egg problem. The include_subclasses hook for A needs the hook for Derived to exist, and the hook for Derived needs the include_subclasses hook for A to exist.

I wonder if we can be clever and override hooks for children if their type is the parent type. So the hook for Derived.d would use late binding, i.e. be converter.structure(obj, A). (Note that cattrs doesn't do this by default for performance.)

anderssonjohan commented 10 months ago

@Tinche I'm not sure if this is the same problem, but it seems to be an issue with the tagged union. It works fine if I don't use that.

Example:

>>> import functools
>>> import attrs
>>> import cattrs
>>> import cattrs.strategies
>>> @attrs.define()
... class Base:
...     pass
...
>>> @attrs.define()
... class Child(Base):
...     field: Base
...
>>> c = cattrs.GenConverter()
>>> cattrs.strategies.include_subclasses(Base, c)
>>> result = c.structure({"field": {"field": {}}}, Base)
>>> result
Child(field=Child(field=Base()))
>>> c = cattrs.GenConverter()
>>> cattrs.strategies.include_subclasses(Base, c, union_strategy=functools.partial(cattrs.strategies.configure_tagged_union,tag_name="exp_type"))
>>> c.unstructure(Child(field=Child(field=Base())))
{'field': {'field': {'exp_type': 'Base'}, 'exp_type': 'Child'}, 'exp_type': 'Child'}
>>> c.structure({'field': {'field': {'exp_type': 'Base'}, 'exp_type': 'Child'}, 'exp_type': 'Child'}, Base)
Child(field=Base()) # Child is lost!

I started using cattrs and suddenly one class didn't work. It seems the "workaround" making fields optional makes it work, but is not a good solution in the long run.

What I had which worked:

@attrs.define
class Base:
    pass

@attrs.define
class List(Base):
    items: List[Base]

@attrs.define
class Child(Base):
    left: Base
    right: Base

List with Child items works fine, but in Child I have to change type of left and right to Optional[Base] in order to have them structured as subclasses to Base.

An example with the List-class:

>>> c.unstructure(Child(left=List(items=[Base(), Child(left=List(items=[Base()]), right=Base())]),right=Base()))
{'left': {'items': [{'exp_type': 'Base'}, {'left': {'items': [{'exp_type': 'Base'}], 'exp_type': 'List'}, 'right': {'exp_type': 'Base'}, 'exp_type': 'Child'}], 'exp_type': 'List'}, 'right': {'exp_type': 'Base'}, 'exp_type': 'Child'}
>>> d = c.unstructure(Child(left=List(items=[Base(), Child(left=List(items=[Base()]), right=Base())]),right=Base()))
>>> c.structure(d, Base)
Child(left=Base(), right=Base())

Subclasses without tagged union doesn't work for List because it complains about not having any unique fields, which I find strange as well since it clearly has the unique field items: List[Base], but that's another problem.

anderssonjohan commented 10 months ago

@aha79 @Tinche Here's my ugly workaround which let's me skip the workaround using Optional using the example in my previous comment.

>>> import functools
>>> from typing import Dict, Type
>>> import attrs
>>> import cattrs
>>> import cattrs.gen
>>> import cattrs.strategies
>>> @attrs.define()
... class Base:
...     tag_to_cls: Dict[str, Type] = {}
...     def __init_subclass__(cls, **kwargs):
...         super().__init_subclass__(**kwargs)
...         cls.tag_to_cls.update({cls.__name__: cls})
...
>>> @attrs.define()
... class Child(Base):
...     field: Base
...
>>> c = cattrs.GenConverter()
>>> cattrs.strategies.include_subclasses(Base, c, union_strategy=functools.partial(cattrs.strategies.configure_tagged_union,tag_name="exp_type"))
>>> c.structure({'field': {'field': {'exp_type': 'Base'}, 'exp_type': 'Child'}, 'exp_type': 'Child'}, Base)
Child(field=Base())  # same as in my previous comment, child is lost
>>> def structure_hook(c, o, cl):  # cl = Base
...     cl = Base.tag_to_cls.get(o["exp_type"], cl)  # default to Base
...     struct = cattrs.gen.make_dict_structure_fn(cl, c)
...     return struct(o, cl)
>>> c.register_structure_hook(Base, functools.partial(structure_hook, c))
>>> c.structure({'field': {'field': {'exp_type': 'Base'}, 'exp_type': 'Child'}, 'exp_type': 'Child'}, Base)
Child(field=Child(field=Base()))
Tinche commented 10 months ago

@anderssonjohan I spent some time debugging your example (phew, these are kind of hard to debug tbh).

I think it's the same issue. The reason it works if you change the field type to an Optional is because that forces late resolution for the Child.field hook, which is what solves the underlying issue. Definitely need to figure this out for the release after next (the next release is very close and very overdue, so I don't want to bloat it).

Tinche commented 2 months ago

Looks like this was fixed by a different issue. I've added a unit test for the OP case just the same.