Closed Philipp-Userlike closed 4 months ago
From what I see, to make those dataclasses work with serializers, I would have to subclass the UnionField, override
get_discriminator
and explicitly define this on my serializer:
Defining your custom field on every serializer is not the only solution here. You can also subclass DataclassSerializer
and use that serializer instead (see the documentation):
class MyUnionField(UnionField):
def get_discriminator(self, tp: type) -> str:
return getattr(tp, "serializer_discriminant", tp.__name__)
class MyDataclassSerializer(DataclassSerializer):
serializer_union_field = MyUnionField
@property
def serializer_dataclass_field(self):
return MyDataclassSerializer
If you want this policy to be used for your whole project, you can also monkey-patch the serializer_union_field
property of DataclassSerializer
itself.
Having said that, I do agree that having a nicer way to specify the discriminant for a type would be good, but I'm also not sure if this is the best solution. I'll give it some thought.
Thanks for the reply! I got this working now with patching, but in general I noticed that the approach of serializing type-information as discriminator only works well when you have full control over the protocol. If you create dataclasses to model an existing protocol, you run into some issues.
I think for that an approach similar to dacite would work better for deserialization of payload into dataclasses:
https://github.com/konradhalas/dacite/blob/10a9ec40fc5874ae434aa68b975d1b1bf667a42f/dacite/core.py#L110C28-L110C28
They simply try to initialize every member of the union with the payload value, skipping values where exceptions happen.
There is a strict_unions_match
mode that allows to make sure only one union member is successfully initialized.
The requirement to have union members be a mapping was also a problem for me, when payloads had unions of different primitive values.
I saw the solution with nest_value
, but this again requires control over the payload protocol definition.
If you create dataclasses to model an existing protocol, you run into some issues.
That doesn't surprise me, as it's not a common usecase.
They simply try to initialize every member of the union with the payload value, skipping values where exceptions happen.
That sounds quite fragile to me. It doesn't work for values that can initialize more than one union member, and could also cause side-effects from the serializers and constructors of the non-active members. I very much prefer an explicit solution.
You can of course always implement your own, custom UnionField
that initializes this way.
Hi, what are the concerns with using Literal
?
Here is how I monkey-patched this for reference:
from typing import get_args, get_origin, Literal
_orig_get_discriminator = UnionField.get_discriminator
def _get_discriminator(self, tp: type):
if self.discriminator_field_name not in tp.__annotations__:
return _orig_get_discriminator(self, tp)
literal_hint = tp.__annotations__[self.discriminator_field_name]
if get_origin(literal_hint) is not Literal:
raise TypeError(
f"{tp} has a {self.discriminator_field_name} attribute that is not a Literal"
)
return get_args(literal_hint)[0]
UnionField.get_discriminator = _get_discriminator
Hi, what are the concerns with using
Literal
?
I don't like that it creates an extra field on the dataclass, which mixes actual data with metadata. It also allows you to do nonsensical things such as Steel(type="wood")
.
The three options I'm currently considering for custom discriminators are:
@dataclass
@discriminator('my_dog')
class Dog:
# ...
obj: Dog | Cat = dataclasses.field(metadata={'serializer_kwargs': {'discriminator_values': {Dog: 'dog', Cat: 'cat'}}})
typing.Annotated
. Cons: somewhat clumsy syntax.
obj: Annotated[Dog, Discriminator('dog')] | Annotated[Cat, Discriminator('cat')]
I'm not convinced of any of them yet, though.
I work with an application that uses lots of nested JSON structures which are modelled with dataclass unions on our end. Union discriminators are often part of the message payload and cannot be arbitrarily defined. Often they are short lowercase words which cannot be made to class names, since that would violate python conventions.
For example a dataclass could look like this. It is not possible to change the 'type' Literals, since they are part of an API specification.
From what I see, to make those dataclasses work with serializers, I would have to subclass the UnionField, override
get_discriminator
and explicitly define this on my serializer: This is already pretty verbose and it would get worse when there is further nesting, since I always have to explicitly redefine the nested structure as a serializer.It would be nice to have a way on the dataclass to define its discriminator. I made an attempt here to add this by having a special attribute on the dataclass, but I am not sure if this is the best solution.