oxan / djangorestframework-dataclasses

Dataclasses serializer for Django REST framework
BSD 3-Clause "New" or "Revised" License
431 stars 28 forks source link

Allow for union discriminators different from class names #87

Closed Philipp-Userlike closed 4 months ago

Philipp-Userlike commented 1 year ago

I work with an application that uses lots of nested JSON structures which are modelled with dataclass unions on our end. Union discriminators are often part of the message payload and cannot be arbitrarily defined. Often they are short lowercase words which cannot be made to class names, since that would violate python conventions.

For example a dataclass could look like this. It is not possible to change the 'type' Literals, since they are part of an API specification.

@dataclasses.dataclass
class Wood:
    species: str
    type: Literal["wood"] = "wood"

@dataclasses.dataclass
class Steel:
    alloy: str
    type: Literal["steel"] = "steel"

@dataclasses.dataclass
class Building:
    material: typing.Union[Wood, Steel]

From what I see, to make those dataclasses work with serializers, I would have to subclass the UnionField, override get_discriminator and explicitly define this on my serializer: This is already pretty verbose and it would get worse when there is further nesting, since I always have to explicitly redefine the nested structure as a serializer.

It would be nice to have a way on the dataclass to define its discriminator. I made an attempt here to add this by having a special attribute on the dataclass, but I am not sure if this is the best solution.

oxan commented 1 year ago

From what I see, to make those dataclasses work with serializers, I would have to subclass the UnionField, override get_discriminator and explicitly define this on my serializer:

Defining your custom field on every serializer is not the only solution here. You can also subclass DataclassSerializer and use that serializer instead (see the documentation):

class MyUnionField(UnionField):
    def get_discriminator(self, tp: type) -> str:
        return getattr(tp, "serializer_discriminant", tp.__name__)

class MyDataclassSerializer(DataclassSerializer):
    serializer_union_field = MyUnionField

    @property
    def serializer_dataclass_field(self):
        return MyDataclassSerializer

If you want this policy to be used for your whole project, you can also monkey-patch the serializer_union_field property of DataclassSerializer itself.

Having said that, I do agree that having a nicer way to specify the discriminant for a type would be good, but I'm also not sure if this is the best solution. I'll give it some thought.

Philipp-Userlike commented 1 year ago

Thanks for the reply! I got this working now with patching, but in general I noticed that the approach of serializing type-information as discriminator only works well when you have full control over the protocol. If you create dataclasses to model an existing protocol, you run into some issues.

I think for that an approach similar to dacite would work better for deserialization of payload into dataclasses: https://github.com/konradhalas/dacite/blob/10a9ec40fc5874ae434aa68b975d1b1bf667a42f/dacite/core.py#L110C28-L110C28 They simply try to initialize every member of the union with the payload value, skipping values where exceptions happen. There is a strict_unions_match mode that allows to make sure only one union member is successfully initialized.

Philipp-Userlike commented 1 year ago

The requirement to have union members be a mapping was also a problem for me, when payloads had unions of different primitive values. I saw the solution with nest_value, but this again requires control over the payload protocol definition.

oxan commented 1 year ago

If you create dataclasses to model an existing protocol, you run into some issues.

That doesn't surprise me, as it's not a common usecase.

They simply try to initialize every member of the union with the payload value, skipping values where exceptions happen.

That sounds quite fragile to me. It doesn't work for values that can initialize more than one union member, and could also cause side-effects from the serializers and constructors of the non-active members. I very much prefer an explicit solution. You can of course always implement your own, custom UnionField that initializes this way.

codebutler commented 1 year ago

Hi, what are the concerns with using Literal?

Here is how I monkey-patched this for reference:

from typing import get_args, get_origin, Literal

_orig_get_discriminator = UnionField.get_discriminator

def _get_discriminator(self, tp: type):
    if self.discriminator_field_name not in tp.__annotations__:
        return _orig_get_discriminator(self, tp)
    literal_hint = tp.__annotations__[self.discriminator_field_name]
    if get_origin(literal_hint) is not Literal:
        raise TypeError(
            f"{tp} has a {self.discriminator_field_name} attribute that is not a Literal"
        )
    return get_args(literal_hint)[0]

UnionField.get_discriminator = _get_discriminator
oxan commented 1 year ago

Hi, what are the concerns with using Literal?

I don't like that it creates an extra field on the dataclass, which mixes actual data with metadata. It also allows you to do nonsensical things such as Steel(type="wood").

The three options I'm currently considering for custom discriminators are:

I'm not convinced of any of them yet, though.