python-desert / desert

Deserialize to objects while staying DRY
https://desert.readthedocs.io
MIT License
157 stars 10 forks source link

Explicit type specification for unions #36

Open altendky opened 4 years ago

altendky commented 4 years ago

Sometimes it would be good to store a type identifier in union fields. This would be especially true for sparsely populated data and unions containing str (described below).

Consider:

import typing

import attr
import desert

@attr.dataclass
class A:
    color: str

@attr.dataclass
class B:
    color: str

@attr.dataclass
class C:
    things: typing.List[typing.Union[A, B]]

We then create some instances:

instances = C(
    things=[
        A(color='red'),
        A(color='green'),
        B(color='blue'),
    ],
)

Theoretically that would serialize to (it doesn't seem to so there will presumably be another issue report):

{"things": [{"color": "red"}, {"color": "green"}, {"color": "blue"}]}

Which leaves us wondering what any of the things are.

On the flip side, if we take an example in the tests where there is a field t.Union[int, str] then adjust it to include a custom class after str such as t.Union[int, str, MyClass] then str will 'always' work and MyClass instances will be serialized to a str such as 'MyClass()' (well, for an attrs class with such a repr() anyways).

tl;dr, marshmallow-union has some pretty significant limitations. I got myself confused and started creating a solution over in marshmallow-polyfield https://github.com/Bachmann1234/marshmallow-polyfield/pull/34 which would introduce an extra layer to the serialization and include a string to designate the type of that element. No more guessing.

{
    "things": [
        {
            "type": "A",
            "value": {
                "color": "red",
            }
        },
        {
            "type": "A",
            "value": {
                "color": "green",
            }
        },
        {
            "type": "B",
            "value": {
                "color": "blue",
            }
        }
    ]
}

Perhaps desert is interested in using marshmallow-polyfield for that feature? Perhaps desert would rather the feature be added to marshmallow-union?

In the mean time I'll try to find time to work my way through reporting and possibly fixing some other issues.

altendky commented 4 years ago

Next funny business to consider. A str will match both str and typing.Sequence[str] hints. Yay!

altendky commented 4 years ago

So I think at some level it's simply going to be possible to have non-distinguishing hints. I guess it's up to the registry (the thing that figures out the hint to use etc, whatever) to decide if it should distinguish or not and even though we ought to provide a good option that'll be entirely configurable. The above could be handled by using an isinstance() check as well to prioritize str. Or just order of registered types (which is what happened implicitly before the latest commit). Or prioritize types over hints... But I think this isn't a structural, it doesn't relate to the interface we need to identify.

altendky commented 4 years ago

So, where are we at and what do we need to decide?

94 presently has a TaggedUnion field. It is configured by four callables. from_object(), from_tag(), from_tagged(), and to_tagged(). Bad incomplete names but... the first two identify a hint/tag/field setting from an object (we need to serialize [1, 2, 3] so we will use the hint typing.List[int] and associated field) and from a tag (the serialized data had "list_of_ints" so we will use typing.List[int] etc). The last two will add and remove the explicit 'layer' to serialized (marshaled anyways) data. 1 and tag "my_integer" to {"type": "my_integer", "value": 1} for example, and back.

I think some above discussion was considering that maybe there should be three callables with an extra intermediate step?

The next layer around TaggedUnion field are the functions externally_tagged_union(), internally_tagged_union(), and adjacently_tagged_union. They fill in the callables for adding/removing the explicit tagging for each corresponding form of explicitness. You still provide them with the value/hint/tag/field translation callable pair.

The tests give each of those tagged union callables with methods from a OrderedIsinstanceFieldRegistry instance. It uses the type of the value and the type hints to decide what field to use for a given value to be serialized. Deserialization is pretty trivial given the explicit specification of the type by an associated string.

It is presently failing due to ['abc', '2', 'mno'] matching both typing.List[str] and typing.Sequence[str]. This seems like a not-normal case that a developer would hit this and if they do and have a rule that they want to use then they can make their own registry. Or perhaps we break the registry down a bit so they can reuse it's logic and add their own just for further distinguishment.

So, where are we at and what do we need to decide? (sorry, couldn't help being a bit WET here :])

altendky commented 3 years ago

I might work on this a bit. I've at least reread the discussion so far. Just mentioning in case anyone has updates on the topic from the past year.