lidatong / dataclasses-json

Easily serialize Data Classes to and from JSON
MIT License
1.34k stars 150 forks source link

[BUG] Nested tuple with union of tuple does not deserialize correctly #502

Open Edgeworth opened 7 months ago

Edgeworth commented 7 months ago

Description

With this code:

from dataclasses import dataclass

from dataclasses_json import Undefined, dataclass_json

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
    cols: tuple[(tuple[str, str] | str), ...] = ()

print(TestInput1.from_json('{"cols": [["a", "b"], "c"]}'))

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput2:
    cols: tuple[tuple[str, str], ...] = ()

print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))

The output should contain only tuples. However, with the tuple[(tuple[str, str] | str), ...] type, the inner tuple is actually converted as a list. Output:

TestInput1(cols=(['a', 'b'], 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))

Describe the results you expected

TestInput1(cols=(('a', 'b'), 'c'))
TestInput2(cols=(('a', 'b'), ('c', 'd')))

Python version you are using

Python 3.11.5

Environment description

dataclasses-json==0.6.2

george-zubrienko commented 7 months ago

Will take a look soon, sorry bit slow on our side here. Tough times.

Edgeworth commented 7 months ago

No worries, thank you for your work.

george-zubrienko commented 5 months ago

Apparently this is how python converts a compatible collection into tuples (3.11):

> tuple(["a", "b"])
('a', 'b')
> tuple([["a", "b"], "b"])
(['a', 'b'], 'b')

So the library in general will get into trouble with nested generics like this one. It can technically be fixed by traversing the type args, but I fear things might go wrong there. Currently we do this:

        try:
            materialize_type = _get_type_cons(type_)
        except (TypeError, AttributeError):
            pass
        res = materialize_type(xs)

where type_ is the field type hint. Fixing this issue would require some effort and I idk about v0 if that's worth it. Fun fact:

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput1:
    cols: tuple[str, ...] = ()

gives TestInput1(cols=(['a', 'b'], 'c')) as well which seems like a blown up coercion :D

george-zubrienko commented 5 months ago

le me know your thoughts @Edgeworth

Edgeworth commented 5 months ago

It looks to me like the library currently works for nested tuples, see the OP

print(TestInput2.from_json('{"cols": [["a", "b"], ["c", "d"]]}'))

gives TestInput2(cols=(('a', 'b'), ('c', 'd'))), but the union case gives it trouble. Not sure if that is related to the tuple([["a", "b"], "b"]) behaviour (just dunno).

For my personal use case, I worked around this. But I think it's reasonable to expect cols: tuple[(tuple[str, str] | str), ...] to deserialize into something that matches the type hint (and works with hashable etc dataclasses)

TestInput1(cols=(['a', 'b'], 'c')) does seem very weird also lmao

w.r.t. whether to implement this now or in the new version, personally I don't mind. Since I have a workaround, I think it's reasonable to leave it unless there are other people who need this to work.

WDYT?

george-zubrienko commented 5 months ago

It is because DCJ relies on outer type only. So when you have coherent types inside (only tuples), it will work fine. But when there is a variation - stuff will look ugly. I'll think if there is an easy way to fix this w/o blowing the rest of the lib

jfloyd1697 commented 3 months ago

This works

@dataclass_json(undefined=Undefined.RAISE)
@dataclass(eq=True, kw_only=True, order=True, frozen=True)
class TestInput3:
    cols: tuple[(tuple[str, str] | str), ...] = ()
    _: InitVar[bool] = False

    def __post_init__(self, _: bool = False):
        if not _:
            cols = tuple(tuple(x) if not isinstance(x, str) else x for x in self.cols)
            config = dict(cols=cols, _=True)
            self.__init__(**config)

print(TestInput3.from_json('{"cols": [["a", "b"], "c"]}'))