madman-bob / python-dataclasses-serialization

Serialize/deserialize Python dataclasses to various other data formats
MIT License
25 stars 11 forks source link

Deserialisation fails when using Mapping, Sequence instead of Dict, List #17

Closed petered closed 1 year ago

petered commented 1 year ago

It's often good practice in data classes to use Mapping and Sequence instead of Dict and List - which indicates that the field is supposed to be immutable - python-dataclasses-serialization currently does not allow for this. See example, in which both tests fail:

from dataclasses import dataclass
from typing import Dict, Sequence, Mapping
from dataclasses_serialization.json import JSONSerializer

@dataclass
class ChildClass:
    age: int = 0

@dataclass
class ParentOfNamedChildren:
    children: Mapping[str, ChildClass]  # Using Dict in place of mapping does work

def test_parent_of_multiple():
    parent = ParentOfNamedChildren(children={
        'bobby': ChildClass(age=4),
        'suzy': ChildClass(age=5),
    })
    serobj = JSONSerializer.serialize(parent)
    obj = JSONSerializer.deserialize(ParentOfNamedChildren, serobj)  # Fails here
    assert obj.children['suzy'].age == 5

@dataclass
class ParentOfOrderedChildren:
    children: Sequence[ChildClass]  # Using List in place of Sequence does work

def test_parent_of_ordered_children():
    parent = ParentOfOrderedChildren(children=[
        ChildClass(age=4),
        ChildClass(age=5),
    ])
    serobj = JSONSerializer.serialize(parent)
    obj = JSONSerializer.deserialize(ParentOfOrderedChildren, serobj)  # Fails here
    assert obj.children[1].age == 5

if __name__ == "__main__":
    test_parent_of_multiple()
    test_parent_of_ordered_children()
petered commented 1 year ago

Also - aside - why is JSONSerializer called JSONSerializer when it has nothing to do with JSON? As far as I can see, all it is doing is converting an object to a primitive data structure. Would PreSerializer be a more fitting name?

madman-bob commented 1 year ago

The default serializers provide the round-trip property. That is, serializing and unserializing should leave you with a copy of the object you started with. With Mapping and other such abstract classes, we lose the information of which Mapping we use on serialization, so round-trip is not possible. While such generality is useful in libraries and algorithms, I would argue that it is a code smell at the boundaries of your program, such as when serializing.

That said, you should be able to add it as a custom serialization rule, should you desire that behaviour.

To your aside - the format produced by JSONSerializer is precisely the structured representation of JSON in Python..

petered commented 1 year ago

Fair enough - though it may still be nice to include a deserialiser that does this non-roundtrip (idempotent?) serialization in the package. It would be useful, as long as it's clearly indicated that the round-trip object is not guaranteed to be identical.

For safety, you could always deserialise to the immutable version of the type (ie Sequence deserialises to Tuple) - I can't see where that would lead to a problem - except maybe in the odd case where a single object is referred to twice in the same data structure - once as a list and once as a tuple, but I'm not sure this library handles multiple-references anyway.

Unfortunately there's no FrozenDict in python (unless https://peps.python.org/pep-0603/ passes...) so not sure what to do about Mapping. Having is just deserialise to dict would not be bad as long as people were aware of it.