python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License

[`help`] Understanding structuring hooks #567

Closed dhirschfeld closed 3 months ago

dhirschfeld commented 4 months ago

Description

I'm trying to understand if cattrs is a good fit for my problem. I think it might be, but I'm not sure how to actually use it so would greatly appreciate some advice from an expert!

I have a Python dictionary which represents partially deserialized data, e.g.

decoded = {
    'seed': 42,
    'method': 'test',
    'effective_date': {'_type': 'datetime', 'value': '2024-08-01T23:18:45.674566'},
    'mytuple': {'_type': 'MyTuple', 'value': [1, 2, 3]},
}

That is, primitive types have been deserialized into the corresponding Python objects, but complex types remain embedded in the output as tagged dicts of the form {"_type": str, "value": Any}.

Details

```python
>>> import json
>>> from dataclasses import dataclass
>>> from datetime import datetime
>>> import cattrs
>>>
>>> @dataclass
... class MyTuple:
...     values: tuple[int]
...
>>> kwargs = dict(seed=42, method='test', effective_date=datetime.now(), mytuple=MyTuple(values=(1,2,3)))
>>> kwargs
{'seed': 42, 'method': 'test', 'effective_date': datetime.datetime(2024, 8, 1, 23, 18, 45, 674566), 'mytuple': MyTuple(values=(1, 2, 3))}
>>> def encode_datetime(obj: datetime) -> dict:
...     return dict(_type='datetime', value=obj.isoformat())
...
>>> def encode_mytuple(obj: MyTuple) -> dict:
...     return dict(_type='MyTuple', value=obj.values)
...
>>> def default(obj):
...     if isinstance(obj, datetime):
...         return encode_datetime(obj)
...     elif isinstance(obj, MyTuple):
...         return encode_mytuple(obj)
...     else:
...         return obj
...
>>> encoded = json.dumps(kwargs, default=default)
>>> encoded
'{"seed": 42, "method": "test", "effective_date": {"_type": "datetime", "value": "2024-08-01T23:18:45.674566"}, "mytuple": {"_type": "MyTuple", "value": [1, 2, 3]}}'
>>> decoded = json.loads(encoded)
>>> decoded
{'seed': 42, 'method': 'test', 'effective_date': {'_type': 'datetime', 'value': '2024-08-01T23:18:45.674566'}, 'mytuple': {'_type': 'MyTuple', 'value': [1, 2, 3]}}
```

I think it should be possible to use cattrs to convert the tagged dicts back into the Python objects they represent, but I'm not sure how to define the structure hook to do so.

>>> converter = cattrs.Converter()
>>> converter.structure(decoded, int | str | datetime | MyTuple)
Traceback (most recent call last):
  Cell In[15], line 1
    converter.structure(decoded, int | str | datetime | MyTuple)
  File /opt/python/envs/dev310/lib/python3.10/site-packages/cattrs/converters.py:332 in structure
    return self._structure_func.dispatch(cl)(obj, cl)
  File /opt/python/envs/dev310/lib/python3.10/site-packages/cattrs/fns.py:17 in raise_error
    raise StructureHandlerNotFoundError(msg, type_=cl)
StructureHandlerNotFoundError: Unsupported type: int | str | datetime.datetime | __main__.MyTuple. Register a structure hook for it.

Defining the decoders is straightforward:

>>> def decode_mytuple(arg: dict) -> MyTuple:
...     return MyTuple(values=tuple(arg['value']))
... 
>>> def decode_datetime(arg: dict) -> datetime:
...     return datetime.fromisoformat(arg['value'])
...
>>> decode_datetime({'_type': 'datetime', 'value': '2024-08-01T23:18:45.674566'})
datetime.datetime(2024, 8, 1, 23, 18, 45, 674566)

>>> decode_mytuple({'_type': 'MyTuple', 'value': [1, 2, 3]})
MyTuple(values=(1, 2, 3))

...but how can I tell cattrs to e.g. apply decode_mytuple when it encounters a nested dict with a _type key which equals MyTuple?

salotz commented 4 months ago

Without providing a comprehensive answer, I'll just say that I have a system implemented very similar to what you are proposing. I actually have one converter which requires the type tags (even without unions) and another that only puts them in unions.

You need to register structure/unstructure hooks for the union whose members should carry type tags. You can use the cattrs strategy for handling unions to do this for you (see the docs), or you can write your own hook that reads the type tag and dispatches to the correct type to structure the value as.

In practice, writing hooks for every possible union is tedious and impractical, so I developed a hook factory that automatically handles unions over certain types, e.g. attrs classes and any standard builtin type. Types that need specialized conversion (like dates <-> timestamps) require some extra support.

Be aware that writing these factories is quite a bit of work.

dhirschfeld commented 3 months ago

I couldn't figure out any way to get cattrs to do this so just wrote my own recursive decode function:

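from collections import abc
from typing import Any

# `decoder` maps each `_type` tag to a callable that receives the decoded `value`.

def decode(arg: Any) -> Any:
    match arg:
        # Tagged dict: decode the payload, then hand it to the matching decoder.
        case {"_type": cls, "value": value}:
            return decoder[cls](decode(value))
        # Any other mapping: decode nested mappings, keep other values as-is.
        case arg if isinstance(arg, abc.Mapping):
            return {
                key: decode(value) if isinstance(value, abc.Mapping) else value
                for key, value
                in arg.items()
            }
        case list():
            return [
                decode(value) if isinstance(value, abc.Mapping) else value
                for value in arg
            ]
        case tuple():
            return tuple(
                decode(value) if isinstance(value, abc.Mapping) else value
                for value in arg
            )
        # Any other non-string iterable: returns a lazy generator.
        case arg if isinstance(arg, abc.Iterable) and not isinstance(arg, (str, bytes)):
            return (
                decode(value) if isinstance(value, abc.Mapping) else value
                for value in arg
            )
        case _:
            return arg
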
>>> decode(decoded)
{'seed': 42,
 'method': 'test',
 'effective_date': datetime.datetime(2024, 8, 20, 8, 51, 16, 707612),
 'mytuple': MyTuple(values=(1, 2, 3))}