python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License

How to hook into structuring of a simple dict? #524

Open kkg-else42 opened 3 months ago

kkg-else42 commented 3 months ago

Hey there!

I have a few attrs classes. Some are members of a (tagged) union, together with dict. The unstructure already works. Among other things, datetime objects are converted into strings -- in a special format (_TIMESTAMP_FORMAT: Final[str] = '%Y%m%d_%H%M%S').

The structuring of datetime attributes in attrs instances works perfectly. My problem is with the structuring of simple dict objects. I need to hook into it to convert the strings (which are in that special format) into datetime objects. But since the simple dict objects can also contain attrs instances, it must be possible to call the structuring recursively again.

I just need a hint how to call the structuring without creating an endless loop.

Tinche commented 3 months ago

Could you provide a minimal example in code?

kkg-else42 commented 3 months ago

Sorry for the delay... I don't know if it is a minimal one, but here is my example:

from datetime import datetime
from typing import Any, Final, Type, TypeVar

import attrs
from cattrs.preconf.json import make_converter
from cattrs.strategies import configure_tagged_union

T = TypeVar('T')

@attrs.frozen
class Sub:
    foo: str = 'bar'
    # and some more fields (incl. other attrs types)

@attrs.frozen
class A:
    some: str = 'a'
    sub: Sub = Sub()

@attrs.frozen
class B:
    some: str = 'b'
    sub: Sub = Sub()

FrameData = dict | A | B

@attrs.frozen
class Frame:
    data: FrameData

_CUSTOMIZED_STRUCTURE_TYPES: Final[set] = {
    datetime,
    dict,
    Frame,
    # and some more...
}

_TIMESTAMP_FORMAT: Final[str] = '%Y%m%d_%H%M%S'

def _structure(data: dict[str, Any] | str, to_type: Type[T]) -> T:
    match to_type:
        case t if t is datetime:
            return datetime.strptime(data, _TIMESTAMP_FORMAT)
        case t if t is dict:
            return _structure_dict(data)
        case t if t is Frame:
            data.pop('to_add', None)
            return conv.structure_attrs_fromdict(data, Frame)
        case _:
            raise NotImplementedError(f'Unsupported type: {str(to_type)}.')

def _structure_dict(data: dict[str, Any]) -> dict[str, Any]:
    structured: dict[str, Any] = data.copy()
    for k, v in structured.items():
        if isinstance(v, str):
            try:
                structured[k] = datetime.strptime(v, _TIMESTAMP_FORMAT)
            except ValueError:
                continue
    # something is needed here to call the converter for structuring the other values of the dict
    return structured

conv = make_converter()

for data_type in _CUSTOMIZED_STRUCTURE_TYPES:
    conv.register_structure_hook(data_type, lambda data, to_type: _structure(data, to_type))

configure_tagged_union(union=FrameData,
                       converter=conv,
                       tag_name='_type',
                       tag_generator=lambda t: t.__name__.casefold(),
                       default=dict)

As a result of this:

f='{"data": {"a": {"some": "a", "sub": {"foo": "bar"}}, "ts": "20240320_010203", "_type": "dict"}}'
print(conv.loads(f, Frame))

I get this output: Frame(data={'a': {'some': 'a', 'sub': {'foo': 'bar'}}, 'ts': datetime.datetime(2024, 3, 20, 1, 2, 3)})

But what I need is this output: Frame(data={'a': A(some='a', sub=Sub(foo='bar')), 'ts': datetime.datetime(2024, 3, 20, 1, 2, 3)})

kkg-else42 commented 3 months ago

Hi Tin, is there anything else I should add, or is it just a busy schedule?

Tinche commented 2 months ago

Hey,

yeah sorry I got sidetracked by other things.

> But since the simple dict objects can also contain attrs instances, it must be possible to call the structuring recursively again.

This is going to be complicated without modeling this more precisely. How do you know a nested dict is supposed to be converted into a class instance and not left as a dict? If a nested dict always means A | B, then it gets easier.

> Frame(data={'a': {'some': 'a', 'sub': {'foo': 'bar'}}, 'ts': datetime.datetime(2024, 3, 20, 1, 2, 3)})

That looks correct given the input. Even if we assume data['a'] is logically typed as FrameData, it has no _type field and so will default to a dict. In other words, how can we tell data['a'] is supposed to be A?

kkg-else42 commented 2 months ago

> But since the simple dict objects can also contain attrs instances, it must be possible to call the structuring recursively again.
>
> This is going to be complicated without modeling this more precisely. How do you know a nested dict is supposed to be converted into a class instance and not left as a dict? If a nested dict always means A | B, then it gets easier.

I now see that my example is misleading. My sentence (which you quoted) referred to the following line in the _structure_dict function:

# something is needed here to call the converter for structuring the other values of the dict
return structured

The goal is not to convert an arbitrary dict into an attrs instance. But dict-values, which in turn can be attrs instances (with datetime), should also be converted accordingly. However, they are not recognized as such before the datetime conversion (due to the special format). To achieve this, the converter would have to be called again within the _structure_dict function. Something like this:

# something is needed here to call the converter for structuring the other values of the dict
return conv.structure(structured, dict)

Of course, this doesn't work because it creates an endless loop. With attrs classes I use conv.structure_attrs_fromdict to achieve this (as in the case of Frame).

If necessary, I can rework the example. (But that will certainly not be until the week after next.)

Tinche commented 2 months ago

Yeah, a simplified example would be good.

> To achieve this, the converter would have to be called again within the _structure_dict function. Something like this: ... Of course, this doesn't work because it creates an endless loop.

You can just call _structure_dict on each value yourself, right? You don't even need to jump back into cattrs. It won't create an endless loop, since the recursion stops once there are no nested dict values left to descend into.