python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
791 stars 110 forks source link

NamedTuple to dictionary serialization #425

Closed karlicoss closed 2 months ago

karlicoss commented 11 months ago

Description

For my project, cachew, I've been trying to switch from using my custom serialization cold to some existing library. After some evaluation, cattrs seems like the most promising since it works so well with python typing annotations. One thing that stops me is that seems like it doesn't support NamedTuple? I feel like it should be treated just like a @dataclass, but seems like cattrs just treats them as 'unknown' types and leaves intact.

What I Did

This is kinda the minimal snippet that shows the issue:

from dataclasses import dataclass
from typing import NamedTuple

import cattrs

@dataclass
class D:
    value: int

class N(NamedTuple):
    value: int

und = cattrs.unstructure(D(value=123), D)
unn = cattrs.unstructure(N(value=123), N)

print(und, type(und))
print(unn, type(unn))

The result is

{'value': 123} <class 'dict'>
N(value=123) <class '__main__.N'>

So the dataclass is serialized just as expected, but the namedtuple is just left intact.

I searched in cattrs source code, and it seems like NamedTuple isn't mentioned at all -- so perhaps it was just forgotten rather than deliberately left out? I know I can implement a custom adapter, but feels like it makes sense to have them treated as dataclasses by default?

Tinche commented 11 months ago

Howdy,

I think it's a combination of nobody having asked for it and supporting libraries probably being able to handle named tuples natively.

Most of the time, output from cattrs is directly piped to another library, like orjson or msgpack, and those probably support named tuples already (since they are tuples)?

Although now that I think about it, the current approach is inadequate because it will not unstructure deeply. So yeah, agreed on it needing to be fixed. A little bit unsure on how to unstructure named tuples though, my first instinct would be to unstructure to lists (like ordinary tuples), not dicts.

karlicoss commented 11 months ago

Ah thanks, good to know! IMO they are more like dataclass rather than plain tuples since they are typed properly and have field names. But perhaps this could be configurable if people prefer 'legacy' behaviour for some reason.

gorkaerana commented 9 months ago

Hi,

I'd like to contribute to the implementation of this.

A "standard library" way to serialize a NamedTuple would be something like the below, which I adapted from [1] and [2]. (typing.NamedTuple is implemented as a function and so isinstance(a_named_tuple, typing.NamedTuple) throws a TypeError.)

def is_namedtuple(x):
    type_ = type(x)
    bases = type_.__bases__
    if len(bases) != 1 or bases[0] != tuple:
        return False
    fields = getattr(type_, '_fields', None)
    if not isinstance(fields, tuple):
        return False
    return all(isinstance(f, str) for f in fields)

def unstructure(obj):
    if isinstance(obj, dict):
        return {key: unstructure(value) for key, value in obj.items()}
    elif isinstance(obj, list):
        return list(map(unstructure, obj))
    elif is_namedtuple(obj):
        return {key: unstructure(value) for key, value in obj._asdict().items()}
    elif isinstance(obj, tuple):
        return tuple(map(unstructure, obj))
    else:
        return obj

I would suggest implementing the following way:

  1. Add is_namedtuple to src/_compat.py.
  2. Add gen_unstructure_namedtuple to cattrs.Converter.
  3. Add self.register_unstructure_hook_factory(is_namedtuple, self.gen_unstructure_namedtuple) to cattrs.Converter._init_.
  4. Add tests and documentation.

I would very happily put the work detailed in the list in a PR. Do let me know if you'd approach this any other way :)

References:

  1. https://stackoverflow.com/questions/2166818/how-to-check-if-an-object-is-an-instance-of-a-namedtuple
  2. https://stackoverflow.com/questions/33181170/how-to-convert-a-nested-namedtuple-to-a-dict?noredirect=1&lq=1
Tinche commented 9 months ago

Interesting!

A couple of points.

I think we will want to unstructure namedtuples into tuples for backwards compatibility. I see your code unstructures them into dictionaries. This is also a useful feature, but it should be a strategy instead. Then anyone can apply this strategy to any converter and get the behavior.

My first instinct would be to try to reuse the gen.make_dict_unstructure_fn functionality. It already supports attrs classes and dataclasses, and I think it can be adapted for NamedTuples easily. This would require a slight refactor, probably splitting part of that function into a private one that takes a list[Attribute]. Using this shares code and will be fast.

Then, write an adapter that can transform a NamedTuple class (not instance) into a list of attributes, like this.

Then, in the strategy, override the hooks to generate the proper unstructuring functions.

That's how I would go about it.

Happy to provide guidance if you decide to try!

gorkaerana commented 7 months ago

Sounds good! How about I give it a try in a PR and we take it from there?

Tinche commented 7 months ago

Go for it!

Tinche commented 7 months ago

Howdy, I've just merged https://github.com/python-attrs/cattrs/pull/491. This adds the namedtuples-as-tuples functionality to cattrs. I'll leave this issue open while other folks work on a strategy for namedtuples-as-dataclasses.

gnzsnz commented 2 months ago

this is fantastic!

do you plan a new release to include namedtuples?

Tinche commented 2 months ago

Yep, finishing it up now. The only thing left is to rework the readme

gnzsnz commented 1 month ago

Yep, finishing it up now. The only thing left is to rework the readme

I know that I'm a pain in the ... but I'm looking forward to have namedtuples in cattrs.

I'm happy to contribute with any changes needed, just let me know.