python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
791 stars 110 forks source link

Should unstruct_hook not unstruct? #432

Closed aljungberg closed 8 months ago

aljungberg commented 11 months ago

Description

The new unstruct_hook override feature in cattrs provides a way to customise unstructuring per attribute but it appears a blunt instrument: you now seem to be responsible for the totality of unstructuring from that attribute down.

What I Did

import attrs, cattrs
from cattrs.gen import make_dict_unstructure_fn, override

# Let's say we have a class Person which, as an implementation detail, stores its children in a dict, 
# so we can retrieve them by name.

@attrs.define
class Person:
    name: str = attrs.field()
    _children_by_name: dict[str, "Person"] = attrs.field(factory=dict)

    def add_child(self, child: "Person"):
        self._children_by_name[child.name] = child

attrs.resolve_types(Person)

alice = Person(name="Alice")
alice.add_child(Person(name="Bob"))

# Let's make our external representation a little more presentable by renaming _children_by_name to children.
c = cattrs.Converter()
c.register_unstructure_hook(Person, make_dict_unstructure_fn(Person, c, 
    _children_by_name=override(rename="children")))
# Note that just because we use an `override` with a `rename` we don't lose recursive unstructuring.
assert repr(c.unstructure(alice)) == "{'name': 'Alice', 'children': {'Bob': {'name': 'Bob', 'children': {}}}}"

# Let's use a list instead of a dict, since the name is redundant. A case for an attribute unstruct hook?
c = cattrs.Converter()
c.register_unstructure_hook(Person, make_dict_unstructure_fn(Person, c, 
    _children_by_name=override(rename="children", unstruct_hook=lambda x: list(x.values()))))
# Oh, but this is unexpected. Now Bob is not unstructured anymore.
assert repr(c.unstructure(alice)) == "{'name': 'Alice', 'children': [Person(name='Bob', _children_by_name={})]}"

# This works: calling back into the converter explicitly. But it seems surprising.
c = cattrs.Converter()
c.register_unstructure_hook(Person, make_dict_unstructure_fn(Person, c, 
    _children_by_name=override(rename="children", unstruct_hook=lambda x: c.unstructure(list(x.values())))))
assert repr(c.unstructure(alice)) == "{'name': 'Alice', 'children': [{'name': 'Bob', 'children': []}]}"

Discussion: An Intuition Gap?

I'm a new user and might just be doing things sideways. All of the following might be wrong, just my quick thoughts.

It seems unstruct_hook is intended to be the ultimate escape hatch that enables wholesale replacement of the unstructuring behaviour for specific attributes. But I do wonder if this is usually what one wants with override. If I wanted to just write my own asdict method for Person, I could already just write a function and use it directly with register_unstructure_hook. The reason I'm using cattrs.gen is to tweak the built-in behaviour rather than reimplementing it from scratch.

There's also an intuition gap here. Specifically, it feels counterintuitive that customising Person unstructuring causes a non-unstructured Person to pop out. I recognise this is in truth a misunderstanding: I think I'm customising but I'm actually replacing. Still, I'm probably not the last person who'll make the same mistake.

To me it would make sense to do one of the following:

Obviously you could argue that just calling c.unstructure recursively like I do in the final example is fine and I'm just holding it wrong.

Tinche commented 11 months ago

Hello!

Thanks for the writeup, always glad to hear from cattrs users, especially folks just coming in.

To provide a little context, the unstructure_hook param to override was added relatively recently, as a low-level escape hatch. Like you surmised yourself. I was/am planning to use it to implement an additional validation system on top of cattrs, and for anyone else to be able to do similar things by themselves. I think it does belong in override, although maybe with more documentation and a warning?

I do agree cattrs might have a functionality gap. Lately I've been reframing what exactly I think cattrs is, and the definition I'm landing at is that it's a data transformation framework based on function composition. But... cattrs doesn't make it very easy to compose functions. Or at least it could make it significantly easier. Some of this should be done through new convenience APIs, and some through documentation.

(I'm generally very careful about adding convenience APIs, especially when you can just use direct Python instead. I think they clog up the mental model of the API and can be an attractive nuisance.)

I think the most basic and easiest form of function composition in Python is... Just wrapping functions. I think that may be part of your intuition gap; I guess you were expecting the unstruct_hook to wrap instead of replace, and understandably so.

As a thought exercise, I tried solving your example as a cattrs expert to see how I'd go about it.

My first though was leveraging the fact that cattrs.gen.make_dict_unstructure_fn() return values are just callables with the right signature to be hooks. So for example, if you only wanted to unstructure instances of Person, you can just store the result of make_dict_unstructure_fn and throw the converter away.

Continuing your scenario:

unstruct = make_dict_unstructure_fn(
    Person, cattrs.Converter(), _children_by_name=override(rename="children")
)

print(unstruct(alice)) # {'name': 'Alice', 'children': {'Bob': {'name': 'Bob', '_children_by_name': {}}}}

But interestingly, that's actually wrong. You have a recursive class and if the hook doesn't actually get registered on the converter the nested data is incorrect. So in this case it's more of a foot gun than a cool convenience.

We actually need to do the entire registration dance. But how do we wrap it?

default_hook = make_dict_unstructure_fn(
    Person, c, _children_by_name=override(rename="children")
)

def my_hook(val: Person) -> dict:
    res = default_hook(val)
    res["children"] = list(res["children"].values())
    return res

c.register_unstructure_hook(Person, my_hook)

print(c.unstructure(alice))

This'll get us what we want but it's borderline too complex.

Brainstorming a little bit, here's what we could do.

  1. Add two methods to Converters, get_structure_hook(type) and get_unstructure_hook(type). These would just return the current hooks for a type. This is already possible using internal APIs (converter._structure_func.dispatch()), but they should be public to serve as building blocks for composition.

Then the example could be a little more general (merging the rename into the wrapper):

default_hook = c.get_unstructure_hook(Person)

def my_hook(val: Person) -> dict:
    res = default_hook(val)
    res["children"] = list(res["_children_by_name"].values())
    return res

c.register_unstructure_hook(Person, my_hook)
  1. Maybe add a wrap_with argument to override. That would be your wrapping functionality.

  2. Maybe add a convenience wrap_un/structure_hook_with() to converters.

Unsure what the wrapping API would be like yet. Maybe it's a factory function that takes a callable and returns a callable?

aljungberg commented 11 months ago

a data transformation framework based on function composition

Ah, I appreciate the context! That makes total sense to me and evokes Haskell's typeclasses and Rust impl. Framing cattrs's converter as akin to a pair of typeclasses, Unstructurable and Structurable, it's like we're instance'ing a type with an implementation. Love that perspective shift you gave me, that really clarifies cattrs conceptually for me.

Drawing on this notion, your get_structure_hook API fits perfectly by making the vanilla function available for composition. Using a hypothetical Haskell dot operator for function composition, you could construct a hook with new_hook = rename_fn . default_hook . list_values_fn and voila, there's the new customised hook, bookended by two very simple custom functions with minimal boilerplate.

This method does treat the Person hook as a unit, while I wanted to surgically target _children_by_name, taking advantage of make_dict_unstructure_fns per attribute control. However, after tinkering with your proposed solution, I think make_dict_unstructure_fn becomes overkill for my use-case.

Here's a slightly different example to justify needing both pre- and post-processing (whereas you reasonably fused the list and rename operations). This code already works with the current cattrs! (But hits an internal API.)

# To really drive home the function composition parallel, let's make a compose function.
def compose(*fns):
    """Compose functions like Haskell's `.`."""
    def composed_fn(val):
        for fn in reversed(fns):
            val = fn(val)
        return val
    return composed_fn

def pre_fn(val: Person) -> Person:
    # Sometimes we need to both preprocess and postprocess values for maximum control. In this case,
    # we need to filter out any possessed children *before* turning to the default implementation,
    # or the default hook will get stuck to the ceiling muttering in tongues.
    return attrs.evolve(val, 
        children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name})

# In the post_fn, we do the rename and the listification as previously. We can't do it in the
# preprocessing function since we haven't yet dictified there.
def post_fn(res: dict) -> dict:
    res["children"] = list(res.pop("_children_by_name").values())
    return res

c = cattrs.Converter()
default_hook = c._unstructure_func.dispatch(Person)
c.register_unstructure_hook(Person, compose(post_fn, default_hook, pre_fn))

demonic_bob = Person(name="Evil Possessed Bob")
alice.add_child(demonic_bob)

assert repr(c.unstructure(alice)) == "{'name': 'Alice', 'children': [{'name': 'Bob', 'children': []}]}"

Not to get too carried away, but one could even envision including compose and a suite of built-in function factories in cattrs. Then make_dict_unstructure_fn would only ever be needed for the potential performance gains. For low stakes use, pure function composition would be more intuitive.

# Before, to rename a field:
renamer_hook = make_dict_unstructure_fn(
    Person, c, _children_by_name=override(rename="children")
)
c.register_unstructure_hook(Person, renamer_hook)

# After:
from cattrs.functional import rename, compose
c.register_unstructure_hook(Person, compose(rename(_children_by_name="children"), c.builtin))

c.builtin here is syntactic sugar, a sentinel value which would be replaced by a context dependent appropriate default, c.get_unstructure_hook(Person) in this case. It makes it easier for new users to avoid the foot gun c.unstructure, which if wrapped would end up dispatching not to to the original hook but to the new hook and cause infinite recursion.

Then we could write my exorcism example like so:

c.register_unstructure_hook(Person, compose(
    lambda res: {**res, 'children': list(res.pop('_children_by_name').values())},
    c.builtin,
    lambda val: attrs.evolve(val, children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name})
))

Or perhaps for clarity and convenience there could be a function specifically for doing this where the composition is implied. So finishing the thought you started, the convenience function could in fact be called wrap to be more Pythonic:

c.wrap_unstructure_for(Person,
    lambda res: {**res, 'children': list(res.pop('_children_by_name').values())},
    c.previous,
    lambda val: attrs.evolve(val, children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name}),
)

Renamed builtin to previous here to make it clear that if you do this repeatedly you'll be layering more transformations on top of whatever's already there. Note that although I'm demoing 3 arguments here, you could have any number of functions before the previous and any number (including 0) after it. This will make it easier to pull in stuff from the built-in library. So we could then do, imagining field_transformer, field_renamer and attrs_transformer function factories:

c.wrap_unstructure_for(Person,
    field_transformer(children=lambda d: list(d.values())),
    field_renamer(_children_by_name='children'),
    c.previous,
    attrs_transformer(children_by_name=lambda val, a: {k: v for k, v in a.items() if "Possessed" not in v.name}),
)

This approach would make it extremely easy to assemble complex transformation pipelines using simple, well-understood functions. I find it quite expressive and, at least in my eyes, intuitive. It's well-aligned with the concept of function composition for data transformation. And there's no magic; each function we include is straightforward and easily understandable.

Tinche commented 8 months ago

I've written and documented converter.get_structure_hook and converter.get_unstructure_hook, and will revamp the docs once more before the next release to lean on this aspect a little more.

I'm going to leave it here for now, until I see how things are progressing. I feel Python already has powerful ways of composing functions so we don't need to provide wrap right away.