Closed aljungberg closed 8 months ago
Hello!
Thanks for the writeup, always glad to hear from cattrs users, especially folks just coming in.
To provide a little context, the unstructure_hook
param to override was added relatively recently, as a low-level escape hatch. Like you surmised yourself. I was/am planning to use it to implement an additional validation system on top of cattrs, and for anyone else to be able to do similar things by themselves. I think it does belong in override, although maybe with more documentation and a warning?
I do agree cattrs might have a functionality gap. Lately I've been reframing what exactly I think cattrs is, and the definition I'm landing at is that it's a data transformation framework based on function composition. But... cattrs doesn't make it very easy to compose functions. Or at least it could make it significantly easier. Some of this should be done through new convenience APIs, and some through documentation.
(I'm generally very careful about adding convenience APIs, especially when you can just use direct Python instead. I think they clog up the mental model of the API and can be an attractive nuisance.)
I think the most basic and easiest form of function composition in Python is... Just wrapping functions. I think that may be part of your intuition gap; I guess you were expecting the unstruct_hook
to wrap instead of replace, and understandably so.
As a thought exercise, I tried solving your example as a cattrs expert to see how I'd go about it.
My first though was leveraging the fact that cattrs.gen.make_dict_unstructure_fn()
return values are just callables with the right signature to be hooks. So for example, if you only wanted to unstructure instances of Person
, you can just store the result of make_dict_unstructure_fn
and throw the converter away.
Continuing your scenario:
unstruct = make_dict_unstructure_fn(
Person, cattrs.Converter(), _children_by_name=override(rename="children")
)
print(unstruct(alice)) # {'name': 'Alice', 'children': {'Bob': {'name': 'Bob', '_children_by_name': {}}}}
But interestingly, that's actually wrong. You have a recursive class and if the hook doesn't actually get registered on the converter the nested data is incorrect. So in this case it's more of a foot gun than a cool convenience.
We actually need to do the entire registration dance. But how do we wrap it?
default_hook = make_dict_unstructure_fn(
Person, c, _children_by_name=override(rename="children")
)
def my_hook(val: Person) -> dict:
res = default_hook(val)
res["children"] = list(res["children"].values())
return res
c.register_unstructure_hook(Person, my_hook)
print(c.unstructure(alice))
This'll get us what we want but it's borderline too complex.
Brainstorming a little bit, here's what we could do.
Converters
, get_structure_hook(type)
and get_unstructure_hook(type)
. These would just return the current hooks for a type. This is already possible using internal APIs (converter._structure_func.dispatch()
), but they should be public to serve as building blocks for composition.Then the example could be a little more general (merging the rename into the wrapper):
default_hook = c.get_unstructure_hook(Person)
def my_hook(val: Person) -> dict:
res = default_hook(val)
res["children"] = list(res["_children_by_name"].values())
return res
c.register_unstructure_hook(Person, my_hook)
Maybe add a wrap_with
argument to override
. That would be your wrapping functionality.
Maybe add a convenience wrap_un/structure_hook_with()
to converters.
Unsure what the wrapping API would be like yet. Maybe it's a factory function that takes a callable and returns a callable?
a data transformation framework based on function composition
Ah, I appreciate the context! That makes total sense to me and evokes Haskell's typeclasses and Rust impl
. Framing cattrs
's converter
as akin to a pair of typeclasses, Unstructurable
and Structurable
, it's like we're instance
'ing a type with an implementation. Love that perspective shift you gave me, that really clarifies cattrs conceptually for me.
Drawing on this notion, your get_structure_hook
API fits perfectly by making the vanilla function available for composition. Using a hypothetical Haskell dot operator for function composition, you could construct a hook with new_hook = rename_fn . default_hook . list_values_fn
and voila, there's the new customised hook, bookended by two very simple custom functions with minimal boilerplate.
This method does treat the Person
hook as a unit, while I wanted to surgically target _children_by_name
, taking advantage of make_dict_unstructure_fn
s per attribute control. However, after tinkering with your proposed solution, I think make_dict_unstructure_fn
becomes overkill for my use-case.
Here's a slightly different example to justify needing both pre- and post-processing (whereas you reasonably fused the list and rename operations). This code already works with the current cattrs! (But hits an internal API.)
# To really drive home the function composition parallel, let's make a compose function.
def compose(*fns):
"""Compose functions like Haskell's `.`."""
def composed_fn(val):
for fn in reversed(fns):
val = fn(val)
return val
return composed_fn
def pre_fn(val: Person) -> Person:
# Sometimes we need to both preprocess and postprocess values for maximum control. In this case,
# we need to filter out any possessed children *before* turning to the default implementation,
# or the default hook will get stuck to the ceiling muttering in tongues.
return attrs.evolve(val,
children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name})
# In the post_fn, we do the rename and the listification as previously. We can't do it in the
# preprocessing function since we haven't yet dictified there.
def post_fn(res: dict) -> dict:
res["children"] = list(res.pop("_children_by_name").values())
return res
c = cattrs.Converter()
default_hook = c._unstructure_func.dispatch(Person)
c.register_unstructure_hook(Person, compose(post_fn, default_hook, pre_fn))
demonic_bob = Person(name="Evil Possessed Bob")
alice.add_child(demonic_bob)
assert repr(c.unstructure(alice)) == "{'name': 'Alice', 'children': [{'name': 'Bob', 'children': []}]}"
Not to get too carried away, but one could even envision including compose
and a suite of built-in function factories in cattrs. Then make_dict_unstructure_fn
would only ever be needed for the potential performance gains. For low stakes use, pure function composition would be more intuitive.
# Before, to rename a field:
renamer_hook = make_dict_unstructure_fn(
Person, c, _children_by_name=override(rename="children")
)
c.register_unstructure_hook(Person, renamer_hook)
# After:
from cattrs.functional import rename, compose
c.register_unstructure_hook(Person, compose(rename(_children_by_name="children"), c.builtin))
c.builtin
here is syntactic sugar, a sentinel value which would be replaced by a context dependent appropriate default, c.get_unstructure_hook(Person)
in this case. It makes it easier for new users to avoid the foot gun c.unstructure
, which if wrapped would end up dispatching not to to the original hook but to the new hook and cause infinite recursion.
Then we could write my exorcism example like so:
c.register_unstructure_hook(Person, compose(
lambda res: {**res, 'children': list(res.pop('_children_by_name').values())},
c.builtin,
lambda val: attrs.evolve(val, children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name})
))
Or perhaps for clarity and convenience there could be a function specifically for doing this where the composition is implied. So finishing the thought you started, the convenience function could in fact be called wrap
to be more Pythonic:
c.wrap_unstructure_for(Person,
lambda res: {**res, 'children': list(res.pop('_children_by_name').values())},
c.previous,
lambda val: attrs.evolve(val, children_by_name={k: v for k, v in val._children_by_name.items() if "Possessed" not in v.name}),
)
Renamed builtin
to previous
here to make it clear that if you do this repeatedly you'll be layering more transformations on top of whatever's already there. Note that although I'm demoing 3 arguments here, you could have any number of functions before the previous
and any number (including 0) after it. This will make it easier to pull in stuff from the built-in library. So we could then do, imagining field_transformer
, field_renamer
and attrs_transformer
function factories:
c.wrap_unstructure_for(Person,
field_transformer(children=lambda d: list(d.values())),
field_renamer(_children_by_name='children'),
c.previous,
attrs_transformer(children_by_name=lambda val, a: {k: v for k, v in a.items() if "Possessed" not in v.name}),
)
This approach would make it extremely easy to assemble complex transformation pipelines using simple, well-understood functions. I find it quite expressive and, at least in my eyes, intuitive. It's well-aligned with the concept of function composition for data transformation. And there's no magic; each function we include is straightforward and easily understandable.
I've written and documented converter.get_structure_hook
and converter.get_unstructure_hook
, and will revamp the docs once more before the next release to lean on this aspect a little more.
I'm going to leave it here for now, until I see how things are progressing. I feel Python already has powerful ways of composing functions so we don't need to provide wrap
right away.
Description
The new
unstruct_hook
override feature in cattrs provides a way to customise unstructuring per attribute but it appears a blunt instrument: you now seem to be responsible for the totality of unstructuring from that attribute down.What I Did
Discussion: An Intuition Gap?
I'm a new user and might just be doing things sideways. All of the following might be wrong, just my quick thoughts.
It seems
unstruct_hook
is intended to be the ultimate escape hatch that enables wholesale replacement of the unstructuring behaviour for specific attributes. But I do wonder if this is usually what one wants withoverride
. If I wanted to just write my ownasdict
method forPerson
, I could already just write a function and use it directly withregister_unstructure_hook
. The reason I'm usingcattrs.gen
is to tweak the built-in behaviour rather than reimplementing it from scratch.There's also an intuition gap here. Specifically, it feels counterintuitive that customising
Person
unstructuring causes a non-unstructuredPerson
to pop out. I recognise this is in truth a misunderstanding: I think I'm customising but I'm actually replacing. Still, I'm probably not the last person who'll make the same mistake.To me it would make sense to do one of the following:
unstructure_hook
rather than replacing itObviously you could argue that just calling
c.unstructure
recursively like I do in the final example is fine and I'm just holding it wrong.