python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
779 stars 108 forks source link

Calling `include_subclasses()` prevents later structure hooks from working #546

Closed scottee closed 1 month ago

scottee commented 1 month ago

Description

I have a set of classes that all inherit from one base class. Each sub-class has a unique required attribute to differentiate it, so I call include_subclasses() to handle this. My classes also have several union attributes like Union[str, pathlib.Path] defined. (There are more unions, but this is the one I got the exception on.) For each of these unions, I have a register_structure_hook() call to manage it. Everything was working fine, until I added the Union[str, Path] attr. After that I got an exception like this:

        +---------------- 2 ----------------
        | Traceback (most recent call last):
        |   File "<cattrs generated structure yadbee.config_transform.ConfigTransform>", line 130, in structure_MyMainClass
        |     res['runtime_persist'] = __c_structure_runtime_persist(o['runtime_persist'], __c_type_runtime_persist)
        |   File "/Users/escott/Library/Caches/.../lib/python3.10/site-packages/cattrs/fns.py", line 17, in raise_error
        |     raise StructureHandlerNotFoundError(msg, type_=cl)
        | cattrs.errors.StructureHandlerNotFoundError: Unsupported type: typing.Union[str, pathlib.Path]. Register a structure hook for it.
        | Structuring class MyMainClass @ attribute my_path_attr
        +------------------------------------

The trouble is that I already had a structure hook registered for Union[str, Path], but it wasn't being utilized in my call to my_converter.structure(json_obj, MyMainClass).

What I Did

The converter initialization originally looked like this when I was getting the exception:

        converter = cattrs.Converter(forbid_extra_keys=True)
        include_subclasses(BaseClass, converter)
        # cattrs doesn't know how to handle various unions
        converter.register_structure_hook(Union[List[str], str], lambda x, _: [x] if isinstance(x, str) else x)
       ... More hooks
        converter.register_structure_hook(Union[str, Path],  lambda x, _: x )

The work-around to this was to reverse the order of registering union hooks and include_subclasses:

        converter = cattrs.Converter(forbid_extra_keys=True)
        # cattrs doesn't know how to handle various unions
        converter.register_structure_hook(Union[List[str], str], lambda x, _: [x] if isinstance(x, str) else x)
       ... More hooks
        converter.register_structure_hook(Union[str, Path],  lambda x, _: x )
       # Work around: Move the include_classes after all the union hooks.
        include_subclasses(BaseClass, converter)

Is it expected that there are ordering dependencies between these calls?

Tinche commented 1 month ago

So, the short answer to the question whether the order of hook registration is significant is yes. Simpler hooks should be registered before more complex hooks (which might need the simpler hooks).

The long answer is that it depends on the hooks themselves, and the default hook factories are written to assume it.

In order to handle (structure/unstructure) a piece of data, the cattrs converter needs to do two things: find the function associated with the type of that data (the hook), and call that function. Even though we aggressively cache hooks (using functools.lru_cache), finding the hook takes a meaningful amount of time.

So in this particular case, the internal hook factory that produces the structuring hooks for attrs classes is written in a way that it only retrieves the hooks for each attribute once, when a class is first seen. This makes cattrs really fast but can be a little surprising if you don't know what's going on. Note that strategies like include_subclasses are even more complex, and they call various hook factories themselves at time of application.

(It's also worth mentioning that recursive classes get special treatment in order to work, and there's logic in cattrs that tries to bust various caches when new structure hooks are added to minimize problems, but it isn't perfect.)

In your particular case, I'm going to guess include_subclasses generated the hooks for subclasses of BaseClass before the list[str]|str hook was registered, and the generated hooks baked in the wrong list[str]|str hook.

Hope that answers your question. I'm going to close this now to try to keep the issue list tidy, but feel free to follow up with questions if you have more.