python-attrs / cattrs

Composable custom class converters for attrs, dataclasses and friends.
https://catt.rs
MIT License
779 stars 108 forks source link

Unstructuring generics #537

Closed AdrianSosic closed 1 month ago

AdrianSosic commented 1 month ago

Description

Hi @Tinche, not sure if this is a bug or intended behavior, but at least it surprised me and I'm not sure how to get around.

What I Did

Consider the following code

from typing import Generic, TypeVar

import cattrs
from attrs import define, field

converter = cattrs.Converter()

T = TypeVar("T")

@define
class GenericContainer(Generic[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer = field()
    # container: GenericContainer[int] = field()

def container_unstructure_hook(obj):
    fun = cattrs.gen.make_dict_unstructure_fn(GenericContainer, converter)
    return {"added_field": "abc", **fun(obj)}

converter.register_unstructure_hook(GenericContainer, container_unstructure_hook)
uses_component_factory = UsesContainer(GenericContainer(0))
print(converter.unstructure(uses_component_factory))

When you run the code from above, you'll get

{'container': {'added_field': 'abc', 'instance': 0}}

However, when you replace the field in UsesContainer with the commented line, you instead get

{'container': {'instance': 0}}

which indicates that the hook was not used.

Any thoughts?

Tinche commented 1 month ago

It's intended behavior, and I can explain why.

Converter.register_unstructure_hook uses a functools.singledispatch under the hood, and so the hooks registered using it follow singledispatch semantics. These loosely map to how the isinstance and issubclass built-ins work.

issubclass(GenericContainer[int], GenericContainer) doesn't return True (in fact it throws an exception, but cattrs can ignore this).

This means your hook won't trigger in the commented out case.

The solution is to switch to register_unstructure_hook_func. These are significantly more powerful and can handle anything using a predicate function.

Like this:

from typing import Any, Generic, TypeVar, get_origin

from attrs import define, field

import cattrs

converter = cattrs.Converter()

T = TypeVar("T")

@define
class GenericContainer(Generic[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer[int] = field()

def container_unstructure_hook(obj):
    fun = cattrs.gen.make_dict_unstructure_fn(GenericContainer, converter)
    return {"added_field": "abc", **fun(obj)}

def is_generic_container(type: Any) -> bool:
    return type is GenericContainer or get_origin(type) is GenericContainer

converter.register_unstructure_hook_func(
    is_generic_container, container_unstructure_hook
)
uses_component_factory = UsesContainer(GenericContainer(0))
print(converter.unstructure(uses_component_factory))

You might ask yourself why cattrs doesn't do this automatically, and the answer is I really dislike libraries that guess a lot on behalf of the user. I think in the long term the little added magic causes more bad surprises than good, and I prefer to err on the side of simplicity rather than ease.

You can go a step further and use a hook factory instead. This will be much faster. Like this:

from typing import Any, Generic, TypeVar, get_origin

from attrs import define, field

import cattrs

converter = cattrs.Converter()

T = TypeVar("T")

@define
class GenericContainer(Generic[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer[int] = field()

def container_unstructure_hook_factory(obj):
    fun = cattrs.gen.make_dict_unstructure_fn(GenericContainer, converter)

    def customized(obj):
        return {"added_field": "abc", **fun(obj)}

    return customized

def is_generic_container(type: Any) -> bool:
    return type is GenericContainer or get_origin(type) is GenericContainer

converter.register_unstructure_hook_factory(
    is_generic_container, container_unstructure_hook_factory
)
uses_component_factory = UsesContainer(GenericContainer(0))
print(converter.unstructure(uses_component_factory))

Let me know if you have any more question, but I'll close this for now to keep the number of issues manageable.

AdrianSosic commented 1 month ago

Hi @Tinche, perfect, thanks! Makes absolutely sense. I was aware of singledispatch, but simply did not anticipate that issubclass(GenericContainer[int], GenericContainer) would return False!

That part of the code now works without problems, but the real context is a bit more involved, as there is an additional base class involved. The problem that I now have is that the internal instance of MyAttrsClass is not being unstructured:

from typing import Any, Generic, TypeVar, get_origin

import cattrs
from attrs import define, field

converter = cattrs.Converter()

T = TypeVar("T")

@define
class MyAttrsClass:
    number: int = field()

@define
class BaseGenericContainer(Generic[T]):
    pass

@define
class GenericContainer(BaseGenericContainer[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer[MyAttrsClass] = field()

def container_unstructure_hook(obj):
    fun = cattrs.gen.make_dict_unstructure_fn(GenericContainer, converter)
    return {"added_field": "abc", **fun(obj)}

def is_generic_container(type: Any) -> bool:
    return type is GenericContainer or get_origin(type) is GenericContainer

converter.register_unstructure_hook_func(
    is_generic_container, container_unstructure_hook
)
uses_container = UsesContainer(GenericContainer(MyAttrsClass(0)))
print(converter.unstructure(uses_container))

This gives

{'container': {'added_field': 'abc', 'instance': MyAttrsClass(number=0)}}

I guess I somehow need to specify the specific type of GenericContainer during unstructuring?

AdrianSosic commented 1 month ago

Quick follow-up: I think I got it working by extracting the original hook first + changing the base classes of GenericContainer:

from typing import Any, Generic, TypeVar, get_origin

import cattrs
from attrs import define, field

converter = cattrs.Converter()

T = TypeVar("T")

@define
class MyAttrsClass:
    number: int = field()

@define
class BaseGenericContainer(Generic[T]):
    pass

@define
class GenericContainer(BaseGenericContainer, Generic[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer[MyAttrsClass] = field()

hook = converter.get_unstructure_hook(GenericContainer)

def container_unstructure_hook(obj):
    return {"added_field": "abc", **hook(obj)}

def is_generic_container(type: Any) -> bool:
    return type is GenericContainer or get_origin(type) is GenericContainer

converter.register_unstructure_hook_func(
    is_generic_container, container_unstructure_hook
)
uses_container = UsesContainer(GenericContainer(MyAttrsClass(0)))
print(converter.unstructure(uses_container))

@Tinche: in case you find the time, I'd still highly appreciate a quick "yes, this is it" to confirm this is the intended way. Also, to be honest, I still don't quite understand why I need to write

class GenericContainer(BaseGenericContainer, Generic[T])

instead of simply

class GenericContainer(BaseGenericContainer[T]):
Tinche commented 1 month ago

Referencing back to your example in https://github.com/python-attrs/cattrs/issues/537#issuecomment-2117214710,

the issue there is you're calling cattrs.gen.make_dict_unstructure_fn(GenericContainer, converter) in all cases. Since GenericContainer is a generic class, this is equivalent to GenericContainer[Any], and cattrs handles Any by just letting it through. A hook for GenericContainer[int] will be different from GenericContainer[MyAttrsClass], and we need to take that into account somehow.

The best way is to use hook factories. Hook factories are just one more level - instead of registering a hook with a converter, we register a function that receives a type and returns a hook. It also has the benefit of being much faster since make_dict_unstructure_fn is called once per class, not once per unstructure.

Here's your code with minimal modifications:

from typing import Any, Generic, TypeVar, get_origin

from attrs import define, field

import cattrs

converter = cattrs.Converter()

T = TypeVar("T")

@define
class MyAttrsClass:
    number: int = field()

@define
class BaseGenericContainer(Generic[T]):
    pass

@define
class GenericContainer(BaseGenericContainer[T]):
    instance: T = field()

@define
class UsesContainer:
    container: GenericContainer[MyAttrsClass] = field()

def container_unstructure_hook_fact(type):
    fun = cattrs.gen.make_dict_unstructure_fn(type, converter)

    def hook(obj) -> dict:
        return {"added_field": "abc", **fun(obj)}

    return hook

def is_generic_container(type: Any) -> bool:
    return type is GenericContainer or get_origin(type) is GenericContainer

converter.register_unstructure_hook_factory(
    is_generic_container, container_unstructure_hook_fact
)
uses_container = UsesContainer(GenericContainer(MyAttrsClass(0)))
print(converter.unstructure(uses_container))
AdrianSosic commented 1 month ago

Seems like hook factories are pretty much always the answer 😄 But yes, makes perfect sense! Thanks so much for diggin' through the code example!! 👏🏼