Default field conversion conflicts with user-defined field converter function

agausmann commented 3 months ago

cattrs version: 23.2.0
Python version: 3.11.8
Operating System: Linux

Description

I want to structure a type with a float | None field where the unstructured data may be float, None, or string (either an unparsed float or the keyword "None" The default conversion provided by cattrs would handle the unparsed float, but it will not handle the keyword, so it is not helpful here. However, attrs has a converter option for field definitions which seems to be applicable.

When using the class constructor with a field converter set, the value gets passed to the converter function with no issue. However, when using cattrs, cattrs seems to still try the default conversion with the unstructured data, which causes an exception when it encounters the keyword.

I expected cattrs to not do this and just pass through the unstructured value in the case where there is a user-provided converter function.

What I Did

from typing import Any
import attrs, cattrs

def converter(x: str | float | None) -> float | None:
    if x is None or x == "None":
        return None
    return float(x)

@attrs.define
class Foo:
    a: float | None = attrs.field(converter=converter)

print(Foo(2.0))
# ... Foo(a=2.0)

print(Foo("None"))
# ... Foo(a=None)

print(cattrs.structure({"a": 2.0}, Foo))
# ... Foo(a=2.0)

print(cattrs.structure({"a": "None"}, Foo))
# ... ERROR

  + Exception Group Traceback (most recent call last):
  |   File "/home/goose/dev/data-analyzer-2/test.py", line 25, in <module>
  |     print(cattrs.structure({"a": "None"}, Foo))
  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/goose/.local/share/virtualenvs/data-analyzer-2-bqypBIyu/lib/python3.11/site-packages/cattrs/converters.py", line 332, in structure
  |     return self._structure_func.dispatch(cl)(obj, cl)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "<cattrs generated structure __main__.Foo>", line 9, in structure_Foo
  | cattrs.errors.ClassValidationError: While structuring Foo (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "<cattrs generated structure __main__.Foo>", line 5, in structure_Foo
    |   File "/home/goose/.local/share/virtualenvs/data-analyzer-2-bqypBIyu/lib/python3.11/site-packages/cattrs/converters.py", line 627, in _structure_optional
    |     return self._structure_func.dispatch(other)(obj, other)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/home/goose/.local/share/virtualenvs/data-analyzer-2-bqypBIyu/lib/python3.11/site-packages/cattrs/converters.py", line 430, in _structure_call
    |     return cl(obj)
    |            ^^^^^^^
    | ValueError: could not convert string to float: 'None'
    | Structuring class Foo @ attribute a
    +------------------------------------

Tinche commented 3 months ago

As you've noticed, cattrs will deal with the field before it reaches attrs and attrs will apply the converter. This can't be changed for a number of reasons like backwards-compatibility and the fact it interferes with other uses of attrs converters.

The immediate fix would be to make a converter with prefer_attrib_converters=True and use it.

c = cattrs.Converter(prefer_attrib_converters=True)

print(Foo(2.0))
print(Foo("None"))
print(c.structure({"a": 2.0}, Foo))
print(c.structure({"a": "None"}, Foo))

But I think a better approach would be to remove the attrs converter and have cattrs deal with this. That way the rest of your code base can be isolated from the details of data structuring, which is one of the points of cattrs.

Like this:

import attrs

import cattrs
from cattrs.gen import make_dict_structure_fn

def converter(x: str | float | None, _) -> float | None:
    if x is None or x == "None":
        return None
    return float(x)

@attrs.define
class Foo:
    a: float | None

c = cattrs.Converter()

c.register_structure_hook(
    Foo, make_dict_structure_fn(Foo, c, a=cattrs.override(struct_hook=converter))
)

print(c.structure({"a": 2.0}, Foo))
print(c.structure({"a": "None"}, Foo))

Let me know what you think! I'm going to close this in the meantime.

agausmann commented 3 months ago

Thank you very much for your insight!

I agree that it's better to isolate the nuances of the data format, and a custom Converter is the right tool for that. I was just coming at the problem from the wrong angle.

agausmann commented 3 months ago

Another solution using newtypes: you can register the hook just once for that newtype, instead of for every field in every class:

ParsedFloat = typing.NewType("ParsedFloat", float)

@attrs.define
class Foo:
    a: ParsedFloat | None = None

def converter(x: str | float | None, _) -> ParsedFloat | None:
    if x is None or x == "None":
        return None
    return ParsedFloat(float(x))

c = cattrs.Converter()
c.register_structure_hook(ParsedFloat, converter)

The signature of the converter is a little unorthodox, and this still leaks parsing information into the class definition. In this case I think it's a worthy tradeoff:

I am parsing a complex structure with a lot of classes and a lot of fields to clean up in this way. The newtype approach is more maintainable compared to listing the classes+fields in two different places (the class definition and converter creation)
The classes (Foo in this example) are specifically only used for ingesting data, it's not used by the main application. The output will be aggregated with data from other sources in a separate data structure that is used by the application.

python-attrs / cattrs

Default field conversion conflicts with user-defined field converter function #522

Description

What I Did