jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.44k stars 75 forks source link

Unexpected error in `convert` for unset values #758

Open tijmenr opened 1 month ago

tijmenr commented 1 month ago

Description

The convert function does not handle UNSET values as I expected (or I am missing some detail in how optionality and unset work together). If a field has a union type that allows msgspec.UnsetType, it seems to ignore that during the conversion:

>>> import msgspec
>>> msgspec.__version__
'0.18.6'

>>> class A(msgspec.Struct,):
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> class B(msgspec.Struct):    # Basically same as A, but might differ in encoded field names
...     x: str
...     y: str | msgspec.UnsetType = msgspec.field(default=msgspec.UNSET)
...
>>> a = A(x='x')   # a=A(x='x', y=UNSET)

The value a fits the struct B (as it allows an unset y field), so I would expect conversion of a to an instance of B to succeed, with the value of the y field set to either UNSET (as it is in a) or the defined default in B (which in this case happens to be UNSET as well; this would need documentation). However, it fails on this unset value of the y field`:

>>> msgspec.convert(a, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

(Trying to "convert" a into a new instance of struct A does not yield an error, probably because there is some shortcut logic at play).

The same error also occurs when trying to convert a dict with an explicit unset value:

>>> msgspec.convert({'x': 'x', 'y': msgspec.UNSET}, B, from_attributes=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Expected `str`, got `msgspec.UnsetType` - at `$.y`

I currently have a use case where I basically need to translate the field names of a multi-level json structure from kebab-case to PascalCase, and my idea was to create two roughly the same structs K and P (using the handy rename option so that each gets the desired encoded field names), decode the source document using struct K, convert it into struct P, and use that last one for output. However, this issue with unset fields hinders that approach. (I'm going to try and change the default value from UNSET to None as a workaround, but a side effect is not being able to distinguish between a field just not being present, or being present with a null value in the source document.)

On a side note (not sure if that would be a separate bug or a feature), having a dict as an intermediate step between K and P does not help, because msgspec.structs.asdict does not honor the omit_defaults option of the struct, so it does include the unset fields in the dict, and this means convert can not work on the dict either (unless I first rebuild the dict myself to remove all unset fields).

tijmenr commented 1 month ago

Another unexpected behavior for convert:

Given:

>>> class A(msgspec.Struct, rename='pascal'): a_x: int
...    
>>> class B(msgspec.Struct, rename='kebab'): a_x: int
... 
>>> class C(msgspec.Struct): AX: int
... 

The following conversions work, making it seem as though convert is flexible on whether it uses the "real" field name or the "encoded" one:

>>> msgspec.convert(A(1), B, from_attributes=True)    # Apparently uses a_x from A
B(a_x=1)
>>> msgspec.convert(B(1), A, from_attributes=True)    # Apparently uses a_x from B
A(a_x=1)
>>> msgspec.convert(C(1), A, from_attributes=True)    # Apparently uses AX from C
A(a_x=1)

But the flexibility on "real" or "encoded" is apparently only on the destination type, because the following fails:

>>> msgspec.convert(A(1), C, from_attributes=True)    # Cannot use AX from A
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

And then suddenly when converting not from a Struct, but from a dict object, it only works if the dict contains the "encoded" field names:

>>> msgspec.convert({'AX':1}, A)    # Using encoded field name works
A(a_x=1)

>>> msgspec.convert({'a_x':1}, A)    # Using real field name fails
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

>>> # So you cannot do a roundtrip...
>>> msgspec.convert(msgspec.structs.asdict(A(1)), A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
msgspec.ValidationError: Object missing required field `AX`

Again, this behavior makes it difficult to use msgspec to translate similar json structures between different field name conventions.