Cannot convert with `from_attributes` when using a rename convention

LonelyVikingMichael commented 5 months ago

Description

Some example code using a Struct with rename="camel":

from dataclasses import dataclass

from msgspec import Struct, convert

@dataclass
class ThingModel:
    thing_id: str

class Thing(Struct, rename="camel"):
    thing_id: str

tm = ThingModel(thing_id="123")
t = convert(tm, Thing, from_attributes=True)

Raises:

Traceback (most recent call last):
  File "/app/backend/smalltest.py", line 16, in <module>
    t = convert(tm, Thing, from_attributes=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
msgspec.ValidationError: Object missing required field `thingId`

Is there a way around this? For this specific example, we can leverage unpacking the dataclass' asdict() result, but this isn't always straightforward with an ORM instance. I actually discovered this behaviour using a SQLAlchemy model.

jcrist commented 5 months ago

Thanks for opening this.

First, standardizing on terminology. Given your field thing_id above, "thing_id" is the attribute name, "thingId" is the renamed name.

Originally from_attributes was implemented to always use the attribute names rather than the renamed names (so you'd get the behavior you wanted here). I ended up switching it to use the renamed names for "uniformity" in https://github.com/jcrist/msgspec/pull/431/commits/0f32524a6fbe264781800a4b7b33832200c98d97. Looking back I'm not sure if this was the best decision. Some random thoughts:

There are 3 kinds of things that may be converted into a struct/dataclass from by convert:

A dict. This is assumed to have come from some serialization protocol (the main point of msgspec). The renamed fields are always used, assuming that the renaming is done to match field naming conventions in that protocol.
A non-dict mapping. The assumption is this is some DB-related record (from this request). Here using the renamed fields may be nice to translate between the DB column names and the struct field names. But since you can alias the field names in a query maybe this should also always use the attribute names? Or maybe it should always use the renamed names since a mapping is dict-like and that's what dicts use?
An object with attributes, with from_attributes=True. The assumption is this is also some ORM-like DB-related record (from this request). Since this is by attribute the attributes must be valid attribute names - perhaps these should always use the attribute names? On the other hand diverging from 2 above which has the same use case may be confusing.

A few options:

Have objects-with-attributes always use the attribute names instead of the renamed names. This would be a breaking change, but we're still 0.* so those are acceptable when needed.
Have objects-with-attributes try the renamed names first then fallback to trying the attribute names before erroring (or maybe the other way around?). Maybe also apply to mappings? (I think no?). Right now I think this is my favorite option.
Add a new option to also accept the attribute names. This would be similar to pydantic's populate_by_name, but would probably be a kwarg to convert rather than an option on the struct type. Would this apply to dicts and mappings as well? I'm not sure. I'm reluctant to add a new option here, but could if needed.
Add a new option to use the attribute names instead of the renamed names. by_name=True or something. This wouldn't work well for nested types if there are realistic use cases where you might want to use the original names for some objects and the renamed names for others. Like the above, I'm reluctant to add a new config option here.

Thoughts?

LonelyVikingMichael commented 5 months ago

Hi Jim, thanks for taking the time to provide context.

In my own use cases (mostly REST APIs), I try to have everything internal to my application use snake_case, whereas the public JSON schemas use camelCase. However, this linked issue comment outlines that this isn't always the case.

As for solutions, having convert falling back on attribute names or vice versa would be fine by me, though this might be a problem in applications where performance is critical? There's probably no "right" choice between attribute names first or renamed names first, we can make valid arguments for both.

For what it is worth, Pydantic's by_alias/populate_by_name options have been helpful to me in the past, especially when integrating to third parties where naming is pre-defined, but I respect your stance on yet another config option.

jcrist commented 5 months ago

This is fixed by #636. I ended up going for the inference option - in practice the inference is pretty efficient and only results in at most 1 unnecessary getattr call. I opted to go for prioritizing the attribute names over the renamed names here, since in most cases converting an arbitrary object by attribute will want to use the attribute names.

jcrist / msgspec

Cannot convert with `from_attributes` when using a rename convention #630

Description