jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Infer naming convention when converting objects to structs #636

Closed jcrist closed 5 months ago

jcrist commented 5 months ago

Struct types support renaming of fields for encoding/decoding. A common use of this is to enforce a camelCase convention in the serialized format:

import msgspec

class Example(msgspec.Struct, rename="camel"):
    field_one: int
    field_two: int

x = Example(1, 2)
print(msgspec.json.encode(x))
#> b'{"fieldOne":1,"fieldTwo":2}'

Previously, when converting an object to a struct, we always used the renamed field names rather than the original names. This was true whether the input was a dict, a non-dict mapping, or an arbitrary object read via attributes (from_attributes=True). The latter two kinds of input rarely, if ever, come from a serialization framework; they are more commonly database/ORM-like objects. In that case the original attribute names are more likely to be the useful ones, since both the database and struct representations are internal to the application (unlike the serialized names, which may have to match some external convention like camelCase).

We now infer the intended naming scheme when a non-dict mapping or object is passed to msgspec.convert to convert to a msgspec.Struct type. The inference process is as follows:

A key point here is that inputs may not mix attribute and renamed names together - the inference process will decide to use either only one or the other depending on what names are present. Using Example above:

The overhead of this inference process is low - at worst only one excess getattr call is made to determine whether to use the original or renamed names.
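The all-or-nothing behavior described above can be illustrated with a minimal pure-Python sketch. This is not msgspec's implementation: the helpers `pick_names` and `read_fields`, the `FIELDS` table, and the rule of probing only the first field's original name are assumptions made for illustration. `SimpleNamespace` objects stand in for ORM-like inputs.

```python
from types import SimpleNamespace

# Hypothetical field table for the Example struct above:
# original attribute name -> renamed (camelCase) name.
FIELDS = {"field_one": "fieldOne", "field_two": "fieldTwo"}


def pick_names(obj, fields=FIELDS):
    """Choose one naming scheme for the whole object with a single probe.

    Assumption: the presence of the first field's original name decides
    the scheme. The two schemes are never mixed.
    """
    first_original = next(iter(fields))
    if hasattr(obj, first_original):
        return list(fields)           # use original names throughout
    return list(fields.values())      # use renamed names throughout


def read_fields(obj, fields=FIELDS):
    """Read every field from obj using the inferred scheme only."""
    names = pick_names(obj, fields)
    out = {}
    for original, name in zip(fields, names):
        if not hasattr(obj, name):
            raise AttributeError(f"object is missing attribute {name!r}")
        out[original] = getattr(obj, name)
    return out


orm_row = SimpleNamespace(field_one=1, field_two=2)   # original names
api_obj = SimpleNamespace(fieldOne=3, fieldTwo=4)     # renamed names
mixed = SimpleNamespace(field_one=5, fieldTwo=6)      # mixes both schemes

print(read_fields(orm_row))
print(read_fields(api_obj))
try:
    read_fields(mixed)
except AttributeError as exc:
    print("rejected:", exc)
```

In this sketch the mixed object is rejected because the probe finds `field_one`, commits to the original names, and then fails to find `field_two`. The only extra work relative to always using one scheme is the single probe in `pick_names`.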

To reiterate, this change only affects object (non-dict mapping or arbitrary object) inputs to msgspec.convert when converting to a Struct type. Inputs of other types like dict are still assumed to have come from a serialization protocol and will always use the renamed names.

Fixes #630.