jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
Support types.MappingProxyType #632

Open intentionally-left-nil opened 5 months ago

intentionally-left-nil commented 5 months ago

Description

For creating immutable types, msgspec already supports serializing/parsing: frozenset, tuple, dataclass(frozen=True).

The only primary data type missing here is a dict. The immutable variant of a dict is MappingProxyType.

Thanks to msgspec's extensibility, it's possible today to write a custom encoder/decoder for this type. However, it would be great to have it included as a supported, builtin type.

intentionally-left-nil commented 5 months ago

In case it's helpful, here's my current workaround.

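    from types import MappingProxyType
    from typing import Any, Dict, Type, get_args, get_origin

    import msgspec

    # Both hooks are classmethods on a helper class (class definition not shown).

    @classmethod
    def enc_hook(cls, obj: Any) -> Any:
        if isinstance(obj, MappingProxyType):
            # Shallow-copy the proxied mapping into a plain dict that msgspec can encode.
            return obj.copy()
        else:
            raise NotImplementedError(f"Unknown type: {type(obj)}")

    @classmethod
    def dec_hook(cls, type: Type, obj: Any) -> Any:
        if type is MappingProxyType or get_origin(type) is MappingProxyType:
            args = get_args(type)
            if len(args) == 2:
                key_type: Any = args[0]
                value_type: Any = args[1]
                # Validate keys and values against the type parameters, then wrap the dict.
                return MappingProxyType(msgspec.convert(obj, Dict[key_type, value_type], dec_hook=cls.dec_hook))
            else:
                return MappingProxyType(msgspec.convert(obj, dict, dec_hook=cls.dec_hook))

        # `type` shadows the builtin in this method, so format the expected type itself.
        raise NotImplementedError(f"Unknown type: {type}")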

jcrist commented 5 months ago

Thanks for opening this. Your workaround as posted is how I'd handle this today.

For encoding, I'd expect only the slightest of speedups if we supported this natively. MappingProxyType doesn't expose a native API, so the only difference is that the call to .copy() would be made a bit quicker. Supporting this natively for encoding is very easy to do, though.

For decoding, native support could be made quicker, since we could avoid the copies and the second traversal done by calling msgspec.convert. Supporting this for decoding is less easy (there's more plumbing needed here) but still doable.

That said, types.MappingProxyType is a fairly uncommon type to use. Adding additional builtin types increases the maintenance burden on msgspec; generally we only add types that are common or can be handled significantly faster when supported as builtins.

Can you say more about why you're trying to use a MappingProxyType? Due to how they're implemented, MappingProxyType objects will always be slower than plain dicts to create, access, encode, and decode. The only thing they give you is pseudo-immutability (and then only if the proxied dict isn't accessible elsewhere).
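The "pseudo-immutability" caveat can be seen with the standard library alone. The following is a minimal sketch; the variable names are purely illustrative. It shows that writes through the proxy fail at runtime, while anyone still holding the proxied dict can change what the proxy reports.

```python
from types import MappingProxyType

backing = {"a": 1}
proxy = MappingProxyType(backing)

# Writes through the proxy are rejected at runtime with a TypeError.
try:
    proxy["b"] = 2
    write_rejected = False
except TypeError:
    write_rejected = True

# The proxy is a live view, so a change to the backing dict shows through it.
backing["b"] = 2
seen_through_proxy = dict(proxy)
```

Dropping every other reference to the backing dict (for example by building it inline in the MappingProxyType call) is what makes the proxy effectively read-only.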

intentionally-left-nil commented 5 months ago

Hi @jcrist, thanks for taking a look.

> Adding additional builtin types increases the maintenance burden on msgspec; generally we only add types that are common or can be handled significantly faster when supported as builtins.

Given this, I think it's completely reasonable not to implement this suggestion. I also can't argue that MappingProxyType is widely used; I only saw it in some relatively obscure posts about immutability.

> Can you say more about why you're trying to use a MappingProxyType?

Once deserialized, I'm using immutability to detect changes to a nested data structure. For example, I don't need to use == and have it recurse down the whole tree; instead I can compare object identity at the root level (or at any subtree I'm interested in). It's also trivial for me to implement on-change detection, because I only need to override the setter property at the root level, and that will catch any change anywhere in the entire tree.

That requires callers to remember to create new copies rather than updating existing ones when making changes, hence the requirement for immutability.
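As a rough illustration of the identity-based approach described above, here is a small sketch using only the standard library. The `State` and `Settings` classes are hypothetical stand-ins, not the actual data model: an update builds a new root and reuses untouched children, so `is` comparisons reveal which parts changed without a deep comparison.

```python
from dataclasses import dataclass, replace
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Settings:  # hypothetical nested node
    theme: str


@dataclass(frozen=True)
class State:  # hypothetical root of the tree
    settings: Settings
    counts: Mapping[str, int]


old = State(Settings("dark"), MappingProxyType({"a": 1}))

# An update creates a new root; children that were not touched are reused as-is.
new = replace(old, counts=MappingProxyType({**old.counts, "a": 2}))

root_changed = new is not old                         # a new root was created
settings_changed = new.settings is not old.settings   # same object was reused
counts_changed = new.counts is not old.counts         # this subtree was replaced
```

The identity checks are only meaningful if nothing mutates a node in place, which is why every container in the tree needs to be read-only.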

There are probably other options, such as subclassing UserDict and overriding __setitem__, but then the static type checker is blissfully unaware. MappingProxyType seemed to be the simplest solution. The only challenge I ran into was that serializing the structure to disk with msgspec (and then re-parsing it) doesn't work.
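For comparison, the UserDict alternative might look roughly like the sketch below. `FrozenDict` is a hypothetical name and this is not a complete implementation. It rejects writes at runtime, but because UserDict is a mutable mapping as far as type checkers are concerned, an assignment such as `d["b"] = 2` would presumably still pass static analysis.

```python
from collections import UserDict


class FrozenDict(UserDict):  # hypothetical name, for illustration only
    def __init__(self, *args, **kwargs):
        # Allow writes while UserDict populates the initial contents.
        self._frozen = False
        super().__init__(*args, **kwargs)
        self._frozen = True

    def __setitem__(self, key, value):
        if self._frozen:
            raise TypeError("FrozenDict does not support item assignment")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        if self._frozen:
            raise TypeError("FrozenDict does not support item deletion")
        super().__delitem__(key)


d = FrozenDict({"a": 1})

# The write is only caught when this line actually runs.
try:
    d["b"] = 2
    write_rejected = False
except TypeError:
    write_rejected = True
```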