jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License
2.01k stars, 59 forks

fallback to dict for unknown type in tagged union? #691

Open tlambert03 opened 1 month ago

tlambert03 commented 1 month ago

Question

Hello, and thanks as always for the amazing library. I have a use case where I'm decoding a document with a huge number of types, only a handful of which I care about. The schema uses tagged unions, and I'm hoping there is a way to essentially ignore, or simply leave as a dict, any objects that have an unrecognized type. As a simple example, I'd like to be able to deal with the "type": "Other" object below:

import msgspec

class Get(msgspec.Struct, tag=True):
    key: str

class Put(msgspec.Struct, tag=True):
    key: str
    val: str

msg = msgspec.json.encode(
    [
        {"type": "Put", "key": "my key1", "val": "my val"},
        {"type": "Get", "key": "my key2"},
        {"type": "Other", "somekey": "who knows"},
    ]
)
dec = msgspec.json.Decoder(list[Get | Put])
print(dec.decode(msg))

Traceback (most recent call last):
  File "/Users/talley/dev/self/slydb/y.py", line 23, in <module>
    print(dec.decode(msg))
          ^^^^^^^^^^^^^^^
msgspec.ValidationError: Invalid value 'Other' - at `$[2].type`

Alternatives I have considered

I tried using something like msgspec.json.Decoder(list[Get | Put | dict]), but that results in:

TypeError: Type unions may not contain more than one dict-like type (`Struct`, `dict`, `TypedDict`, `dataclass`) - type `__main__.Get | __main__.Put | dict` is not supported

The only other thing I can think of is to (laboriously) define stub Structs for every type name I encounter but don't care about, i.e. add:

class Other(msgspec.Struct, tag=True):
    ...

dec = msgspec.json.Decoder(list[Get | Put | Other])

and then hope I don't encounter something later that I haven't seen before...

Tips?

dcwatson commented 1 month ago

Leaving Other empty seems to work:

class Other(msgspec.Struct, tag=True):
    pass

>>> msgspec.json.decode(msg, type=list[Get | Put | Other])
[Put(key='my key1', val='my val'), Get(key='my key2'), Other()]

tlambert03 commented 1 month ago

Yes, it works, but it means that you need to know ahead of time the literal string of every type name you're ever going to encounter, even if you don't care about them (and would be happy to either leave them as unstructured dicts or empty structs).

(Assume that there are many additional names, not just Other)

dcwatson commented 1 month ago

Sorry, I misunderstood!