Closed spacemanspiff2007 closed 1 year ago
In your input, the payload key has a string value because it is wrapped in double quotes. Just remove the double quotes around the payload value.
But that input is how I receive the data from the openHAB application, which means it's not possible to edit the data by hand.
So I'm wondering if there is any way to indicate that the value of payload is a JSON string which should be deserialized, too.
In that case I would try customizing the decoding with the extension hooks described here: https://jcristharif.com/msgspec/extending.html
Hi, apologies for the delayed reply here. This is definitely doable using the existing extension support. One way would be to define a new generic type for handling JSON-in-JSON, then use it to wrap the payload values. Something like:
import msgspec
from typing import Generic, TypeVar

T = TypeVar("T")


class JSONStr(Generic[T]):
    """A wrapper type for handling JSON-in-JSON values"""

    value: T

    def __init__(self, value: T):
        self.value = value

    def __repr__(self) -> str:
        return f"JSONStr({self.value})"


class ItemStateEvent(msgspec.Struct):
    type: str
    value: str


class BaseMsg(msgspec.Struct):
    type: str
    topic: str
    payload: JSONStr[ItemStateEvent]


def enc_hook(x):
    if isinstance(x, JSONStr):
        return msgspec.json.encode(x.value).decode("utf-8")
    raise TypeError(f"{type(x).__name__} is not supported")


def dec_hook(type, value):
    if getattr(type, "__origin__", None) is JSONStr:
        inner_type = type.__args__[0]
        return JSONStr(msgspec.json.decode(value, type=inner_type))
    raise TypeError(f"{type} is not supported")


encoder = msgspec.json.Encoder(enc_hook=enc_hook)
decoder = msgspec.json.Decoder(BaseMsg, dec_hook=dec_hook)

msg = (
    b'{"type":"ItemStateEvent","topic":"openhab/items/DTR/state",'
    b'"payload":"{\\"type\\":\\"Quantity\\",\\"value\\":\\"5MB/s\\"}"}'
)

res = decoder.decode(msg)
print(res)
#> BaseMsg(
#>     type='ItemStateEvent',
#>     topic='openhab/items/DTR/state',
#>     payload=JSONStr(ItemStateEvent(type='Quantity', value='5MB/s'))
#> )

msg2 = encoder.encode(res)
assert msg == msg2
> Hi, apologies for the delayed reply here.
No worries - you made it worth the wait with your detailed answer with a working example. Thank you very much for that!
I tried playing around with the dec_hook, too.
From the docs I would have expected something like this to work:
import msgspec


class ItemStateEvent(msgspec.Struct):
    type: str
    value: str


class BaseMsg(msgspec.Struct):
    type: str
    topic: str
    payload: ItemStateEvent


def dec_hook(type, value):
    if type is ItemStateEvent:
        return ItemStateEvent(msgspec.json.decode(value, type=ItemStateEvent))
    raise TypeError(f"{type} is not supported")


decoder = msgspec.json.Decoder(BaseMsg, dec_hook=dec_hook)

msg = (
    '{"type":"ItemStateEvent","topic":"openhab/items/DTR/state",'
    '"payload":"{\\"type\\":\\"Quantity\\",\\"value\\":\\"5MB/s\\"}"}'
)

res = decoder.decode(msg)
print(res)
However, this again raises the exception:
msgspec.ValidationError: Expected `object`, got `str` - at `$.payload`
Unfortunately, the solution you proposed does not work for me:
Since there are lots of different kinds of messages, using .value doesn't provide much benefit, because I would have to narrow the type based on the type field of the base msg. That would mean I need to implement the corresponding logic everywhere I intend to consume the events.
Maybe I should have made that clearer, sorry.
There are many events, and they are all wrapped in the type, topic, payload JSON. I would love to put as much logic as possible into the message definition.
Do you have any more ideas?
It would have been really nice if there were a way, on the model or on the decoder, to indicate that the payload field should be deserialized, because then I could have used tagged unions, since the type information is in msg.type, e.g.:
class ItemStateEventPayload(msgspec.Struct, tag=False):
    type: str
    value: str


class ItemStateEventMsg(msgspec.Struct, tag='ItemStateEvent'):
    topic: str
    payload: ItemStateEventPayload


class AnotherMsg(msgspec.Struct, tag='AnotherMsg'):
    ...


class AThirdMsg(msgspec.Struct, tag='AThirdMsg'):
    ...


decoder = msgspec.json.Decoder(ItemStateEventMsg | AnotherMsg | AThirdMsg)

msg = (
    '{"type":"ItemStateEvent","topic":"openhab/items/DTR/state",'
    '"payload":"{\\"type\\":\\"Quantity\\",\\"value\\":\\"5MB/s\\"}"}'
)

res = decoder.decode(msg)
print(res)
That way I could have offloaded the whole deserialisation logic onto msgspec, which I hoped would be much faster and less error-prone than my Python code.
Sure. There are two ways I can think of to handle this kind of structure, depending on how you want to work with the output data.
I like option 2 the best as it's simpler, but they're both functional.
Option 1: In this method you have a different top-level type for each event. This means twice as many types to define (one per "payload" type, plus an additional wrapper "event" type for each). To get type annotations to work properly in this version you have to do a bit of magic, especially if you want to hide the existence of the JSONStr wrapper class. Whether this magic is worth it is up to you.
Note that this version relies on generic Struct types, which exist on the main branch but haven't been released yet.
Option 2: This version uses a single top-level Message type, parametrized by one of a number of Payload types. I like this version a lot better than the first as it's simpler and requires less type magic to make mypy/pyright happy.
Closing as stale/resolved. Please comment/open a new issue if you have more questions.
I am currently thinking about using msgspec as a web socket deserializer for HABApp. However, the openHAB project uses an unconventional json-in-json approach (see the payload field). I tried modeling it accordingly, but obviously I'm getting an error. Is there any way I can indicate that the payload field should be deserialized after/during the BaseMsg deserialization?