davfsa opened 1 year ago
Is msgspec.json.decode(msg, type=Foo) what you want?
Yeah, it would be nice to be able to do msgspec.json.decode(msg, type=Foo) and have it know that the options are nested inside the data field and extract them from there.
Hi! Support for flattening structs would be hard. It's doable, but not easily: there are a bunch of edge cases that can pop up as features are mixed together. I'd be happy to write up what makes this hard if you're interested, but in short I don't have plans to add this feature.
That said, I'm curious about your use case. Why do you want to flatten the runtime structure here? Why not write out the full structure of Foo matching how it's serialized?
In [8]: class Option(msgspec.Struct):
   ...:     x: int  # made up some fields for here
   ...:
In [9]: class Data(msgspec.Struct):
   ...:     options: list[Option]
   ...:
In [10]: class Foo(msgspec.Struct):
    ...:     version: int
    ...:     data: Data
    ...:
In [11]: msg = """
    ...: {
    ...:     "version": 1,
    ...:     "data": {
    ...:         "options": [{"x": 1}, {"x": 2}]
    ...:     }
    ...: }
    ...: """
In [12]: msgspec.json.decode(msg, type=Foo)
Out[12]: Foo(version=1, data=Data(options=[Option(x=1), Option(x=2)]))
Thanks for the answer!
The reason for this is mostly an opinionated approach in an API wrapper I am working on. The data field in this payload feels a bit clunky and useless: it doesn't really contain much, but it makes things harder to access, especially since Options contain more Options:
obj.data.options.data.options
# vs
obj.options.options
It was a choice we went with when implementing this part of the API, for simplicity's sake.
When I opened the issue my idea for this was something along the lines of:
class Foo(msgspec.Struct):
    version: int
    option: Option = msgspec.field(location="data__option")
A nice side effect is that this syntax would also allow renaming attributes.
Here is a quick dump of the idea, since it has been coming and going in my head. The syntax would go something like this:
data__option would mean option = payload["data"]["option"]
data[0] would mean option = payload["data"][0]
data[0]__option would mean option = payload["data"][0]["option"]
I believe this should cover all use cases.
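To make the proposed path syntax concrete, here is a small sketch of a resolver that parses a location string such as `data[0]__option` into a list of keys and indices and walks an already-decoded payload with it. `parse_location` and `resolve` are hypothetical helpers, not msgspec API; the sketch assumes `__` always separates dict keys and `[n]` always indexes a list, so it does not handle keys that themselves contain `__`.

```python
import re
from typing import Any, List, Union

# Matches list indices such as "[0]" or "[12]".
_INDEX = re.compile(r"\[(\d+)\]")


def parse_location(location: str) -> List[Union[str, int]]:
    # Hypothetical helper: "data[0]__option" -> ["data", 0, "option"].
    # Assumes "__" separates dict keys and "[n]" indexes into a list.
    path: List[Union[str, int]] = []
    for part in location.split("__"):
        # The dict key is everything before the first "[" (empty for a bare "[0]").
        key, bracket, rest = part.partition("[")
        if key:
            path.append(key)
        if bracket:
            # Collect every "[n]" that follows the key, e.g. "data[0][1]".
            path.extend(int(index) for index in _INDEX.findall(bracket + rest))
    return path


def resolve(payload: Any, location: str) -> Any:
    # Walk the decoded payload one path element at a time.
    value = payload
    for step in parse_location(location):
        value = value[step]
    return value


payload = {"data": [{"option": {"x": 1}}]}
print(parse_location("data[0]__option"))  # ['data', 0, 'option']
print(resolve(payload, "data[0]__option"))  # {'x': 1}
```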
A tricky case I also thought about would be:
{
  "data": {
    "option": {}
  },
  "data__": {
    "option": {}
  }
}
class Obj(msgspec.Struct):
    data: Data
    data__: MoreData
    data_option: Option = msgspec.field(location="data__option")
    more_data_option: Option = msgspec.field(location="data____option")
# or, equivalently:
class Obj(msgspec.Struct):
    some_data: Data = msgspec.field(location="data")
    some_more_data: MoreData = msgspec.field(location="data__")
    data_option: Option = msgspec.field(location="data__option")
    more_data_option: Option = msgspec.field(location="data____option")
In this case, the data fields resolve correctly: whether a location is treated as a flattened path or as a literal key is decided by whether that key exists in the payload, with the first match taking priority.
For extreme cases, which I don't believe really occur in the wild, an extra argument could be added to force a location to be treated as a flattening path.
I understand this could be a lot more work than it's actually worth, but I just wanted to dump the idea. Unfortunately I don't have the C skills to try and implement this myself, but I would love to try.
I'm also interested in the limitations you mentioned, as they might render my whole idea useless; I lack information on the internals of msgspec :sweat_smile:
The rename mechanism could probably be used for this (from the point of view of the library's users), something like this:
class Option(msgspec.Struct):
    x: int
foo_names = {
    "options": ["data", "options"],  # for example, TBD
}
class Foo(msgspec.Struct, rename=foo_names):
    version: int
    options: list[Option]
AFAIK Pydantic will support flattening in V2:
class Foo(BaseModel):
    bar: str = Field(aliases=[['baz', 2, 'qux']])
They have probably thought about edge cases, so it might be worth looking into as a good starting point.
@davfsa If it's only about making the parsed objects more usable, what about simply:
class Foo(msgspec.Struct):
    version: int
    data: Data
    @property
    def options(self):
        return self.data.options
You might even hide the original data field by renaming it to e.g. _data.
I have a similar use case. This is what the data looks like:
{
  "username": "jcrist",
  "attributes": [
    {"Name": "first_name", "Value": "Jim"},
    {"Name": "last_name", "Value": "Crist"},
    ...
  ]
  ...
}
I'd like to model it such that the attribute keys (like first_name) and the corresponding Values are attributes of the Struct and are also type validated. That is,
class User(Struct):
    username: str
    first_name: str
    last_name: str
msgspec.json.decode(data, type=User)
# > User(username='jcrist', first_name='Jim', last_name='Crist')
Even if I created a new Attribute struct and set attributes: list[Attribute], there's no (obvious) way to validate the type of the Value based on what the Name is.
(PS: Not sure if this is the right issue to ask this in; it seemed very similar to mine, but also slightly different because there's a level of...indirection(?), where the relevant key-value pairs are 'hidden' under the Name and Value keys of the list of dicts. Let me know if I should create a new issue instead.)
For reference, I found a solution to a similar problem using Pydantic's @root_validator(pre=True) decorator. [Stack Overflow comment, example code]
I've also run into a similar case; I think this data shape comes up frequently with GraphQL APIs.
{
  "data": {
    "issues": {
      "nodes": [
        {
          "id": "12345"
        },
        {
          "id": "67890"
        }
      ]
    }
  }
}
Thanks @ml31415, https://github.com/jcrist/msgspec/issues/315#issuecomment-1572238202 helps a lot, but I still need to define four one-line structs to express it. I would be really grateful for native support.
@mjkanji
What you could do is create tagged attribute objects. Then msgspec can distinguish them and you can add some verification.
import msgspec
class Attribute(msgspec.Struct, tag_field="Name"):
    pass
class Firstname(Attribute, tag="first_name"):
    Value: str  # add validation for first_name here as required
class Lastname(Attribute, tag="last_name"):
    Value: str  # separate validation for last_name goes here
# Tagged union: msgspec picks the class based on the "Name" value
AnyAttribute = Firstname | Lastname
class User(msgspec.Struct):
    username: str
    attributes: list[AnyAttribute]
Otherwise, if it's just about making the object easier to access, don't modify the data; just use a property-style accessor again. Roughly like this:
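import msgspec
class Attribute(msgspec.Struct):  # plain, untagged {"Name": ..., "Value": ...}
    Name: str
    Value: str
class User(msgspec.Struct):
    username: str
    attributes: list[Attribute]
    def _attribute_dict(self):
        return {attr.Name.lower(): attr.Value for attr in self.attributes}
    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails
        try:
            return self._attribute_dict()[attr]
        except KeyError:
            raise AttributeError(attr) from None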
Hi @cutecutecat, if you don't care about the other fields of "data" and "issues", just use ordinary dictionaries and happily nest the type definition:
from typing import Literal
import msgspec
class Node(msgspec.Struct):
    id: int
class Container(msgspec.Struct):
    data: dict[Literal["issues"], dict[Literal["nodes"], list[Node]]]
    @property
    def nodes(self):
        return self.data["issues"]["nodes"]
>>> # strict=False lets the string ids ("12345") be converted to int
>>> container = msgspec.json.decode(data, type=Container, strict=False)
>>> container.nodes
[Node(id=12345), Node(id=67890)]
I'm currently working on a Docker API client and flattening would be really useful.
For example, we have a struct like this:
class ServiceSpec(Struct):
    name: str
    labels: dict[str, str]
    image: str
    environment: list[str]
And Docker expects something like this:
{
  "Name": "web",
  "Labels": {"com.docker.example": "string"},
  "TaskTemplate": {
    "ContainerSpec": {
      "Image": "nginx:alpine",
      "Env": ["SECRET_KEY=123"]
    }
  }
}
To achieve this, I currently use the following hack:
This is a bit clumsy, but works out fairly well:
>>> spec = ServiceSpec(
... name="app",
... labels={},
... image="nginx:alpine",
... environment=["HELLO=world"]
... )
>>> msgspec.json.encode(DockerService.from_spec(spec))
b'{"Name":"app","Labels":{},"TaskTemplate":{"ContainerSpec":{"Image":"nginx:alpine","Env":["HELLO=world"]}}}'
UPD: this can be refactored as a wrapper for msgspec.convert: https://gist.github.com/notpushkin/3639f45acd2aa053b9d2416375135045 (see the example at the bottom)
Description
This is more of a question than a feature request, but could turn into one.
One of my use cases is deserialising something similar to:
into
I have scoured the documentation and can't find an easy way to do this. The way I currently manage it is by deserialising the Struct to a dict and then parsing the JSON as a dict (using attrs), but I would like to move away from that to reduce the amount of code to maintain (the reason I have been looking at msgspec, apart from the obvious speed gains!)
Thanks!