jcrist / msgspec

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML
https://jcristharif.com/msgspec/
BSD 3-Clause "New" or "Revised" License

Nested array_like=True #432

Closed ahopkins closed 1 year ago

ahopkins commented 1 year ago

Description

It seems that nested array_like=True is not working as expected.

Here's the scenario:

from msgspec import Struct
from msgspec.json import decode

class Point(Struct, array_like=True):
    lng: float
    lat: float

class Shape(Struct, array_like=True):
    points: list[Point]

single_point_data = b"[10,20]"
multiple_points = b"[[10,20],[30,40],[50,60]]"

single = decode(single_point_data, type=Point)  # << This is OK
print(single)

multiple = decode(multiple_points, type=Shape)  # << This fails with a ValidationError
print(multiple)

The output I receive:

Point(lng=10.0, lat=20.0)
Traceback (most recent call last):
  File "/tmp/p.py", line 19, in <module>
    multiple = decode(multiple_points, type=Shape)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
msgspec.ValidationError: Expected `array`, got `int` - at `$[0][0]`

As you can see, the Point is fine, but the list of points is not.

ahopkins commented 1 year ago

To work around this:

multiple = Shape([Point(*data) for data in decode(multiple_points)])

But, this seems like something that I should not have to do.

jcrist commented 1 year ago

I think there's some confusion here - array_like=True serializes structs like tuples. By your definition, Shape expects a length-1 array whose single element is an array of length-2 arrays (the points), the same as if it were of type tuple[list[Point]]. This works as expected:

from msgspec import Struct
from msgspec.json import decode

class Point(Struct, array_like=True):
    lng: float
    lat: float

class Shape(Struct, array_like=True):
    points: list[Point]

# indenting to make the nested structure clearer:
msg = b"""
[
    [
        [10,20],
        [30,40],
        [50,60]
    ]
]
"""

result = decode(msg, type=Shape)
print(result)
#> Shape(
#>   points=[
#>     Point(lng=10.0, lat=20.0),
#>     Point(lng=30.0, lat=40.0),
#>     Point(lng=50.0, lat=60.0)
#>   ]
#> )

To decode messages with the flat structure from your original example, decode them into list[Point] directly; there's no need to define a top-level object.

from msgspec import Struct
from msgspec.json import decode

class Point(Struct, array_like=True):
    lng: float
    lat: float

msg = b"""
[
    [10,20],
    [30,40],
    [50,60]
]
"""

result = decode(msg, type=list[Point])
print(result)
#> [
#>   Point(lng=10.0, lat=20.0),
#>   Point(lng=30.0, lat=40.0),
#>   Point(lng=50.0, lat=60.0)
#> ]

ahopkins commented 1 year ago

Ahh, I see. Thanks so much for the quick response. Sorry about that. Thanks for the awesome project. It's been a lot of fun to play with, and I look forward to using it more.

jcrist commented 1 year ago

No worries, and thanks for the kind words! Please let me know if you run into any further questions or issues.

ahopkins commented 1 year ago

@jcrist :laughing: Just realized you posted on this PR: https://github.com/sanic-org/sanic-ext/pull/197

Thanks. I feel like the ability to handle not only the modeling but also the deserialization calls for a deeper integration, but I'm still working out in my head how to accomplish that.