msgpack / msgpack-python

MessagePack serializer implementation for Python msgpack.org[Python]
https://msgpack.org/
Other
1.92k stars 230 forks source link

Multi-dimensional memoryview breaks packing #526

Closed pohlt closed 1 year ago

pohlt commented 1 year ago

NumPy 1.24 seems to use multi-dimensional memoryviews when getting array data with array.data. msgpack does not play nicely with such multi-dimensional views.

Example

Here's a crashing example without NumPy, but creating a multi-dimensional memoryview by hand:

import msgpack

def mutli_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

Output

view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
    [...]
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Mitigations

Of course, it would be great if msgpack could do that internally when packing a memoryview.

Thanks for this great package! ❤️

jfolz commented 1 year ago

The packer actually requests a buffer object with the PyBUF_SIMPLE flag, so there should be no shape, strides, or suboffsets. See here. The API also says that "Without stride information, the buffer must be C-contiguous." As long as I'm not misunderstanding something here Numpy should either return a 1D C-contiguous buffer (possibly creating a copy of non-contiguous arrays first), or return an error if it can't do that. So maybe Numpy is at fault here?

pohlt commented 1 year ago

@jfolz, I don't fully understand your comment.

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

But even without NumPy: msgpack packs multi-dimensional memoryviews incorrectly as shown in the example, wouldn't you agree?

methane commented 1 year ago

I can not reproduce.

$ python -VV
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat x.py
import msgpack

def multi_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python x.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
methane commented 1 year ago

I tested Python 3.10 and 3.11 but output is same.

jfolz commented 1 year ago

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

You're right, I misunderstood what you were trying to do. Numpy is not to blame, as it's job is done after you get the memoryview object out of it. My point still stands though, if the memoryview cannot produce a nice and simple contiguous buffer object it should return an error and packing should in turn fail.

pohlt commented 1 year ago

@methane , that's weird. I reproduced your steps above in a fresh venv with only msgpack installed:

$ uname -a
Linux big 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python -VV
Python 3.11.1 (main, Dec  7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat msgpack_test.py 
import msgpack

def multi_dim_memoryview():
    view = memoryview(b"\00" * 6)
    for data in [
      view,  # fine
      view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python msgpack_test.py 
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/home/xxx/msgpack_test.py", line 15, in <module>
    multi_dim_memoryview()
  File "/home/xxx/msgpack_test.py", line 11, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

As you can see, the bin array has a correct size of 6 in the first case and an incorrect size 3 (the size of the first dimension) in the second case.

pohlt commented 1 year ago

@jfolz The memoryview is contiguous, but it's multi-dimensional. The question is what msgpack should do with multi-dimensional memoryviews:

Could be another configuration parameter.

methane commented 1 year ago

Now I got it. Fallback module packs it wrong.

(venv) [root@978f0846d90b ~]# MSGPACK_PUREPYTHON=1 python mptest.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/root/mptest.py", line 15, in <module>
    multi_dim_memoryview()
  File "/root/mptest.py", line 12, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.
methane commented 1 year ago

I released 1.0.5rc1. Please try it.

pohlt commented 1 year ago

Issue is fixed with 1.0.5rc1. Thanks a lot!

pohlt commented 1 year ago

But since you now provide wheels for 3.11, I'm no longer using the fallback code, I guess.