Please let me know if I can make a PR. However, I would need some guidance on what should be changed.
Using a fresh conda environment on Ubuntu 20.04.1 containing Python 3.8.1 (from conda-forge), I tried installing msgpack 0.6.2, numpy 1.18.2, and msgpack-numpy 0.4.6.post0 with pip. I was unable to replicate the problem with this setup; when run directly with Python at the console (i.e., without Jupyter), the following code executed successfully without raising any exception:
```python
import msgpack
import msgpack_numpy as m
import numpy as np

# 1-D round trip:
x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
np.testing.assert_array_equal(x, x_rec)

# 2-D round trip:
x = np.random.rand(5, 4000)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
np.testing.assert_array_equal(x, x_rec)
```
The Python binary provided by conda-forge is built with a different version of gcc than the one in the Docker image you are using, but I'm not sure why that would make a difference.
Incidentally, msgpack-numpy deliberately uses memoryview when possible to avoid the slight slowdown imposed by invocation of tobytes()
(which does make a difference in execution time when serialization/deserialization is performed repeatedly).
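To illustrate (a rough sketch, not a careful benchmark; numbers will vary by machine):

```python
import timeit
import numpy as np

x = np.random.rand(5, 4000)

# x.data wraps the existing buffer in a memoryview (no copy), whereas
# x.tobytes() copies the whole buffer on every call.
print(timeit.timeit(lambda: x.data, number=100000))
print(timeit.timeit(lambda: x.tobytes(), number=100000))
```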
Out of curiosity, can you try using more recent versions of msgpack (1.0.0) and numpy (1.19.1) and see what happens?
Hi. Perfect! Yes, upgrading the libraries to msgpack 1.0.0 and numpy 1.19.1 fixed the issue. It now works in Docker, both when running with plain Python and in Jupyter. I also checked that all unit tests pass after upgrading.
I should have checked the library versions first. No idea why I was running old versions.
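For reference, a quick way to check which versions are active in an environment (msgpack exposes a version tuple):

```python
import msgpack
import numpy

print(msgpack.version)      # e.g. (1, 0, 0) after the upgrade
print(numpy.__version__)    # e.g. '1.19.1'
```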
Thank you for your help, and thank you for this library :+1:
I am taking the liberty of closing this issue.
Great. It is still puzzling why I couldn't reproduce the issue with the exact same versions of the packages that you indicated. There is support for pre-1.0.0 versions of msgpack in the code, but in light of your experience I think it's time to make 1.0.0 a hard requirement.
Hi. There seems to be a problem with handling multidimensional numpy arrays, at least on Linux with Python 3.8.
To reproduce
The below example from the README works. We can serialize and deserialize 1-D numpy arrays.
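For reference, this is essentially the round trip from the README:

```python
import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
np.testing.assert_array_equal(x, x_rec)  # passes
```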
However, the above is a 1-D array. If we try to serialize a 2-D (or higher-dimensional) array, the code below shows that the size of the serialized data does increase. However, when the data is deserialized, the result is somehow only an array of length 5 (?).
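In outline (a sketch of the repro on the affected setup):

```python
import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5, 4000)
x_enc = msgpack.packb(x, default=m.encode)
print(len(x_enc))   # grows with the array size, as expected

x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
print(len(x_rec))   # reportedly 5 on the affected setup (?)

np.testing.assert_array_equal(x, x_rec)
```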
Here the last line fails with an assertion error (the decoded array does not match the original).
If I read the code for msgpack-numpy correctly, the below is essentially the serialize-deserialize logic. This does work, so it suggests that the problem is on msgpack's side and how it handles memoryview objects (?)
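A sketch of that logic with the msgpack layer taken out (buffer handling only, dtype/shape bookkeeping simplified):

```python
import numpy as np

x = np.random.rand(5, 4000)

# For C-contiguous arrays, msgpack-numpy hands msgpack a memoryview of the
# raw buffer (obj.data) instead of a bytes copy (obj.tobytes()).
buf = x.data

# Rebuilding the array from that buffer works fine on its own:
x_rec = np.frombuffer(buf, dtype=x.dtype).reshape(x.shape)
np.testing.assert_array_equal(x, x_rec)  # passes
```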
Possible workarounds
The below shows that serialization/deserialization works when the memory layout is not C-contiguous (and then obj.tobytes() is used instead of obj.data in the msgpack-numpy source).
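For example (a sketch, using np.asfortranarray to get a non-C-contiguous copy):

```python
import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5, 4000)
x_f = np.asfortranarray(x)   # same values, but no longer C-contiguous

x_enc = msgpack.packb(x_f, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
np.testing.assert_array_equal(x_f, x_rec)  # passes even on the affected setup
```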
This suggests the following workaround:
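One way to do this is to monkey-patch the helper so that tobytes() is always used; a sketch, assuming encode() resolves ndarray_to_bytes as a module-level global at call time:

```python
import msgpack_numpy as m

# Force serialization via a bytes copy instead of a memoryview.
# (Hypothetical patch; assumes encode() looks up ndarray_to_bytes in the
# module namespace on each call.)
m.ndarray_to_bytes = lambda obj: obj.tobytes()
```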
Then serialization of multidimensional arrays works with the same code as in the README.
Unit tests
When running the unit tests (in the above Docker environment), I get three failing tests, and the first two seem to be related to the above issue.
Both test_numpy_array_float_2d and test_numpy_array_float_2d_macos pass if ndarray_to_bytes is replaced with either of the above alternatives.
This has become a rather long issue. Please let me know if I can provide more details or if something is unclear.