lebedov / msgpack-numpy

Serialize numpy arrays using msgpack

UnicodeDecodeError when decoding Numpy types #39

Closed Tronic closed 4 years ago

Tronic commented 4 years ago

This module should not override the msgpack default parameters with use_bin_type=0 and raw=True like it does. This causes UnicodeDecodeErrors and will also mix up str and bytes types elsewhere. Manually specifying use_bin_type=True and raw=False avoids the problems:

In [1]: import msgpack_numpy as mp, numpy as np

In [2]: mp.packb(np.array([-0.0]))
Out[2]: b'\x85\xa2nd\xc3\xa4type\xa3<f8\xa4kind\xa0\xa5shape\x91\x01\xa4data\xa8\x00\x00\x00\x00\x00\x00\x00\x80'

In [3]: mp.unpackb(_)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 7: invalid start byte

In [4]: mp.packb(np.array([-0.0]), use_bin_type=True)
Out[4]: b'\x85\xc4\x02nd\xc3\xc4\x04type\xa3<f8\xc4\x04kind\xc4\x00\xc4\x05shape\x91\x01\xc4\x04data\xc4\x08\x00\x00\x00\x00\x00\x00\x00\x80'

In [5]: mp.unpackb(_, raw=False)
Out[5]: array([-0.])

lebedov commented 4 years ago

The msgpack defaults were explicitly changed in msgpack-numpy to provide more seamless behavior for both Python 2 and 3 in light of the differences in how they each handle string types. Given that Python 2 is now EOL, it seems safer to revert to the msgpack defaults.
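
For illustration, here is a minimal standard-library sketch of why the report above hits a UnicodeDecodeError. It does not use msgpack or msgpack-numpy. The little-endian float64 encoding of -0.0 ends in the byte 0x80, which cannot start a UTF-8 sequence. Once the array buffer is stored as a msgpack str rather than bin, any attempt to decode it as text may fail in this way. The helper name `decodes_as_utf8` is hypothetical and exists only for this example.

```python
import struct

# Little-endian float64 encoding of -0.0: only the sign bit is set,
# so the final byte is 0x80 and the other seven bytes are zero.
payload = struct.pack("<d", -0.0)
print(payload)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The buffer for -0.0 is not valid UTF-8, so treating it as text fails.
print(decodes_as_utf8(payload))

# The buffer for +0.0 is eight NUL bytes, which happens to be valid UTF-8,
# so the problem depends on the array contents and can go unnoticed.
print(decodes_as_utf8(struct.pack("<d", 0.0)))
```

Because the failure depends on the data, packing with use_bin_type=True and unpacking with raw=False (as in the report) keeps binary buffers and text strings in separate msgpack types regardless of the values involved.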