lebedov / msgpack-numpy

Serialize numpy arrays using msgpack
Other
197 stars 33 forks source link
msgpack numpy python

Numpy Data Type Serialization Using Msgpack

Package Description

This package provides encoding and decoding routines that enable the serialization and deserialization of numerical and array data types provided by numpy using the highly efficient msgpack format. Serialization of Python's native complex data types is also supported.

Latest Version Build Status

Installation

msgpack-numpy requires msgpack-python and numpy. If you have pip installed on your system, run

pip install msgpack-numpy

to install the package and all dependencies. You can also download the source tarball, unpack it, and run

python setup.py install

from within the source directory.

Usage

The easiest way to use msgpack-numpy is to call its monkey patching function after importing the Python msgpack package:

import msgpack
import msgpack_numpy as m
m.patch()

This will automatically force all msgpack serialization and deserialization routines (and other packages that use them) to become numpy-aware. Of course, one can also manually pass the encoder and decoder provided by msgpack-numpy to the msgpack routines:

import msgpack
import msgpack_numpy as m
import numpy as np

x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)

msgpack-numpy will try to use the binary (fast) extension in msgpack by default.
If msgpack was not compiled with Cython (or if the MSGPACK_PUREPYTHON variable is set), it will fall back to using the slower pure Python msgpack implementation.

Notes

The primary design goal of msgpack-numpy is ensuring preservation of numerical data types during msgpack serialization and deserialization. Inclusion of type information in the serialized data necessarily incurs some storage overhead; if preservation of type information is not needed, one may be able to avoid some of this overhead by writing a custom encoder/decoder pair that produces more efficient serializations for those specific use cases.

Numpy arrays with a dtype of 'O' are serialized/deserialized using pickle as a fallback solution to enable msgpack-numpy to handle such arrays. As the additional overhead of pickle serialization negates one of the reasons to use msgpack, it may be advisable to either write a custom encoder/decoder to handle the specific use case efficiently or else not bother using msgpack-numpy.

Note that numpy arrays deserialized by msgpack-numpy are read-only and must be copied if they are to be modified.

Development

The latest source code can be obtained from GitHub.

msgpack-numpy maintains compatibility with python versions 2.7 and 3.5+.

Install tox to support testing across multiple python versions in your development environment. If you use conda to install python use tox-conda to automatically manage testing across all supported python versions.

# Using a system python
pip install tox

# Additionally, using a conda-provided python
pip install tox tox-conda

Execute tests across supported python versions:

tox

Authors

See the included AUTHORS.md file for more information.

License

This software is licensed under the BSD License. See the included LICENSE.md file for more information.

See Also

msgpack-numpy-js by Egil Möller is a JavaScript library that can serialize and deserialize JS typed arrays with msgpack.

msgpack-numpy-rs by Terrence Liu is a Rust crate (mostly) compatible with msgpack-numpy that can serialize and deserialize numpy arrays with msgpack.