ilastik / lazyflow

lazy parallel ondemand zero copy numpy array data flows with caching and dirty propagation

Fix scikit-learn classifier HDF5 serialization #328

Closed by emilmelnikov 5 years ago

emilmelnikov commented 5 years ago

From https://h5py.readthedocs.io/en/stable/strings.html#how-to-store-raw-binary-data:

If you have a non-text blob in a Python byte string (as opposed to ASCII or UTF-8 encoded text, which is fine), you should wrap it in a void type for storage. This will map to the HDF5 OPAQUE datatype, and will prevent your blob from getting mangled by the string machinery.

Here’s an example of how to store binary data in an attribute, and then recover it:

binary_blob = b"Hello\x00Hello\x00"
dset.attrs["attribute_name"] = np.void(binary_blob)
out = dset.attrs["attribute_name"]
binary_blob = out.tobytes()
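
For a pickled classifier, the same wrapping can be applied to a dataset instead of an attribute. The following is only a minimal sketch of that idea, not the actual lazyflow code: `save_pickled`, `load_pickled`, and the dataset name `pickled_classifier` are hypothetical names chosen for illustration, and a plain dict stands in for a fitted scikit-learn classifier so the example needs only `h5py` and `numpy`.

```python
import pickle

import h5py
import numpy as np


def save_pickled(group, name, obj):
    """Pickle obj and store the bytes in an opaque (void) scalar dataset."""
    blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    # Wrapping in np.void maps the data to the HDF5 OPAQUE type, so the
    # string machinery never touches the embedded NUL bytes.
    group.create_dataset(name, data=np.void(blob))


def load_pickled(group, name):
    """Read the opaque dataset back and unpickle its raw bytes."""
    raw = group[name][()]  # a numpy.void scalar
    return pickle.loads(raw.tobytes())


# Hypothetical stand-in for a fitted classifier.
model = {"n_estimators": 100, "weights": np.arange(4.0)}

# In-memory file (core driver, no backing store), so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    save_pickled(f, "pickled_classifier", model)
    restored = load_pickled(f, "pickled_classifier")
```

A dataset is used here rather than an attribute on the assumption that a pickled classifier can be large, while HDF5 attributes are intended for small metadata.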

Closes https://github.com/ilastik/ilastik/issues/1997.