zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Unable to save object arrays #691

Open imilas opened 3 years ago

imilas commented 3 years ago

Issue

I am unable to save variable-length arrays to disk. I also cannot save JSON or Pickle object arrays to disk. Simple examples are given below.

Sorry if I missed something obvious in the documentation. I cannot find an example of object arrays being saved to disk, so this could be entirely a syntax issue. Thanks for your time.

Example 1:

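import numcodecs
import numpy as np
import zarr

# In-memory object array holding variable-length integer arrays
z = zarr.empty(4, dtype=object, object_codec=numcodecs.VLenArray(int))
print(z)
print(z.filters)
z[0] = np.array([1, 3, 5])
z[1] = np.array([4])
z[2] = np.array([7, 9, 14])
zarr.save("test1.zarr", z)  # raises ValueError: missing object_codec for object array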

Example 2:

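import numcodecs
import zarr

# In-memory object array holding arbitrary JSON-serializable values
z = zarr.empty(5, dtype=object, object_codec=numcodecs.JSON())
print(z)
print(z.filters)
z[0] = 42
z[1] = 'foo'
z[2] = ['bar', 'baz', 'qux']
z[3] = {'a': 1, 'b': 2.2}
zarr.save("test2.zarr", z)  # raises ValueError: missing object_codec for object array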

Problem description

Saving object arrays (as defined in the tutorial) results in the "ValueError: missing object_codec for object array" error.

The error occurs in the following line:

~/miniconda3/lib/python3.8/site-packages/zarr/storage.py in _init_array_metadata(store, shape, chunks, dtype, compressor, fill_value, order, overwrite, path, chunk_store, filters, object_codec)
    386             if not filters:
    387                 # there are no filters so we can be sure there is no object codec
--> 388                 raise ValueError('missing object_codec for object array')
    389             else:
    390                 # one of the filters may be an object codec, issue a warning rather

As far as I can tell, the filters and codecs are defined in both cases. Is it possible to save the arrays defined above to disk?

Version and installation information

zarr.version: 2.4.0
numcodecs.version: 0.7.2
Python version: 3.8.5
OS: Linux
Both pip and conda installations were tested.

rocherroche commented 2 years ago

Is there any solution to this?

joshmoore commented 2 years ago

Hi @rocherroche. There was recently a fix (#813 in 2.9.4). What version are you using, and are you seeing the identical error?

TomasPuverle commented 2 years ago

I can reproduce the above with 2.12.0, and also with the Pickle codec. As noted in another issue, things work if the array is created as part of the specified store. If it is not, and the array is instead assigned or copied into the destination using zarr.copy, the above error occurs (or a warning, in the case of zarr.copy).

Are there any known/recommended workarounds?

Thank you!

TomasPuverle commented 2 years ago

Hi, I wanted to check back in to see if anyone has any suggestions/workarounds for this problem. I am seeing this happening with other encoder classes, too. Thank you.

joshmoore commented 2 years ago

> As noted in another issue, things work if the array is created as part of the specified store

Do you mean https://github.com/zarr-developers/zarr-python/issues/1090#issuecomment-1190314533? If so, I guess that makes sense. Is that usage not currently possible for you? Can you share your code?