saalfeldlab / n5-zarr

Zarr filesystem backend for N5
BSD 2-Clause "Simplified" License

Set compressor to null in .zarray if RawCompressor is used #4

Closed kkoz closed 4 years ago

kkoz commented 4 years ago

If a Zarr dataset is written with N5ZarrWriter using RawCompression, the resulting .zarray file contains "compressor":{"type":"raw"}. When Python Zarr attempts to open this dataset, it fails with ValueError: codec not available: None. According to the Zarr v2 specification, compressor should be null when no compression is used (https://zarr.readthedocs.io/en/stable/spec/v2.html#metadata). This PR changes how the .zarray file is written so that compressor is set to null when RawCompression is used.

The following Java code creates a very simple Zarr dataset whose compressor is set to {"type":"raw"}:

public void testWriteRawCompression() throws IOException {
    // Location of the Zarr container and the dataset inside it
    String zarrPath = System.getProperty("user.home") + "/tmp/raw-zarr-test.zarr";
    String datasetName = "/test/data";
    final short[] dataBlockData = new short[]{1, 2, 3, 4, 5, 6};

    // Create a 1x2x3 uint16 dataset with a single block and no compression
    N5Writer n5 = new N5ZarrWriter(zarrPath);
    n5.createDataset(datasetName, new long[]{1, 2, 3}, new int[]{1, 2, 3}, DataType.UINT16, new RawCompression());
    final DatasetAttributes attributes = n5.getDatasetAttributes(datasetName);

    // Write the six values as the only block, at grid position (0, 0, 0)
    final ShortArrayDataBlock dataBlock = new ShortArrayDataBlock(new int[]{1, 2, 3}, new long[]{0, 0, 0}, dataBlockData);
    n5.writeBlock(datasetName, attributes, dataBlock);
}
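For datasets that were already written with the old metadata, the .zarray file could in principle be patched by hand. The following is only a minimal sketch and not part of this PR: the helper name normalize_raw_compressor is hypothetical, and it assumes the compressor entry is the only thing Python Zarr objects to in the file.

```python
import json
from pathlib import Path


def normalize_raw_compressor(zarray_path):
    """Rewrite "compressor": {"type": "raw"} as null in one .zarray file.

    Hypothetical helper for illustration. Returns True if the file was changed.
    """
    path = Path(zarray_path)
    meta = json.loads(path.read_text())
    if meta.get("compressor") == {"type": "raw"}:
        meta["compressor"] = None  # json.dumps serializes None as null
        path.write_text(json.dumps(meta, indent=4))
        return True
    return False
```

Under the same assumptions, it could be applied to every array in a container by iterating over Path(root).rglob(".zarray") and calling the helper on each match.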

The error can be induced by opening Python and running:

>>> import zarr
>>> z = zarr.open('/home/user/tmp/raw-zarr-test.zarr', mode='r')
>>> z['test']['data'][:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/code/gs-zarr-n5/python-utils/gs-zarr-venv/lib/python3.6/site-packages/zarr/hierarchy.py", line 340, in __getitem__
    synchronizer=self._synchronizer, cache_attrs=self.attrs.cache)
  File "/home/kevin/code/gs-zarr-n5/python-utils/gs-zarr-venv/lib/python3.6/site-packages/zarr/core.py", line 124, in __init__
    self._load_metadata()
  File "/home/kevin/code/gs-zarr-n5/python-utils/gs-zarr-venv/lib/python3.6/site-packages/zarr/core.py", line 141, in _load_metadata
    self._load_metadata_nosync()
  File "/home/kevin/code/gs-zarr-n5/python-utils/gs-zarr-venv/lib/python3.6/site-packages/zarr/core.py", line 169, in _load_metadata_nosync
    self._compressor = get_codec(config)
  File "/home/kevin/code/gs-zarr-n5/python-utils/gs-zarr-venv/lib/python3.6/site-packages/numcodecs/registry.py", line 36, in get_codec
    raise ValueError('codec not available: %r' % codec_id)
ValueError: codec not available: None
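The None in the message suggests that the codec lookup found no codec identifier in the compressor entry: Zarr v2 compressor configurations name their codec under an "id" key, while the entry written here only has "type". The sketch below is a simplified stand-in for such a lookup, not the numcodecs source; lookup_codec and the registry contents are hypothetical and only mirror the error message seen in the traceback.

```python
# Simplified stand-in for a codec registry lookup; NOT the numcodecs source.
# The registry contents are placeholders for illustration only.
codec_registry = {"zlib": object, "blosc": object}


def lookup_codec(config):
    """Resolve a codec from a compressor config that is keyed by "id"."""
    config = dict(config)  # copy so the caller's dict is left untouched
    codec_id = config.pop("id", None)  # {"type": "raw"} has no "id" key
    if codec_id not in codec_registry:
        raise ValueError("codec not available: %r" % codec_id)
    return codec_registry[codec_id]


try:
    lookup_codec({"type": "raw"})
except ValueError as err:
    message = str(err)
    print(message)  # prints: codec not available: None
```

With "compressor" set to null, a reader has no codec to look up at all and can treat the chunks as uncompressed.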

Running the same Java code with the PR yields:

>>> import zarr
>>> z = zarr.open('/home/user/tmp/raw-zarr-test.zarr', mode='r')
>>> z['test']['data'][:]
array([[[1],
        [2]],

       [[3],
        [4]],

       [[5],
        [6]]], dtype=uint16)

chris-allan commented 4 years ago

@axtimwalde / @igorpisarev: Is there anything more you'd like to see here?

/cc @joshmoore

axtimwalde commented 4 years ago

Nope, looks good, I just hadn't seen it, so thanks for the reminder!

axtimwalde commented 4 years ago

And released in 0.0.5, because I assume you want this ASAP?

chris-allan commented 4 years ago

> And released in 0.0.5, because I assume you want this ASAP?

Not in any rush but thanks for doing that!