hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

[Bug]: Zarr 2.18.0 with Blosc #192

Closed CodyCBakerPhD closed 4 months ago

CodyCBakerPhD commented 5 months ago

What happened?

Just encountered a test failure in NeuroConv due to the latest Zarr release (2.18.0) from May 7.

Full log: https://github.com/catalystneuro/neuroconv/actions/runs/9005172770/job/24739878400

Including the test case below, which should have most of what you need to reproduce.

Wanted to check whether this has anything to do with how the file is being read on the hdmf-zarr side, or otherwise just to make y'all aware of the issue.
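A quick way to confirm which versions the failing environment actually resolved to, since the break coincides with the zarr 2.18.0 release (a sketch; distribution names may need adjusting depending on how the packages were installed):

    # Illustrative environment check: print resolved versions of the packages involved
    from importlib.metadata import version

    for package in ("zarr", "numcodecs", "hdmf", "hdmf-zarr"):
        print(package, version(package))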

Steps to Reproduce

tmpdir = local('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_simple_time_series_zarr_g0')
integer_array = array([[   606,  22977,  27598, ...,  21453,  14831,  29962],
       [-26530,  -9155,  -6666, ...,  18490,  -6943,   1...5954, -21319, ...,  -8983, -30074, -24446],
       [-30841, -12815,  28599, ...,  24069, -15762,  -3284]], dtype=int16)
case_name = 'generic'
iterator = <class 'neuroconv.tools.hdmf.SliceableDataChunkIterator'>
iterator_options = {}, backend = 'zarr'

    @pytest.mark.parametrize(
        "case_name,iterator,iterator_options",
        [
            ("unwrapped", lambda x: x, dict()),
            ("generic", SliceableDataChunkIterator, dict()),
            ("classic", DataChunkIterator, dict(iter_axis=1, buffer_size=30_000 * 5)),
            # Need to hardcode buffer size in classic case or else it takes forever...
        ],
    )
    @pytest.mark.parametrize("backend", ["hdf5", "zarr"])
    def test_simple_time_series(
        tmpdir: Path,
        integer_array: np.ndarray,
        case_name: str,
        iterator: callable,
        iterator_options: dict,
        backend: Literal["hdf5", "zarr"],
    ):
        data = iterator(integer_array, **iterator_options)

        nwbfile = mock_NWBFile()
        time_series = mock_TimeSeries(name="TestTimeSeries", data=data)
        nwbfile.add_acquisition(time_series)

        backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=backend)
        dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
        configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

        nwbfile_path = str(tmpdir / f"test_configure_defaults_{case_name}_time_series.nwb.{backend}")
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="w") as io:
            io.write(nwbfile)

        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
            written_nwbfile = io.read()
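The test above depends on NeuroConv fixtures and helpers, but the Blosc decode path can also be exercised with zarr alone. A minimal zarr-only sketch along these lines (path, shape, and compressor settings are illustrative, not taken from the failing test; whether this alone reproduces the error, or whether the hdmf-zarr layer is needed, is part of the question here):

    import numpy as np
    import zarr
    from numcodecs import Blosc

    # Write an int16 array compressed with Blosc to a directory store,
    # then read the first row back, mirroring the read that fails below.
    data = np.random.randint(-32000, 32000, size=(10_000, 64), dtype="int16")
    z = zarr.open(
        "blosc_repro.zarr",
        mode="w",
        shape=data.shape,
        chunks=(1_000, 64),
        dtype="int16",
        compressor=Blosc(cname="zstd", clevel=5, shuffle=Blosc.SHUFFLE),
    )
    z[:] = data

    z_read = zarr.open("blosc_repro.zarr", mode="r")
    print(z_read[0])  # Blosc decompression happens here on read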

Traceback

with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
    > written_nwbfile = io.read()

tests/test_minimal/test_tools/test_backend_and_dataset_configuration/test_helpers/test_configure_backend_defaults.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/backends/io.py:56: in read
    f_builder = self.read_builder()
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1323: in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1393: in __read_group
    sub_builder = self.__read_dataset(sub_array, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1454: in __read_dataset
    data = zarr_obj[0]
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:800: in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:926: in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:968: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:1343: in _get_selection
    self._chunk_getitems(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2181: in _chunk_getitems
    self._process_chunk(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2049: in _process_chunk
    self._compressor.decode(cdata, dest)
numcodecs/blosc.pyx:564: in numcodecs.blosc.Blosc.decode
    ???
numcodecs/blosc.pyx:365: in numcodecs.blosc.decompress
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: buffer source array is read-only

Operating System

Windows

Python Executable

Conda

Python Version

3.8

Package Versions

No response

Code of Conduct

oruebel commented 5 months ago

From the traceback, it looks like this fails when it tries to read the first element of the dataset (data = zarr_obj[0]), and the error occurs in numcodecs rather than Zarr. What confuses me is the error ValueError: buffer source array is read-only, which seems to indicate that Blosc wants write access even when reading from file. I'm wondering whether this may be an issue in Zarr or numcodecs rather than in hdmf_zarr.

A couple of things to try:
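For example, one way to narrow this down is to open the store written by the failing test directly with the zarr API, bypassing hdmf-zarr entirely (a sketch; the path is illustrative). If the same ValueError is raised here, the problem sits in zarr/numcodecs rather than in the hdmf_zarr backend:

    import zarr

    # Open the Zarr store written by the failing test directly (illustrative path)
    root = zarr.open_group("test_configure_defaults_generic_time_series.nwb.zarr", mode="r")
    dataset = root["acquisition/TestTimeSeries/data"]
    # Mirrors the read that fails inside __read_dataset (data = zarr_obj[0])
    print(dataset[0])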

oruebel commented 5 months ago

@mavaylon1, can you take this from here?

mavaylon1 commented 5 months ago

@oruebel I can, but I probably won't take a look until the end of next week at the earliest. Does that fit your timeline?

mavaylon1 commented 4 months ago

Possibly related: #195

mavaylon1 commented 4 months ago

This seems to have been resolved by my fix and the release in #195.
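For anyone who hit this, re-reading the same store after upgrading hdmf-zarr to a release that includes the #195 fix should work again. A quick check (a sketch; the path is illustrative):

    from hdmf_zarr.nwb import NWBZarrIO

    # Re-read the store that previously failed; with the #195 fix installed,
    # the Blosc-compressed dataset should decode cleanly on read.
    with NWBZarrIO(path="test_configure_defaults_generic_time_series.nwb.zarr", mode="r") as io:
        nwbfile = io.read()
        print(nwbfile.acquisition["TestTimeSeries"].data[0])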