hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

[Bug]: Zarr 2.18.0 with Blosc #192

Closed CodyCBakerPhD closed 4 months ago

CodyCBakerPhD commented 5 months ago

What happened?

Just encountered a test failure in NeuroConv due to the latest Zarr release (2.18.0) from May 7.

Full log: https://github.com/catalystneuro/neuroconv/actions/runs/9005172770/job/24739878400

Including the test case below, which should have most of what you need to reproduce.

Wanted to check whether this has anything to do with how the file is being read on the hdmf-zarr side, or otherwise just to make y'all aware of the issue.
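A quick way to confirm which versions the failing environment actually resolved to, since the break coincides with the zarr 2.18.0 release (a sketch; distribution names may need adjusting depending on how the packages were installed):

    # Illustrative environment check: print resolved versions of the packages involved
    from importlib.metadata import version

    for package in ("zarr", "numcodecs", "hdmf", "hdmf-zarr"):
        print(package, version(package))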

Steps to Reproduce

tmpdir = local('/tmp/pytest-of-runner/pytest-0/popen-gw1/test_simple_time_series_zarr_g0')
integer_array = array([[   606,  22977,  27598, ...,  21453,  14831,  29962],
       [-26530,  -9155,  -6666, ...,  18490,  -6943,   1...5954, -21319, ...,  -8983, -30074, -24446],
       [-30841, -12815,  28599, ...,  24069, -15762,  -3284]], dtype=int16)
case_name = 'generic'
iterator = <class 'neuroconv.tools.hdmf.SliceableDataChunkIterator'>
iterator_options = {}, backend = 'zarr'

    @pytest.mark.parametrize(
        "case_name,iterator,iterator_options",
        [
            ("unwrapped", lambda x: x, dict()),
            ("generic", SliceableDataChunkIterator, dict()),
            ("classic", DataChunkIterator, dict(iter_axis=1, buffer_size=30_000 * 5)),
            # Need to hardcode buffer size in classic case or else it takes forever...
        ],
    )
    @pytest.mark.parametrize("backend", ["hdf5", "zarr"])
    def test_simple_time_series(
        tmpdir: Path,
        integer_array: np.ndarray,
        case_name: str,
        iterator: callable,
        iterator_options: dict,
        backend: Literal["hdf5", "zarr"],
    ):
        data = iterator(integer_array, **iterator_options)

        nwbfile = mock_NWBFile()
        time_series = mock_TimeSeries(name="TestTimeSeries", data=data)
        nwbfile.add_acquisition(time_series)

        backend_configuration = get_default_backend_configuration(nwbfile=nwbfile, backend=backend)
        dataset_configuration = backend_configuration.dataset_configurations["acquisition/TestTimeSeries/data"]
        configure_backend(nwbfile=nwbfile, backend_configuration=backend_configuration)

        nwbfile_path = str(tmpdir / f"test_configure_defaults_{case_name}_time_series.nwb.{backend}")
        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="w") as io:
            io.write(nwbfile)

        with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
            written_nwbfile = io.read()
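The test above depends on NeuroConv fixtures and helpers, but the Blosc decode path can also be exercised with zarr alone. A minimal zarr-only sketch along these lines (path, shape, and compressor settings are illustrative, not taken from the failing test; whether this alone reproduces the error, or whether the hdmf-zarr layer is needed, is part of the question here):

    import numpy as np
    import zarr
    from numcodecs import Blosc

    # Write an int16 array compressed with Blosc to a directory store,
    # then read the first row back, mirroring the read that fails below.
    data = np.random.randint(-32000, 32000, size=(10_000, 64), dtype="int16")
    z = zarr.open(
        "blosc_repro.zarr",
        mode="w",
        shape=data.shape,
        chunks=(1_000, 64),
        dtype="int16",
        compressor=Blosc(cname="zstd", clevel=5, shuffle=Blosc.SHUFFLE),
    )
    z[:] = data

    z_read = zarr.open("blosc_repro.zarr", mode="r")
    print(z_read[0])  # Blosc decompression happens here on read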

Traceback

with BACKEND_NWB_IO[backend](path=nwbfile_path, mode="r") as io:
    > written_nwbfile = io.read()

tests/test_minimal/test_tools/test_backend_and_dataset_configuration/test_helpers/test_configure_backend_defaults.py:74: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/backends/io.py:56: in read
    f_builder = self.read_builder()
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf/utils.py:668: in func_call
    return func(args[0], **pargs)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1323: in read_builder
    f_builder = self.__read_group(self.__file, ROOT_NAME)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1388: in __read_group
    sub_builder = self.__read_group(sub_group, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1393: in __read_group
    sub_builder = self.__read_dataset(sub_array, sub_name)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/hdmf_zarr/backend.py:1454: in __read_dataset
    data = zarr_obj[0]
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:800: in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:926: in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:968: in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:1343: in _get_selection
    self._chunk_getitems(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2181: in _chunk_getitems
    self._process_chunk(
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/zarr/core.py:2049: in _process_chunk
    self._compressor.decode(cdata, dest)
numcodecs/blosc.pyx:564: in numcodecs.blosc.Blosc.decode
    ???
numcodecs/blosc.pyx:365: in numcodecs.blosc.decompress
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: buffer source array is read-only

Operating System

Windows

Python Executable

Conda

Python Version

3.8

Package Versions

No response

Code of Conduct

oruebel commented 5 months ago

From the traceback, it looks like this fails when it tries to read the first element of the dataset (data = zarr_obj[0]), and the error occurs in numcodecs rather than Zarr. What confuses me is the error ValueError: buffer source array is read-only, which seems to indicate that Blosc wants write access even when reading from file. I'm wondering whether this may be an issue in Zarr or numcodecs rather than in hdmf_zarr.

A couple of things to try:
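For example, one way to narrow this down is to open the store written by the failing test directly with the zarr API, bypassing hdmf-zarr entirely (a sketch; the path is illustrative). If the same ValueError is raised here, the problem sits in zarr/numcodecs rather than in the hdmf_zarr backend:

    import zarr

    # Open the Zarr store written by the failing test directly (illustrative path)
    root = zarr.open_group("test_configure_defaults_generic_time_series.nwb.zarr", mode="r")
    dataset = root["acquisition/TestTimeSeries/data"]
    # Mirrors the read that fails inside __read_dataset (data = zarr_obj[0])
    print(dataset[0])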

oruebel commented 5 months ago

@mavaylon1, can you take this from here?

mavaylon1 commented 5 months ago

@oruebel I can, but I probably won't take a look until the end of next week at the earliest. Does that fit your timeline?

mavaylon1 commented 4 months ago

Possibly related: #195

mavaylon1 commented 4 months ago

This seems to have been resolved by my fix and the release in #195.
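For anyone who hit this, re-reading the same store after upgrading hdmf-zarr to a release that includes the #195 fix should work again. A quick check (a sketch; the path is illustrative):

    from hdmf_zarr.nwb import NWBZarrIO

    # Re-read the store that previously failed; with the #195 fix installed,
    # the Blosc-compressed dataset should decode cleanly on read.
    with NWBZarrIO(path="test_configure_defaults_generic_time_series.nwb.zarr", mode="r") as io:
        nwbfile = io.read()
        print(nwbfile.acquisition["TestTimeSeries"].data[0])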