zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.45k stars 274 forks source link

Fix indexing bug when reading past the first chunk in a shard #2022

Open darsnack opened 2 months ago

darsnack commented 2 months ago

Closes #2018 by passing the correct format for the byte_range argument to a StorePath byte getter when accessing chunks within a shard. Previously, the return value of get_chunk_slice was directly passed in as the byte range. This is incorrect since it is formatted as (start index, end index) when it should be (start index, total read length).

TODO:

normanrz commented 2 months ago

Thanks! I don't quite understand why this hasn't shown up in the tests. Would you mind adding a test for this bug to https://github.com/zarr-developers/zarr-python/blob/v3/tests/v3/test_codecs/test_sharding.py?

darsnack commented 2 months ago

Taking a quick glance at the test suite indicates that we never attempt to read an intermediate slice outside of the first chunk in a shard. Any access beyond the first chunk is always a slice without an upper bound.

I could either modify the existing tests to include accesses the fit the pattern in the bug, or I can add a separate test case. What would you prefer?

normanrz commented 2 months ago

Thanks! I think adding parameterization for multiple access patterns is the way to go. That is what your commit attempts, right?

darsnack commented 2 months ago

Yeah that's what the commit does.

The test failure of a segfault is puzzling, but I believe it is genuine. I'm using my PR branch in another codebase, and I'm also seeing a segfault there occurring around the same point. (ignore this data point) Any idea what it could be?