Open darsnack opened 2 months ago
This error persists on the v3 branch as well.
Thanks for the bug report, @darsnack. I'm planning on making some improvements to the sharding test suite this week, so I will hopefully get some time to replicate (and maybe fix) this bug.
After some debugging, I think I've narrowed down the cause of the issue. Here's a summary:
1. While indexing into the array, shard_index.get_chunk_slice is called here, which returns the start and end index (in bytes) of the inner chunk.
2. The read then goes through _get, since we are dealing with a local store on disk for each shard.
3. That read uses byte_range (i.e. the chunk slice from Step 1) as a start index and total length to read. This is the error, since we specified a start and end index instead.

From this, we get the behavior described in the bug report.
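To make the mismatch concrete, here is a minimal sketch in plain Python that does not use Zarr at all. The 1000-byte chunk size, the five-chunk buffer, and both function bodies are assumptions made up for illustration; only the name get_chunk_slice comes from the discussion above, and its body here is a stand-in rather than the library's code.

```python
# Hypothetical layout: 5 inner chunks of 1000 bytes each, stored back to back.
CHUNK_NBYTES = 1000
shard = bytes(5 * CHUNK_NBYTES)


def get_chunk_slice(i: int) -> tuple[int, int]:
    """Stand-in for the index lookup: returns (start, end) in bytes."""
    return i * CHUNK_NBYTES, (i + 1) * CHUNK_NBYTES


def read_as_start_length(buf: bytes, byte_range: tuple[int, int]) -> bytes:
    """Stand-in for the store read: interprets byte_range as (start, length)."""
    start, length = byte_range
    return buf[start : start + length]


# The end offset is misread as a length, so each read grows with the index.
sizes = [len(read_as_start_length(shard, get_chunk_slice(i))) for i in range(3)]
print(sizes)  # [1000, 2000, 3000]
```

Chunk 0 comes back with the right size because its end offset happens to equal its length, which would explain why only reads past the first chunk fail.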
Currently, I could try modifying get_chunk_slice from Step 1 or do the even more minimal change of computing the total length from the output of get_chunk_slice. I am not sure what the downstream effects of the former will be, since I am not familiar with this codebase. I'm happy to put up a PR with the bug fix, but I'll need some guidance on what fix the maintainers prefer.
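As a rough sketch of the more minimal option, the conversion from (start, end) to (start, length) could happen at the call site, leaving get_chunk_slice untouched. The helper names to_start_length and read_range below are hypothetical and not part of the Zarr codebase; the byte buffer stands in for a shard file on disk.

```python
def to_start_length(chunk_slice: tuple[int, int]) -> tuple[int, int]:
    """Convert a (start, end) byte slice into the (start, length) form."""
    start, end = chunk_slice
    return start, end - start


def read_range(buf: bytes, byte_range: tuple[int, int]) -> bytes:
    """Hypothetical reader that expects byte_range as (start, length)."""
    start, length = byte_range
    return buf[start : start + length]


shard = bytes(5000)          # stand-in for a shard file with 1000-byte chunks
chunk_slice = (2000, 3000)   # (start, end) of the third inner chunk
data = read_range(shard, to_start_length(chunk_slice))
print(len(data))  # 1000
```

The alternative, changing what get_chunk_slice returns, would need every existing caller that expects (start, end) to be checked, which is the downstream risk mentioned above.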
Zarr version
v3.0.0a0
Numcodecs version
v0.12.1
Python Version
3.12
Operating System
Linux
Installation
Using Poetry
Description
I have an array stored using Zarr v3 in a sharded format where the inner chunk size is 1. Reading past the first chunk results in an error, shown below in the MWE. If the chunk size is > 1 (e.g. k), then no errors occur for indices 0 through k - 1, but the same error occurs when accessing index k onwards.
Steps to reproduce
First, create a sharded store:
Now, attempt to open the store and read a single chunk at a time:
If we try to access arr[2], then the error reports an attempt to reshape an array of size 3000. It seems that doing arr[i] reads chunks from 0 through i (inclusive) instead of a single chunk.
Additional output
No response