zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.53k stars 286 forks

Way to ask if chunk exists? #2507

Open TomNicholas opened 3 days ago

TomNicholas commented 3 days ago

Is there a way to ask zarr if a key is backed by a chunk (as opposed to defaulting to the fill_value)?

The motivation is trying to create virtual references for an existing zarr store, but not knowing which chunks of the chunk grid actually exist - see https://github.com/zarr-developers/VirtualiZarr/pull/271#discussion_r1844486393

cc @norlandrhagen

TomAugspurger commented 2 days ago

Is the key here a positional indexer into the array, or a chunk indexer, or something else? It's possible that a combination of Array.metadata.encode_chunk_key and store.exists will do what you want:

In [20]: arr = zarr.create(path="a", shape=(3, 4, 5), chunks=(2, 2, 2))

In [21]: arr.metadata.encode_chunk_key((0, 0, 0))
Out[21]: 'c/0/0/0'

In [22]: arr.metadata.encode_chunk_key((1, 2, 3))
Out[22]: 'c/1/2/3'
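
To illustrate the idea without depending on a particular zarr-python version, here is a minimal pure-Python sketch. It assumes the key layout shown in the output above (`c` followed by slash-separated chunk indices) and uses a plain `dict` as a stand-in for a store. The helpers `encode_chunk_key` and `chunk_exists` below are hypothetical and are not the zarr functions themselves; with a real store, the membership test would be replaced by a call to `store.exists`.

```python
def encode_chunk_key(chunk_coords):
    """Build a key such as 'c/0/0/0' from chunk-grid coordinates.

    Assumption: keys use a 'c' prefix and '/' separator, as in the
    output shown above. Other key encodings would need a different helper.
    """
    return "/".join(["c", *map(str, chunk_coords)])


def chunk_exists(store, chunk_coords):
    """Return True if the chunk at chunk_coords has a key in the store.

    `store` is any mapping from key to bytes; a dict is used here only
    as a stand-in for a real store object.
    """
    return encode_chunk_key(chunk_coords) in store


# A fake store in which only one chunk was ever written.
fake_store = {"c/0/0/0": b"\x00" * 16}

print(chunk_exists(fake_store, (0, 0, 0)))
print(chunk_exists(fake_store, (1, 0, 0)))
```

A chunk whose key is missing is the case where reads fall back to the fill_value.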

TomNicholas commented 2 days ago

It's a chunk indexer. We have a store and want to calculate the byte offset and length of every chunk in it. Assuming no sharding, the offset is always 0, and we get the length of each chunk using the new .getsize (because with compression the chunks can have different lengths). It would also be nice to know which chunks don't actually exist in the store, so that we don't bother trying to get their sizes or writing them into the generated chunk manifest.

It does sound like store.exists is what we need though, thanks!
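
The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: a plain `dict` stands in for the store, the `c/...` key layout is taken from the earlier output, and the manifest format (a key mapped to an `offset` and `length`) as well as the helper names `chunk_grid_shape` and `build_manifest` are hypothetical. With a real store, the membership test and `len(...)` would be replaced by `store.exists` and `.getsize`.

```python
import itertools
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def build_manifest(store, shape, chunks):
    """Map each existing chunk key to its byte offset and length.

    Assumptions: no sharding (so the offset is always 0) and keys of the
    form 'c/i/j/k'. Chunks with no key in the store are skipped entirely.
    """
    manifest = {}
    grid = chunk_grid_shape(shape, chunks)
    for coords in itertools.product(*(range(n) for n in grid)):
        key = "/".join(["c", *map(str, coords)])
        if key not in store:
            # Missing chunk: reads would return the fill_value, so omit it.
            continue
        manifest[key] = {"offset": 0, "length": len(store[key])}
    return manifest


# Fake store: two chunks written, with different compressed sizes.
fake_store = {"c/0/0/0": b"x" * 10, "c/1/1/2": b"x" * 7}
manifest = build_manifest(fake_store, shape=(3, 4, 5), chunks=(2, 2, 2))
print(manifest)
```

Only the two written chunks appear in the result; the other ten positions of the 2 x 2 x 3 chunk grid are left out.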

d-v-b commented 1 day ago

See also Array._iter_chunk_keys.