zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License

Add array storage helpers #2065

Open d-v-b opened 1 month ago

d-v-b commented 1 month ago

This PR adds nchunks, nbytes, and nchunks_initialized functionality from 2.x.
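For readers unfamiliar with the 2.x attributes, here is a minimal sketch of what the three quantities mean. It is not the code in this PR: the free functions, the `store` dict keyed by chunk grid coordinates, and the argument names are all hypothetical and only illustrate the definitions.

```python
import itertools
import math


def nbytes(shape, itemsize):
    # Bytes needed to hold the array uncompressed: element count * element size.
    return math.prod(shape) * itemsize


def chunk_grid_shape(shape, chunk_shape):
    # Chunks along each axis, rounding up so partial edge chunks are counted.
    return tuple(-(-s // c) for s, c in zip(shape, chunk_shape))


def nchunks(shape, chunk_shape):
    # Total number of chunks in the chunk grid.
    return math.prod(chunk_grid_shape(shape, chunk_shape))


def nchunks_initialized(store, shape, chunk_shape):
    # Count the grid positions that actually have data in the (hypothetical) store.
    grid = chunk_grid_shape(shape, chunk_shape)
    all_coords = itertools.product(*(range(n) for n in grid))
    return sum(1 for coord in all_coords if coord in store)


# Hypothetical store: only two of the sixteen chunks have been written.
store = {(0, 0): b"...", (1, 2): b"..."}
total_bytes = nbytes((100, 100), itemsize=8)
total_chunks = nchunks((100, 100), (30, 30))
written_chunks = nchunks_initialized(store, (100, 100), (30, 30))
```

Under these assumptions a 100 x 100 array of 8-byte elements with 30 x 30 chunks has 80000 uncompressed bytes and a 4 x 4 chunk grid, of which two positions are initialized.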

Closes #2027. Depends on #2064.

details

Adds the following to array.py:

All of the above _iter_chunk_* methods should be considered private and provisional. I added them because their functionality is valuable, but I think we will eventually have a better array API that renders these methods obsolete. If we think they clutter the array API, I'd be happy to split them off into stand-alone functions.

TODO:

d-v-b commented 1 month ago

@tomwhite let me know if this looks workable for you

tomwhite commented 1 month ago

Thanks @d-v-b this looks great!

I wondered why you deprecated nchunks (and nchunks_initialized) though? The number of chunks in an array is something that should always be well-defined. Also, deprecating something usually means there's a better alternative, but I don't see one here.

d-v-b commented 1 month ago

> I wondered why you deprecated nchunks (and nchunks_initialized) though? The number of chunks in an array is something that should always be well-defined. Also, deprecating something usually means there's a better alternative, but I don't see one here.

My thinking on this is twofold:

Does this check out? I'm sorry if the warnings are inconvenient, but I would really like to find a proper expression of v3 semantics on the Array class, and I worry that a blanket policy of forward-propagating v2-isms could hinder that effort.

d-v-b commented 1 month ago

> The number of chunks in an array is something that should always be well-defined.

To expand on this: v3 introduces two kinds of chunks, read chunks and write chunks, and the number of read chunks may not equal the number of write chunks. So where v2 had a single nchunks quantity, v3 has two possible answers to "how many chunks?". That's why it is not straightforward to commit to this aspect of the array API.
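A small sketch of why the two counts can differ, assuming a sharded layout in which each write chunk (the stored object) is subdivided into smaller read chunks. The shapes and the `count_chunks` helper are made up for illustration and are not part of the Array API.

```python
import math


def count_chunks(shape, chunk_shape):
    # Chunks along each axis (rounded up for partial edge chunks), multiplied together.
    return math.prod(-(-s // c) for s, c in zip(shape, chunk_shape))


# Hypothetical array layout: 50 x 50 write chunks, each split into 10 x 10 read chunks.
array_shape = (100, 100)
write_chunk_shape = (50, 50)
read_chunk_shape = (10, 10)

n_write_chunks = count_chunks(array_shape, write_chunk_shape)
n_read_chunks = count_chunks(array_shape, read_chunk_shape)
```

With these shapes there are 4 write chunks but 100 read chunks, so a single nchunks value would have to pick one of the two. Without sharding the two shapes coincide and the counts agree.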