Open d-v-b opened 1 month ago
@tomwhite let me know if this looks workable for you
Thanks @d-v-b this looks great!
I wondered why you deprecated nchunks (and nchunks_initialized) though? The number of chunks in an array is something that should always be well-defined. Also, deprecating something usually means there's a better alternative, but I don't see one here.
my thinking for this is twofold:

Given a chunks_initialized function that gives the names of the initialized chunks, one can easily do len(chunks_initialized(...)), i.e. we don't need a separate function to express the composition of chunks_initialized and len. Similarly, nchunks is merely len(array._iter_chunk_keys). If this logic is unsound, or these deprecation warnings are a problem, then we can remove them, but see the second point:

Does this check out? I'm sorry if the warnings are inconvenient, but I really would like to find a proper expression of v3 semantics on the Array class, and I worry that a blanket policy of forward-propagating v2-isms could be a hindrance to that effort.
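To make the composition argument concrete, here is a minimal standalone sketch, not the code from this PR. It assumes a regular chunk grid, a plain dict standing in for the store, and a "c/0/1"-style key layout. The helper names mirror the ones discussed above, but they are hypothetical free functions written for illustration only.

```python
import itertools
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each axis (ceiling division)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def iter_chunk_keys(shape, chunks):
    """Yield one storage key per chunk, in lexicographic grid order.

    The "c/<i>/<j>" key layout is an assumption made for this sketch.
    """
    grid = chunk_grid_shape(shape, chunks)
    for coords in itertools.product(*(range(n) for n in grid)):
        yield "c/" + "/".join(map(str, coords))


def chunks_initialized(store, shape, chunks):
    """Return the chunk keys that are actually present in `store`."""
    return tuple(k for k in iter_chunk_keys(shape, chunks) if k in store)


shape, chunks = (10, 10), (4, 5)
store = {"c/0/0": b"...", "c/2/1": b"..."}  # dict standing in for a real store

# A generator has no len(), so count its items explicitly.
nchunks = sum(1 for _ in iter_chunk_keys(shape, chunks))
nchunks_initialized = len(chunks_initialized(store, shape, chunks))
```

With these inputs the grid is 3 x 2, so nchunks is 6 and nchunks_initialized is 2. Both counts are one-liners on top of the key iterator, which is the point being made about not needing dedicated functions.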
The number of chunks in an array is something that should always be well-defined.
to expand on this: v3 introduces two kinds of chunks, read chunks and write chunks. the number of read chunks may not equal the number of write chunks. so where we had 1 nchunks quantity in v2, v3 has two possible answers to nchunks. that's why it is not straightforward to commit to this aspect of the array API.
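A small arithmetic sketch of why the two counts can differ. It assumes, purely for illustration, that each write chunk is a larger unit holding several read chunks (as with sharding). The shapes and the count_chunks helper are hypothetical and are not part of the array API.

```python
import math


def count_chunks(shape, chunk_shape):
    """Total number of chunks needed to tile `shape` with `chunk_shape`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))


shape = (100, 100)
write_chunk_shape = (50, 50)  # assumed unit of writing
read_chunk_shape = (10, 10)   # assumed unit of reading

n_write_chunks = count_chunks(shape, write_chunk_shape)
n_read_chunks = count_chunks(shape, read_chunk_shape)
```

Here the same array has 4 write chunks but 100 read chunks, so a single nchunks value would have to pick one of the two.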
This PR adds nchunks, nbytes, and nchunks_initialized functionality from 2.x.

closes #2027
depends on #2064
details
Adds the following to array.py:
- (AsyncArray / Array).nchunks: deprecated, the total number of chunks in the array. exists for 2.xx compatibility.
- (AsyncArray / Array).cdata_shape: deprecated, the shape of the chunk grid. exists for 2.xx compatibility.
- (AsyncArray / Array).nbytes: the total number of bytes that the array can store
- (AsyncArray / Array)._iter_chunk_coords: an iterator over tuples of ints which represent positions in the chunk grid
- (AsyncArray / Array)._iter_chunk_regions: an iterator over slices which represent the contiguous array region spanned by each chunk
- (AsyncArray / Array)._iter_chunk_keys: an iterator over strings which represent the paths in storage for all the chunks
- chunks_initialized(array): a function that takes an array and returns a tuple of the chunk keys for that array that exist in storage. this also has tests.
- nchunks_initialized(array): deprecated, a function that calls len(chunks_initialized(array)). this exists for 2.xx compatibility.

All of the above _iter_chunk_* methods should be considered private and provisional. I added them because their functionality is valuable, but eventually I think we will have a better array API that renders these methods obsolete. If we think these are cluttering the array API, I'd be happy splitting them off into stand-alone functions.

Adds iter_grid to indexing.py. this just provides lexicographic iteration over the elements of a bounded N-dimensional, positive grid (e.g., a grid of chunks).

TODO: