zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
http://zarr.readthedocs.io/
MIT License
1.41k stars 267 forks source link

0 dim arrays: indexing #1980

Closed d-v-b closed 3 weeks ago

d-v-b commented 3 weeks ago

Support (basic) indexing of 0 dimensional arrays. I parameterized the old v2 test for this w.r.t. the out keyword and also the dtype of the array, marking the structured dtype case as an xfail. Builds on #1979.

@normanrz git blame tells me that you marked the 0d indexing tests as xfail, with the reasoning being that v3 does not support 0d arrays. I couldn't find anything in the v3 spec about 0d arrays (and I actually couldn't find anything in the v2 spec about it either!). Did I miss something in the spec? If not, we might want to make this explicit one way or another.

cc @TomWhite if you have the time to test this I would appreciate it. It's possible that there are other broken things that I haven't found via our test suite.

TODO:

tomwhite commented 3 weeks ago

Thanks for working on this @d-v-b!

I tried running the example from #1976, and it gets further, but doesn't seem to write the value:

>>> import zarr
>>> z = zarr.open(store='example-v3.zarr', mode='w', shape=())
>>> z[...] = 3
>>> z[...]
array(1.5e-323)

Did I miss something in the spec? If not, we might want to make this explicit one way or another.

I agree that the spec should say something about this case. 0-d arrays are still arrays, and should be supported in Zarr v3 IMO.

normanrz commented 3 weeks ago

The spec says that the shape needs to be an array of integers. I think this includes empty arrays (ie. 0d arrays). I think it is fine to add support. I would have expected more things to break.

How does chunk key encoding work for 0d arrays?

d-v-b commented 3 weeks ago

How does chunk key encoding work for 0d arrays?

@normanrz I was wondering the same thing. It turns out that I did miss something in the v3 spec: in the section describing the default chunk key encoding, it states:

Arrays may have 0 dimensions (when for example representing scalars), in which case the coordinate of a chunk is the empty tuple, and the chunk key will consist of the string c.

I wonder if this is a typo, and the intention was to use the v2 behavior with a single chunk at c/0 or c.0, depending on the separator metadata? Seems that there's already an open issue about this, which I had completely forgotten about.

normanrz commented 3 weeks ago

Arrays may have 0 dimensions (when for example representing scalars), in which case the coordinate of a chunk is the empty tuple, and the chunk key will consist of the string c.

I wonder if this is a typo, and the intention was to use the v2 behavior with a single chunk at c/0 or c.0, depending on the separator metadata? Seems that there's already an open issue about this, which I had completely forgotten about.

Reading it now, I think the spec pretty clear. For default, the chunk key will be c. v2 is more tricky intuitively. But the spec explicitly sets it to 0 for all 0d arrays. zarrita also implements it like that: https://github.com/scalableminds/zarrita/blob/async/zarrita/metadata.py#L138-L139

d-v-b commented 3 weeks ago

@tomwhite I'm going to merge this and open a subsequent PR that fixes the problem you observed.

d-v-b commented 3 weeks ago

I ended up needing to implement __array__ on the Array class, and I fixed a bug in Array._set_basic_selection wherein scalars were not being cast to the correct dtype.