zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.53k stars 283 forks source link

Clarify semantics for reading from a closed store with `mode='w'`. #2338

Open TomAugspurger opened 1 month ago

TomAugspurger commented 1 month ago

Zarr version

v3

Numcodecs version

na

Python Version

na

Operating System

na

Installation

na

Description

v3's behavior differs when you

  1. create an array inside a group
  2. close the store the group was created in
  3. try to read the array

At least for MemoryStore. The group.__getitem__ eventually tries to do a MemoryStore.get. The store is closed (by the user) and so it attempts to reopen. For MemoryStore, that doesn't really matter but since we have mode="w" the dict is cleared and the lookup failed.

Steps to reproduce

In [1]: import zarr

In [2]: store = zarr.storage.MemoryStore({}, mode="w")

In [3]: group = zarr.group(store=store)

In [4]: group.create(name="a", shape=(5,))
Out[4]: <Array memory://4441054784/a shape=(5,) dtype=float64>

In [5]: store.close()

In [6]: group["a"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[6], line 1
----> 1 group["a"]

File ~/gh/zarr-developers/zarr-python/src/zarr/core/group.py:1327, in Group.__getitem__(self, path)
   1326 def __getitem__(self, path: str) -> Array | Group:
-> 1327     obj = self._sync(self._async_group.getitem(path))
   1328     if isinstance(obj, AsyncArray):
   1329         return Array(obj)

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:185, in SyncMixin._sync(self, coroutine)
    182 def _sync(self, coroutine: Coroutine[Any, Any, T]) -> T:
    183     # TODO: refactor this to to take *args and **kwargs and pass those to the method
    184     # this should allow us to better type the sync wrapper
--> 185     return sync(
    186         coroutine,
    187         timeout=config.get("async.timeout"),
    188     )

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:141, in sync(coro, loop, timeout)
    138 return_result = next(iter(finished)).result()
    140 if isinstance(return_result, BaseException):
--> 141     raise return_result
    142 else:
    143     return return_result

File ~/gh/zarr-developers/zarr-python/src/zarr/core/sync.py:100, in _runner(coro)
     95 """
     96 Await a coroutine and return the result of running it. If awaiting the coroutine raises an
     97 exception, the exception will be returned.
     98 """
     99 try:
--> 100     return await coro
    101 except Exception as ex:
    102     return ex

File ~/gh/zarr-developers/zarr-python/src/zarr/core/group.py:617, in AsyncGroup.getitem(self, key)
    615 zarr_json_bytes = await (store_path / ZARR_JSON).get()
    616 if zarr_json_bytes is None:
--> 617     raise KeyError(key)
    618 else:
    619     zarr_json = json.loads(zarr_json_bytes.to_bytes())

KeyError: 'a'

In zarr-python 2.x, no error is raised:


In [1]: import zarr

In [2]: store = zarr.storage.KVStore({})

In [3]: group = zarr.group(store=store)

In [4]: group.create(name="a", shape=(5,))
Out[4]: <zarr.core.Array '/a' (5,) float64>

In [6]: store.close()

In [7]: group["a"]
Out[7]: <zarr.core.Array '/a' (5,) float64>

Additional output

No response

d-v-b commented 1 month ago

i think this is the same phenomenon I noticed here: https://github.com/zarr-developers/zarr-python/pull/2067

TomAugspurger commented 1 month ago

Yep, looks the same. I added a "Closes" to that PR description pointing here.