pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Issue with xarray and zarr when dimensions start with "meta" #6853

Closed zndr27 closed 11 months ago

zndr27 commented 2 years ago

What happened?

I'm currently working with xarray for MRSI analysis. I have been using xarray datasets with one of the dimensions labeled "metabolite". Previously I was able to save and load the data to zarr with no issues using xr.Dataset.to_zarr and xr.open_zarr.

Now zarr raises an error complaining that this dimension name starts with "meta". I think this may be due to a new version of zarr. I have copied the error below.

When I changed the dimension name (i.e. the zarr subfolder and .zmetadata file) from "metabolite" to something that doesn't start with "meta" then I can load the data properly.
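The manual rename described above can be sketched on the parsed contents of a consolidated `.zmetadata` file. This is only an illustration under assumptions: it presumes the zarr v2 consolidated layout (a top-level "metadata" mapping keyed by paths such as "metabolite/.zarray") and xarray's "_ARRAY_DIMENSIONS" attribute; the function name `rename_in_consolidated` and the replacement name "compound" are hypothetical, and the matching subfolder on disk still has to be renamed separately.

```python
import copy


def rename_in_consolidated(meta, old, new):
    """Return a copy of a parsed .zmetadata dict with array `old` renamed to `new`.

    Hypothetical helper: assumes the zarr v2 consolidated-metadata layout.
    """
    out = copy.deepcopy(meta)  # leave the caller's dict untouched
    renamed = {}
    for key, value in out["metadata"].items():
        # Keys look like "metabolite/.zarray"; only the leading segment changes.
        head, sep, tail = key.partition("/")
        if sep and head == old:
            key = new + sep + tail
        # xarray records each array's dimension names under "_ARRAY_DIMENSIONS".
        if key.endswith(".zattrs") and "_ARRAY_DIMENSIONS" in value:
            value["_ARRAY_DIMENSIONS"] = [
                new if dim == old else dim for dim in value["_ARRAY_DIMENSIONS"]
            ]
        renamed[key] = value
    out["metadata"] = renamed
    return out


# Toy metadata, not taken from the dataset in this issue.
example = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "metabolite/.zarray": {"shape": [3]},
        "metabolite/.zattrs": {"_ARRAY_DIMENSIONS": ["metabolite"]},
        "signal/.zattrs": {"_ARRAY_DIMENSIONS": ["x", "metabolite"]},
    },
}
fixed = rename_in_consolidated(example, "metabolite", "compound")
```

In practice the dict would come from `json.load` on the store's `.zmetadata` file and be written back with `json.dump` after the rename.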

Someone may need to modify the xr.Dataset.to_zarr and xr.open_zarr functions in case an xarray user decides to create a dimension that starts with "meta" and wants to save their dataset to zarr.

Please let me know if you need me to send any additional info.

What did you expect to happen?

I expected to load an xarray dataset saved by xr.Dataset.to_zarr() using xr.open_zarr(). However, zarr threw an error because one of the dimensions (metabolite) started with "meta".

Minimal Complete Verifiable Example

No response

MVCE confirmation

Relevant log output

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 xr.open_zarr("/home/nonroot/data1/Data/MIDAS/U01_Midas/QINU01EM004/nnfit/09_10_2014.data3D.zarr/")

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:789, in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, **kwargs)
    776     raise TypeError(
    777         "open_zarr() got unexpected keyword arguments " + ",".join(kwargs.keys())
    778     )
    780 backend_kwargs = {
    781     "synchronizer": synchronizer,
    782     "consolidated": consolidated,
   (...)
    786     "stacklevel": 4,
    787 }
--> 789 ds = open_dataset(
    790     filename_or_obj=store,
    791     group=group,
    792     decode_cf=decode_cf,
    793     mask_and_scale=mask_and_scale,
    794     decode_times=decode_times,
    795     concat_characters=concat_characters,
    796     decode_coords=decode_coords,
    797     engine="zarr",
    798     chunks=chunks,
    799     drop_variables=drop_variables,
    800     backend_kwargs=backend_kwargs,
    801     decode_timedelta=decode_timedelta,
    802     use_cftime=use_cftime,
    803 )
    804 return ds

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/api.py:531, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, backend_kwargs, **kwargs)
    519 decoders = _resolve_decoders_kwargs(
    520     decode_cf,
    521     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    527     decode_coords=decode_coords,
    528 )
    530 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 531 backend_ds = backend.open_dataset(
    532     filename_or_obj,
    533     drop_variables=drop_variables,
    534     **decoders,
    535     **kwargs,
    536 )
    537 ds = _dataset_from_backend_dataset(
    538     backend_ds,
    539     filename_or_obj,
   (...)
    547     **kwargs,
    548 )
    549 return ds

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:851, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel)
    849 store_entrypoint = StoreBackendEntrypoint()
    850 with close_on_error(store):
--> 851     ds = store_entrypoint.open_dataset(
    852         store,
    853         mask_and_scale=mask_and_scale,
    854         decode_times=decode_times,
    855         concat_characters=concat_characters,
    856         decode_coords=decode_coords,
    857         drop_variables=drop_variables,
    858         use_cftime=use_cftime,
    859         decode_timedelta=decode_timedelta,
    860     )
    861 return ds

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/store.py:26, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     14 def open_dataset(
     15     self,
     16     store,
   (...)
     24     decode_timedelta=None,
     25 ):
---> 26     vars, attrs = store.load()
     27     encoding = store.get_encoding()
     29     vars, attrs, coord_names = conventions.decode_cf_variables(
     30         vars,
     31         attrs,
   (...)
     38         decode_timedelta=decode_timedelta,
     39     )

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/common.py:125, in AbstractDataStore.load(self)
    103 def load(self):
    104     """
    105     This loads the variables and attributes simultaneously.
    106     A centralized loading function makes it easier to create
   (...)
    122     are requested, so care should be taken to make sure its fast.
    123     """
    124     variables = FrozenDict(
--> 125         (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    126     )
    127     attributes = FrozenDict(self.get_attrs())
    128     return variables, attributes

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:461, in ZarrStore.get_variables(self)
    460 def get_variables(self):
--> 461     return FrozenDict(
    462         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    463     )

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/core/utils.py:474, in FrozenDict(*args, **kwargs)
    473 def FrozenDict(*args, **kwargs) -> Frozen:
--> 474     return Frozen(dict(*args, **kwargs))

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/xarray/backends/zarr.py:461, in <genexpr>(.0)
    460 def get_variables(self):
--> 461     return FrozenDict(
    462         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    463     )

File ~/.pyenv/versions/3.10.5/lib/python3.10/site-packages/zarr/hierarchy.py:603, in Group._array_iter(self, keys_only, method, recurse)
    601 for key in sorted(listdir(self._store, self._path)):
    602     path = self._key_prefix + key
--> 603     assert not path.startswith("meta")
    604     if contains_array(self._store, path):
    605         _key = key.rstrip("/")

AssertionError:
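The last frame of the traceback shows the failing check: `assert not path.startswith("meta")`, where `path` is the group's key prefix plus the listed key. The sketch below mimics only that string test in plain Python to show which names it would reject; `fails_prefix_check` is a hypothetical helper, not part of zarr, and the names other than "metabolite" are made up for illustration.

```python
def fails_prefix_check(key, key_prefix=""):
    """Mimic the string test from the traceback (not zarr's actual code path)."""
    path = key_prefix + key
    return path.startswith("meta")


# Illustrative names; only "metabolite" appears in this issue.
names = ["metabolite", "metadata_flag", "Metabolite", "x_meta", "time"]
rejected = [name for name in names if fails_prefix_check(name)]
```

Under this reading, any array at the root of the store whose name begins with the lowercase string "meta" would trip the assertion, which matches the observation that renaming the dimension avoids the error.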

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 (main, Jul 31 2022, 18:17:20) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-43-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.23.1
scipy: 1.9.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 58.1.0
pip: 22.2.1
conda: None
pytest: 6.2.5
IPython: 8.4.0
sphinx: None

TomNicholas commented 2 years ago

Hi @zndr27, thanks for raising this issue.

It should be possible to create a zarr store directly using zarr-python, make sure it has an array whose name starts with "meta", save it, and open it again using zarr directly. I suspect that if you try that, you might find that this is a bug in zarr rather than in xarray.

It would be really helpful if you could try doing that, and either raise an issue on the zarr issue tracker if the bug still exists, or comment again here to say that you still think the problem is with xarray's code.