pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Opening a datatree from S3 bucket with zarr store #9197

Open vlevasseur073 opened 2 days ago

vlevasseur073 commented 2 days ago

What happened?

Trying to open a datatree using the Zarr backend from a zarr file stored in a private S3 bucket leads to the following error:

GroupNotFoundError: group not found at path ''

This issue was already reported in xarray-contrib/datatree; see https://github.com/xarray-contrib/datatree/issues/322. The fix could be more or less the same, but at the time I did not take the time to propose a PR.

What did you expect to happen?

The open_datatree function in zarr.py has a storage_options argument, yet this argument is not passed on to ZarrStore.open_store.
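To make the failure mode concrete, here is a toy model of it, not xarray code. Every name in it (`open_store`, `GroupNotFound`, the two `open_tree_*` wrappers) is a hypothetical stand-in. It assumes only that the store opener needs the credentials carried by `storage_options` in order to read a private location.

```python
# Toy model of the reported behaviour. Every name here is a hypothetical
# stand-in; none of this is xarray code.


class GroupNotFound(Exception):
    """Stand-in for the error raised when the store cannot be read."""


def open_store(path, group="/", storage_options=None):
    # Stand-in for a store opener: a "private" path is only readable
    # when credentials arrive through storage_options.
    if path.startswith("s3://") and not storage_options:
        raise GroupNotFound(f"group not found at path {group!r}")
    return {"path": path, "group": group}


def open_tree_dropping_options(path, storage_options=None):
    # Accepts storage_options but never forwards it (the reported pattern).
    return open_store(path)


def open_tree_forwarding_options(path, storage_options=None):
    # Forwards storage_options to the store opener.
    return open_store(path, storage_options=storage_options)


options = {"key": "placeholder", "secret": "placeholder"}

try:
    open_tree_dropping_options("s3://bucket/product", storage_options=options)
    dropped_ok = True
except GroupNotFound:
    dropped_ok = False

forwarded = open_tree_forwarding_options(
    "s3://bucket/product", storage_options=options
)
```

In this model the first wrapper fails even though the caller supplied credentials, while the second succeeds, which mirrors the behaviour described in this issue.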

Minimal Complete Verifiable Example

import xarray.backends.api as xr_api

# Replace the bracketed placeholders with real values.
storage_options = {
    "s3": {
        "key": "[access-key]",
        "secret": "[secret-key]",
        "endpoint_url": "[endpoint-url]",
    }
}
dt = xr_api.open_datatree(
    "s3://path/to/product", engine="zarr", storage_options=storage_options
)
dt

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

A possible fix could be, in xarray.backends.zarr.open_datatree:

filename_or_obj = _normalize_path(filename_or_obj)
if group:
    parent = NodePath("/") / NodePath(group)
    # Changed: forward storage_options to the store.
    stores = ZarrStore.open_store(
        filename_or_obj, group=parent, storage_options=storage_options
    )
    if not stores:
        ds = open_dataset(
            filename_or_obj, group=parent, engine="zarr", **kwargs
        )
        return DataTree.from_dict({str(parent): ds})
else:
    parent = NodePath("/")
    # Changed: forward storage_options to the store.
    stores = ZarrStore.open_store(
        filename_or_obj, group=parent, storage_options=storage_options
    )
# Added: make storage_options reach open_dataset as well.
if storage_options:
    kwargs["backend_kwargs"] = {"storage_options": storage_options}
ds = open_dataset(filename_or_obj, group=parent, engine="zarr", **kwargs)
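One detail of the snippet above may be worth checking: assigning a fresh dict to kwargs["backend_kwargs"] would replace any backend_kwargs the caller already supplied. A minimal sketch of merging instead of overwriting follows. The helper name `with_storage_options` is hypothetical and not part of xarray, and whether an explicitly passed value should take precedence is an assumption.

```python
def with_storage_options(kwargs, storage_options):
    """Return a copy of kwargs whose backend_kwargs include storage_options.

    Hypothetical helper, not part of xarray.
    """
    merged = dict(kwargs)
    if storage_options:
        # Copy so the caller's dict is left untouched.
        backend_kwargs = dict(merged.get("backend_kwargs") or {})
        # Assumption: a value set explicitly in backend_kwargs wins.
        backend_kwargs.setdefault("storage_options", storage_options)
        merged["backend_kwargs"] = backend_kwargs
    return merged


caller_kwargs = {"backend_kwargs": {"consolidated": True}, "chunks": {}}
merged = with_storage_options(caller_kwargs, {"anon": False})
```

With such a helper, the last lines of the fix could read kwargs = with_storage_options(kwargs, storage_options) before calling open_dataset.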

As a summary:

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-113-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.3-development
xarray: 2024.6.0
pandas: 2.2.2
numpy: 2.0.0
scipy: 1.13.1
netCDF4: 1.7.1
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: 2.18.2
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.6.2
distributed: None
matplotlib: 3.9.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.26.0
sphinx: None

welcome[bot] commented 2 days ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!