xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0
161 stars, 43 forks

Opening a datatree from S3 bucket #322

Open vlevasseur073 opened 3 months ago

vlevasseur073 commented 3 months ago

Hi all,

It seems that the current version of datatree cannot open stores located in cloud storage (tested with S3 only). For instance, opening a datatree with the same syntax as xarray.open_dataset (using an fsspec chained URL):

store = "zip::s3://bucket/path/product.zarr.zip"
dt = datatree.open_datatree(store, engine="zarr", backend_kwargs={"storage_options": {"s3": secrets["s3input"]}})

where secrets["s3input"] is a dict containing the AWS secret keys and endpoint URLs.

fails with

ClientError                               Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/s3fs/core.py:113, in _error_wrapper(func, args, kwargs, retries)
    112 try:
--> 113     return await func(*args, **kwargs)
    114 except S3_RETRYABLE_ERRORS as e:

File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/aiobotocore/client.py:408, in AioBaseClient._make_api_call(self, operation_name, api_params)
    407     error_class = self.exceptions.from_code(error_code)
--> 408     raise error_class(parsed_response, operation_name)
    409 else:

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Indeed, in _open_datatree_zarr (datatree/io.py) the kwargs are not passed on to zarr.open_group, so in this case the storage_options are ignored. As a workaround for my specific case, replacing line 87 of datatree/io.py (v0.0.14)

zds = zarr.open_group(store, mode="r")

by

storage_options = kwargs["backend_kwargs"]
zds = zarr.open_group(store, mode="r", **storage_options)

works just fine.
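
Note that this workaround unpacks the whole backend_kwargs dict into zarr.open_group, so it raises a KeyError when backend_kwargs is not given and forwards any other backend option as well. A slightly more defensive variant might forward only storage_options. The sketch below is one possible shape for that, not the fix that datatree adopted: the helper name zarr_group_kwargs is hypothetical, and the lookup of storage_options at the top level of kwargs is an assumption about how callers might pass it.

```python
from typing import Any, Dict


def zarr_group_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the keyword arguments that are safe to forward to zarr.open_group.

    Hypothetical helper for illustration; it is not part of datatree.
    """
    # backend_kwargs may be missing or None, so fall back to an empty dict.
    backend_kwargs = kwargs.get("backend_kwargs") or {}

    # Assumption: storage_options may be given at the top level or nested
    # inside backend_kwargs; the top-level value wins if both are present.
    storage_options = kwargs.get("storage_options")
    if storage_options is None:
        storage_options = backend_kwargs.get("storage_options")

    # Forward nothing when no storage options were supplied.
    if storage_options is None:
        return {}
    return {"storage_options": storage_options}


# Inside _open_datatree_zarr the call could then read (not executed here):
#     zds = zarr.open_group(store, mode="r", **zarr_group_kwargs(kwargs))

# Example with placeholder credentials (the values are not real).
example_kwargs = {
    "backend_kwargs": {
        "storage_options": {"s3": {"key": "<access key>", "secret": "<secret key>"}}
    }
}
forwarded = zarr_group_kwargs(example_kwargs)
```

With this shape, a call that passes no storage options behaves exactly like the current code, since an empty dict is unpacked into zarr.open_group.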

TomNicholas commented 2 months ago

Hi @vlevasseur073, sorry for the slow reply here.

We would welcome a PR to fix this!