eschalkargans opened this issue 7 months ago.
This is definitely an xarray-level issue, not a datatree-specific one. All datatree does is open each group of a Zarr store using xarray.open_dataset and put the groups in a tree.
"However, developers may find themselves at one point or another with plain Zarr files that are incompatible with the current xarray implementation. So, I think there should be a way to open these Zarr files with no dimension-names."
I have some thoughts about this, but I think you should re-raise it on the xarray issue tracker instead!
Hello,
Bug Description
I am currently experimenting with datatree (xarray-datatree==0.0.13) to open a Zarr folder. I assumed that datatree would be able to open any Zarr store. However, it currently seems that datatree can only open Zarr stores that were generated with xarray. Indeed, when the _ARRAY_DIMENSIONS attribute is missing from the metadata contained in the .zmetadata file at the root of the Zarr store, datatree is unable to load it, and a KeyError: '_ARRAY_DIMENSIONS' is thrown.
Reproduce the Bug
The following gist contains a small Python script that reproduces the issue:
https://gist.github.com/eschalkargans/6c8708370ad6b7b58eebe95aa95084ab
Here is the sequence:
1. Create a (label, z) dimensional DataArray named my_xda.
2. Remove _ARRAY_DIMENSIONS from the .zattrs of all of the variables (z, label, my_xda) and try to reopen the Zarr. This succeeds in all cases. ✔️
3. Remove the _ARRAY_DIMENSIONS key-value pair from one of the variables in the .zmetadata file at the root of the Zarr. Reading then fails with an exception, and the error message is explicit: KeyError: '_ARRAY_DIMENSIONS' ❌
Discussion
More information about _ARRAY_DIMENSIONS can be found in the Zarr Encoding Specification. That documentation explicitly states that xarray cannot read arbitrary array data, so this issue is more of a feature request than a bug report: it is currently expected that such files are not readable.
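To make the failing case concrete, consider the Zarr v2 consolidated format. In that format, .zmetadata is a JSON object whose "metadata" entry maps keys such as "my_xda/.zarray" and "my_xda/.zattrs" to each array's metadata and attributes. _ARRAY_DIMENSIONS sits inside the .zattrs entries. The standard-library sketch below lists the arrays whose consolidated attributes lack that key. The function name arrays_missing_dims and the toy metadata are illustrative only. They are not part of xarray, datatree, or zarr.

```python
def arrays_missing_dims(zmetadata: dict) -> list[str]:
    """List array paths whose consolidated attributes lack _ARRAY_DIMENSIONS.

    Assumes the Zarr v2 consolidated layout, in which zmetadata["metadata"]
    maps keys like "my_xda/.zarray" and "my_xda/.zattrs" to JSON objects.
    """
    meta = zmetadata["metadata"]
    missing = []
    for key in meta:
        # Every array has a "<path>/.zarray" entry; groups use ".zgroup".
        if key == ".zarray" or key.endswith("/.zarray"):
            path = key[: -len(".zarray")].rstrip("/")
            attrs_key = f"{path}/.zattrs" if path else ".zattrs"
            if "_ARRAY_DIMENSIONS" not in meta.get(attrs_key, {}):
                missing.append(path)
    return sorted(missing)


# Toy metadata loosely modeled on the issue, with .zarray entries trimmed to
# "shape" for brevity: `label` has lost its dimension names.
example = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "my_xda/.zarray": {"shape": [3, 4]},
        "my_xda/.zattrs": {"_ARRAY_DIMENSIONS": ["label", "z"]},
        "label/.zarray": {"shape": [3]},
        "label/.zattrs": {},
        "z/.zarray": {"shape": [4]},
        "z/.zattrs": {"_ARRAY_DIMENSIONS": ["z"]},
    },
}

print(arrays_missing_dims(example))  # ['label']
```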
However, developers may at some point find themselves with plain Zarr files that are incompatible with the current xarray implementation, so I think there should be a way to open Zarr files that have no dimension names. Maybe users could provide a mapping for the missing dimensions themselves. For example, it could cover only the missing attributes, with the .zmetadata that was read merged with a user-provided _array_dimensions mapping. Or it could even be a full mapping from each variable's path in the Zarr hierarchy to its list of dimension names.
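The two options ("only missing attributes" versus "full mapping") might look roughly like this at the metadata level. This is a minimal sketch that assumes Zarr v2 consolidated metadata held in a plain dict. apply_dim_mapping, dims_by_path, and overwrite are hypothetical names, not an existing xarray or datatree API. The sketch also checks that each mapping supplies one name per axis of the array's shape.

```python
import copy


def apply_dim_mapping(zmetadata, dims_by_path, overwrite=False):
    """Return a patched copy of Zarr v2 consolidated metadata (hypothetical helper).

    dims_by_path maps array paths (e.g. "group/my_xda") to dimension names.
    overwrite=False fills only missing _ARRAY_DIMENSIONS entries;
    overwrite=True lets the user mapping replace existing ones.
    """
    patched = copy.deepcopy(zmetadata)  # never mutate the caller's dict
    meta = patched["metadata"]
    for path, dims in dims_by_path.items():
        prefix = f"{path}/" if path else ""
        zarray = meta.get(prefix + ".zarray")
        if zarray is None:
            raise KeyError(f"no array at path {path!r}")
        # One name per axis, otherwise the mapping cannot be right.
        if len(dims) != len(zarray["shape"]):
            raise ValueError(
                f"{path!r} has {len(zarray['shape'])} axes, got {len(dims)} names"
            )
        attrs = meta.setdefault(prefix + ".zattrs", {})
        if overwrite or "_ARRAY_DIMENSIONS" not in attrs:
            attrs["_ARRAY_DIMENSIONS"] = list(dims)
    return patched


# Toy metadata: `label` is missing its dimension names.
example = {
    "metadata": {
        "my_xda/.zarray": {"shape": [3, 4]},
        "my_xda/.zattrs": {"_ARRAY_DIMENSIONS": ["label", "z"]},
        "label/.zarray": {"shape": [3]},
        "label/.zattrs": {},
    }
}

# "Only missing attributes": my_xda keeps its stored names.
filled = apply_dim_mapping(example, {"label": ["label"], "my_xda": ["a", "b"]})
# "Full mapping": the user-provided names win everywhere.
forced = apply_dim_mapping(example, {"my_xda": ["a", "b"]}, overwrite=True)
```

Because the function returns a deep copy, the metadata that was read from disk stays untouched. The patched dict can then be compared against it or simply discarded.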
Or are you perhaps waiting for a future update of the Zarr specification that would fully incorporate named dimensions? In that case, what strategy would you recommend for datatree users who need to fix their Zarr stores? Directly updating the .zmetadata?
Thanks!
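Below is a rough standard-library sketch of the "update .zmetadata directly" route for a local Zarr v2 directory store. It treats the consolidated copy of each array's attributes as authoritative. The missing names are written into both .zmetadata and the per-array .zattrs file so the two stay consistent, and existing dimension names are never overwritten. patch_store_dims is a hypothetical helper, and the demo store contains only the metadata files relevant here.

```python
import json
import tempfile
from pathlib import Path


def patch_store_dims(store, dims_by_path):
    """Add missing _ARRAY_DIMENSIONS to a Zarr v2 directory store in place.

    Writes the names into both the consolidated <store>/.zmetadata and each
    array's own <store>/<path>/.zattrs so the two copies agree. Existing
    _ARRAY_DIMENSIONS entries are left untouched.
    """
    store = Path(store)
    zmeta_file = store / ".zmetadata"
    zmeta = json.loads(zmeta_file.read_text())
    for path, dims in dims_by_path.items():
        key = f"{path}/.zattrs" if path else ".zattrs"
        # Start from the consolidated copy of the attributes, if any.
        attrs = zmeta["metadata"].setdefault(key, {})
        attrs.setdefault("_ARRAY_DIMENSIONS", list(dims))
        # Mirror the consolidated attributes into the per-array file.
        (store / key).write_text(json.dumps(attrs, indent=4))
    zmeta_file.write_text(json.dumps(zmeta, indent=4))


# Demo on a throwaway store holding only the metadata files that matter here.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "label").mkdir()
    (root / "label" / ".zattrs").write_text("{}")
    (root / ".zmetadata").write_text(
        json.dumps({"zarr_consolidated_format": 1,
                    "metadata": {"label/.zattrs": {}}})
    )
    patch_store_dims(root, {"label": ["label"]})
    on_disk = json.loads((root / "label" / ".zattrs").read_text())
    consolidated = json.loads((root / ".zmetadata").read_text())
```

With zarr-python 2.x installed, there is an alternative. You could fix only the per-array .zattrs files and then regenerate .zmetadata with zarr.consolidate_metadata, instead of editing the JSON by hand.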