xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0
162 stars 43 forks

Assigning coords not consistent with xarray #242

Open blaylockbk opened 1 year ago

blaylockbk commented 1 year ago

I opened an HDF5 file with datatree and tried to add a new coordinate...

import datatree
import numpy as np

dt = datatree.open_datatree("/path/to/file.h5")

dt.GROUPNAME.coords["new"] = np.arange(10)

There are still no coordinates...

And when I do dt.GROUPNAME.coords, I get an error

File ~/anaconda3/envs/flight/lib/python3.11/site-packages/xarray/core/formatting.py:335, in _calculate_col_width(col_items)
    334 def _calculate_col_width(col_items):
--> 335     max_name_length = max(len(str(s)) for s in col_items) if col_items else 0
    336     col_width = max(max_name_length, 7) + 6
    337     return col_width

ValueError: max() arg is an empty sequence

But if I convert it to a Dataset, then assigning coords works as expected...

a = dt.GROUPNAME.to_dataset()
a.coords["new"] = np.arange(10)


And the value of a.coords is

Coordinates:
  * new      (new) int64 0 1 2 3 4 5 6 7 8 9



This is possibly user error, but I was expecting to see the same behavior as an xarray.Dataset.

TomNicholas commented 1 year ago

Hi Brian, thanks for reporting this. Unfortunately I think this simply isn't implemented yet.

If you're interested in the details: Currently DataTree.coords returns a DatasetCoordinates object. The DatasetCoordinates class is defined in xarray and points back to a Dataset object, whose contents it is able to alter. However, in datatree the Dataset object pointed back to is created via DataTree.to_dataset(), which (purposefully) creates a new Dataset object that is unconnected to the DataTree, so the assignment lands on a throwaway copy. This issue is thus somewhat related to the discussion in https://github.com/xarray-contrib/datatree/issues/80.
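
To make the mechanism above concrete, here is a toy model in plain Python. It is only a sketch: ToyNode and ToyCoords are hypothetical stand-ins invented for illustration, not datatree or xarray classes, and the real implementation is more involved. The only point is that a coords view built on a fresh copy cannot change the object it was copied from.

```python
import copy


class ToyCoords:
    """Hypothetical stand-in for a coords view: writes into the dataset it was given."""

    def __init__(self, dataset):
        self._dataset = dataset

    def __setitem__(self, name, values):
        self._dataset["coords"][name] = values

    def names(self):
        return list(self._dataset["coords"])


class ToyNode:
    """Hypothetical stand-in for a tree node that hands out detached copies."""

    def __init__(self):
        self._data = {"coords": {}}

    def to_dataset(self):
        # Deliberately returns a new object that is unconnected to the node.
        return copy.deepcopy(self._data)

    @property
    def coords(self):
        # The view points back at a fresh copy, not at the node's own data.
        return ToyCoords(self.to_dataset())


node = ToyNode()
node.coords["new"] = list(range(10))  # mutates a temporary copy
print(node.coords.names())  # the node itself is unchanged

detached = node.to_dataset()
ToyCoords(detached)["new"] = list(range(10))  # mutates the copy we kept
print(ToyCoords(detached).names())
```

In this toy, the first assignment is silently lost because the copy it modified is discarded right away, while the second one sticks because the caller kept a reference to the copy. That mirrors why assigning on the converted Dataset worked in the original report.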

At some point I will try to make this work (so thanks again for raising this issue so I don't forget!), but probably not particularly soon. In the meantime I suggest using .assign_coords/.set_coords or just manually altering the underlying dataset and re-assigning it via .ds.
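
A minimal sketch of the "alter the underlying dataset" route, shown on a plain xarray Dataset so that it runs without a file on disk. The variable name signal and its contents are made up for illustration; group_ds stands in for whatever dt.GROUPNAME.to_dataset() returns. The final write-back is left as a comment because it needs a real tree, and the exact spelling may depend on the datatree version.

```python
import numpy as np
import xarray as xr

# Stand-in for the detached copy returned by dt.GROUPNAME.to_dataset().
# The "signal" variable is hypothetical filler data.
group_ds = xr.Dataset({"signal": ("x", np.zeros(3))})

# assign_coords returns a new Dataset and leaves the original untouched.
updated = group_ds.assign_coords(new=np.arange(10))

print("new" in group_ds.coords)
print(updated.coords)

# Re-assigning the result to the node is the step suggested in this thread.
# Not executed here; assumed form:
#     dt["GROUPNAME"].ds = updated
```

Because assign_coords does not modify in place, the result has to be captured and written back; calling it and discarding the return value would leave the tree unchanged as well.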

blaylockbk commented 1 year ago

Thanks for the details. I'll use assign_coords in the meantime. Thanks again for datatree; it has made opening some HDF5 files much easier.