xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0

setting node name breaks tree linkage #309

Open marcel-goldschen-ohm opened 4 months ago

marcel-goldschen-ohm commented 4 months ago
from datatree import DataTree

# a simple tree
root = DataTree(name='root')
child = DataTree(name='child', parent=root)
grandchild = DataTree(name='grandchild', parent=child)

# changing the name of a child node does not update the dict key in its parent's children
child.name = 'childish'
print(root)  # this appears to be fine
print(list(root.children))  # however, the keys in root.children have not been updated
print(root['childish'])  # so this fails

A simple fix seems to be that wherever the name property is set, the corresponding key in self.parent.children is updated as well. I'm not sure whether these keys are stored anywhere else that would also need updating.
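A minimal sketch of that idea, using a hypothetical stand-in `Node` class rather than datatree's actual `TreeNode` (whose internals may differ). The setter re-keys the node in its parent's `children` dict while preserving sibling order. It deliberately does not handle renaming to `None` or to a sibling's existing name.

```python
from __future__ import annotations


class Node:
    """Hypothetical stand-in for a tree node; not datatree's TreeNode."""

    def __init__(self, name: str, parent: Node | None = None) -> None:
        self._name = name
        self._parent = parent
        self.children: dict[str, Node] = {}
        if parent is not None:
            parent.children[name] = self

    @property
    def name(self) -> str:
        return self._name

    @name.setter
    def name(self, new_name: str) -> None:
        old_name = self._name
        parent = self._parent
        if parent is not None and new_name != old_name:
            # Rebuild the parent's dict so the renamed child keeps its position.
            parent.children = {
                (new_name if key == old_name else key): node
                for key, node in parent.children.items()
            }
        self._name = new_name


root = Node("root")
child = Node("child", parent=root)
grandchild = Node("grandchild", parent=child)

child.name = "childish"
print(list(root.children))                 # ['childish']
print(root.children["childish"] is child)  # True
print(list(child.children))                # ['grandchild']
```

Rebuilding the dict rather than popping and re-inserting is a design choice made here so that the renamed child does not move to the end of the sibling order.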

TomNicholas commented 4 months ago

Thank you for reporting this! The offending setter is here

https://github.com/xarray-contrib/datatree/blob/0afaa6cc1d6800987d8b9c37a604dc0a8c68aeaa/datatree/treenode.py#L597

The setter should also update the key under which the node is stored in its parent's children.

This should be a pretty simple fix, if you (or perhaps @etienneschalk?) are interested in taking it on. (If not, no worries.)

etienneschalk commented 4 months ago

Hello @TomNicholas

In the context of merging datatree into xarray, should new developments continue to be made on this repo, or in the xarray repo? Or is there a code freeze until datatree can be worked with from inside the xarray repo? Or simply, new developments happening here will be integrated into xarray with some git wizardry?

Edit: the answer is in the README: https://github.com/xarray-contrib/datatree?tab=readme-ov-file#deprecation-notice

TomNicholas commented 4 months ago

> In the context of merging datatree into xarray, should new developments continue to be made on this repo, or in the xarray repo? Or is there a code freeze until datatree can be worked with from inside the xarray repo? Or simply, new developments happening here will be integrated into xarray with some git wizardry?

I think we accept bug fixes here, but not new features. And whilst those bug fixes will be moved to xarray, you won't necessarily get full attribution for them (i.e. I'll probably do it the dumb copy-paste way instead of the git wizardry way).

TomNicholas commented 4 months ago

But we should fix the bug here! People will still be using this repository for a while yet, as this is what is uploaded to PyPI/conda as xarray-datatree.

marcel-goldschen-ohm commented 4 months ago

I'm happy to tackle the fix, but will be traveling for a conference that runs through most of next week, so probably wouldn't get to it until after that. If someone else wants to fix it before then, by all means ;)

etienneschalk commented 4 months ago

What should be the expected behaviour when renaming a child node to None?

I had a look at how xarray behaves when renaming a DataArray inside of a Dataset. It seems that the renaming is just ignored when trying to change the name property of the DataArray directly:

>>> import xarray as xr
>>> # https://docs.xarray.dev/en/stable/generated/xarray.DataArray.name.html
>>> xds = xr.Dataset({"a": xr.DataArray([1])})
>>> print(xds)
<xarray.Dataset>
Dimensions:  (dim_0: 1)
Dimensions without coordinates: dim_0
Data variables:
    a        (dim_0) int64 1
>>> print(xds["a"])
<xarray.DataArray 'a' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> xds["a"].name = "toto"
>>> print(xds["a"])
<xarray.DataArray 'a' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> xda = xds["a"]
>>> xda.name = "toto"
>>> print(xda)
<xarray.DataArray 'toto' (dim_0: 1)>
array([1])
Dimensions without coordinates: dim_0
>>> print(xds)
<xarray.Dataset>
Dimensions:  (dim_0: 1)
Dimensions without coordinates: dim_0
Data variables:
    a        (dim_0) int64 1

marcel-goldschen-ohm commented 4 months ago

@etienneschalk, I find that to be very counterintuitive behavior. My naive expectation would be that the variable should be renamed as desired and the dataset updated to reflect that, and if there was any issue (like renaming to None or to the name of another variable) an exception would be raised. Of course, this is an xarray issue.
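The stricter behaviour described here could look roughly like the sketch below. It uses a plain dict and a hypothetical helper `rename_key`, which is not part of xarray or datatree. The helper either re-keys the entry or raises, so a rename is never silently ignored.

```python
def rename_key(mapping: dict, old: str, new: str) -> dict:
    """Return a copy of ``mapping`` with ``old`` re-keyed to ``new``, or raise."""
    if old not in mapping:
        raise KeyError(old)
    if not isinstance(new, str) or not new:
        # Covers renaming to None (or to an empty string).
        raise ValueError(f"invalid name: {new!r}")
    if new != old and new in mapping:
        # Covers renaming to the name of another existing entry.
        raise ValueError(f"name {new!r} is already in use")
    return {(new if key == old else key): value for key, value in mapping.items()}


variables = {"a": [1], "b": [2]}
renamed = rename_key(variables, "a", "toto")
print(list(renamed))  # ['toto', 'b']

# Both problematic renames raise instead of being ignored.
for bad in (None, "b"):
    try:
        rename_key(variables, "a", bad)
    except ValueError as err:
        print("rejected:", err)
```

Returning a new dict instead of mutating in place is an assumption made for the sketch; it keeps the original mapping untouched if validation fails.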