xarray-contrib / datatree

WIP implementation of a tree-like hierarchical data structure for xarray.
https://xarray-datatree.readthedocs.io
Apache License 2.0
169 stars 44 forks source link

Should the parent of the root node be None or itself? #288

Closed TomNicholas closed 7 months ago

TomNicholas commented 10 months ago

Currently the .parent attribute of a node can point to another DataTree object, or just None if that node is the root node.

@etienneschalk pointed out in https://github.com/xarray-contrib/datatree/pull/286#discussion_r1413217278 that in pathlib (and filesystems in general) instead the parent of the root node is just the root node.

I can't decide if this is something worth changing. Either could be made to work with DataTree I think, but changing the behaviour of .parent would be a breaking change.

etienneschalk commented 10 months ago

A root being defined as a node having its parent to None is at least in my opinion more intuitive than a self-referencing node.

Code-wise, I assume the definition of a node being a root would change from node.parent is None to node.parent is node. For potential bugs, having the first means potential access errors on None, while having the second means potential infinite recursion.

However, root nodes with a None parents mean, they implicitly all belong to the same tree. So the checks like "do the nodes belong in different trees" might be less verbose with self-referencing roots. Indeed, node.root.parent is not other.root.parent, can be used to determine if node and other belong to the same tree, but cannot be used when the roots share the None parent.

marcel-goldschen-ohm commented 7 months ago

I completely agree with @etienneschalk point that a None parent of the root is far more intuitive than a self-referencing parent, which will lead to nightmares.

Regarding checks for belonging to the same tree, why not just node.root is not other.root? I don't see why the None parent is a problem here.

TomNicholas commented 7 months ago

Regarding checks for belonging to the same tree, why not just node.root is not other.root? I don't see why the None parent is a problem here.

That's literally exactly what we currently do

https://github.com/xarray-contrib/datatree/blob/0afaa6cc1d6800987d8b9c37a604dc0a8c68aeaa/datatree/treenode.py#L573

I completely agree with @etienneschalk point that a None parent of the root is far more intuitive than a self-referencing parent, which will lead to nightmares.

The consensus here seems to be that a self-referencing root is a bad idea, so I will close this issue.