Closed TomNicholas closed 3 weeks ago
Hi @TomNicholas , I would like to help with the code on this one. Do you think this might be a good first issue? Thanks!
Sure @etienneschalk! I think each of these bullet points is really its own little issue, so feel free to open a PR for any one of them. (Maybe leave the tree-walking related ones for now though, because I think those will be a little more complicated.)
Once we have completed some of these it would also be nice to add a little section in the documentation that points out this similarity explicitly to users. We can then also reorganise the grouping of methods in `api.rst` to have a section for Path-like methods.
Pathlib
The following are some notes I took while reading the pathlib documentation, thinking about equivalences in DataTree usage.
This list only contains methods I did not classify as "Irrelevant". The "Irrelevant" tag is subjective and reflects my own understanding; I may have missed important methods.
PurePath.parts
PurePath.root
PurePath.is_absolute()
`DataTree.root`, same comment as `parents`. `root = parents[-1]`? No, currently the parents are traversed upwards until finding a parent with `root is None`. Could it be simplified with `parents[-1]`, if the path hierarchy is already known in advance?
PurePath.parents
DataTree.parents
should use the paths obtained via its NodePath
identifier inside of the root's DataTree
to produce the list of parents' DataTree.root
attribute. Trees are aware of being a root or a subtree.
PurePath.parent
parents
`parent == parents[0]`?
PurePath.name
PurePath.is_absolute()
PurePath.is_relative_to_other()
PurePath.joinpath
PurePath.match
DataTree.glob
by mapping it against all paths contained in the tree.
PurePath.relative_to(other, walk_up=False)
PurePath.with_name(name)
PurePath.with_segments(*pathsegments)
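The pure-path behaviours listed above can be checked directly against the standard library. The following is a minimal sketch using only `pathlib.PurePosixPath`; the group names (`root_group`, `child`, `grandchild`, `sibling`) are made up for illustration and say nothing about how `DataTree` itself behaves.

```python
from pathlib import PurePosixPath

# Hypothetical node path, as it might appear inside a tree.
path = PurePosixPath("/root_group/child/grandchild")

parts = path.parts                # tuple of path components, starting with "/"
name = path.name                  # final component
absolute = path.is_absolute()     # True for a path anchored at "/"

# parent is the first entry of parents; the last entry is the root.
all_parents = tuple(path.parents)
first_parent_is_parent = path.parent == all_parents[0]
last_parent_is_root = all_parents[-1] == PurePosixPath(path.root)

# Deriving new paths without touching any tree.
relative = path.relative_to("/root_group")
renamed = path.with_name("sibling")
joined = PurePosixPath("/root_group").joinpath("child", "grandchild")

# A relative pattern is matched from the right-hand side of the path.
matches_pattern = path.match("*/grandchild")
```

Both questions raised in the notes (`parent == parents[0]` and `root == parents[-1]`) hold for `PurePosixPath`, which suggests the same identities could be kept for a path-like tree API.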
Concrete Paths. Could be implemented by a companion DataTreePath class attached to a DataTree instance.
Path.glob()
PurePath.match
against all paths contained by the bound instance of DataTree
`case_sensitive`: since DataTree works with PurePosixPath, keep the default POSIX config: True
Path.is_dir()
`DataTree` and `Dataset` (directory-like) and `DataArray` (file-like). `is_group` could help, or `is_aggregation`. `Dataset` may actually be closer to a leaf? At first glance, no, as it is non-atomic. One could argue that a `DataArray` is non-atomic too (it carries dimension coordinates).
Path.is_file()
path.is_dir()
is_dataarray
could help, or is_leaf
Path.is_symlink()
Path.iterdir()
ls
Path.walk
DataTree
Python 3.12 only.
`Path.rglob("*")` when needing to iterate through a directory, so maybe `walk` is dispensable.
Path.mkdir
`parents=True`, `exist_ok` might be useful when working with groups.
Path.rename
Path.replace
Path.rename
for DataTree, see https://bugs.python.org/issue27886 for discussion on that topic. `replace` is more "expeditive" than `rename`: if a path already exists, it will surely be replaced.
Path.absolute()
Path.resolve()
`absolute`, but also takes symlinks into account. To be considered if symbolic links are to be implemented in DataTree.
Path.rglob
`Path.glob`, with the `**` prefix. Depends on developer's taste.
Path.rmdir
relative_to
Path.samefile
Path.symlink_to
Path.touch
Path.unlink
DataTree.
`NodePath` as the `DataTree`'s identifier, and use `path.name` in the repr.
`PurePosixPath | str` for methods expecting a path.
DataTree
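The idea of implementing `glob` by mapping `PurePath.match` against all paths contained in a tree can be sketched as follows. This is only an illustration under stated assumptions: `glob_paths` is a hypothetical helper, and the tree is stood in for by a plain list of path strings rather than a real `DataTree`.

```python
from pathlib import PurePosixPath


def glob_paths(all_paths, pattern):
    """Return the paths that match `pattern` according to PurePosixPath.match.

    `all_paths` is assumed to be an iterable of POSIX-style path strings,
    one per node of the tree.
    """
    return [p for p in all_paths if PurePosixPath(p).match(pattern)]


# Hypothetical set of node paths in a small tree.
tree_paths = ["/", "/a", "/a/b", "/a/c", "/d", "/d/b"]

# Relative pattern: matched from the right, so any node named "b" with a parent.
named_b = glob_paths(tree_paths, "*/b")

# Absolute pattern: the whole path has to match.
children_of_a = glob_paths(tree_paths, "/a/*")
```

Because matching is case-sensitive for POSIX paths, no extra case-handling option is needed in this sketch.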
Ideas of questions for a FAQ.
A FAQ is a powerful documentation format; it is used, for instance, in the ruff documentation: https://docs.astral.sh/ruff/faq/
The idea is to answer, as quickly as possible, the questions that seem mundane to someone who knows the tool but are not at all obvious to someone starting to use it.
`parent` has cardinality of 0..1 (0 if root, 1 if subtree)
@property
def parent(self: DataTree) -> DataTree | None:
    ...
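To make the 0..1 cardinality concrete, here is a minimal sketch of a tree node whose `parent` is either another node or `None`. The `Node` class is a hypothetical stand-in written for this example, not the `DataTree` implementation; it also shows how `root` could be derived from `parents[-1]`, as discussed in the notes above.

```python
from __future__ import annotations


class Node:
    """Toy tree node: a stand-in for illustration, not the real DataTree."""

    def __init__(self, name: str, parent: Node | None = None):
        self.name = name
        self._parent = parent

    @property
    def parent(self) -> Node | None:
        # Cardinality 0..1: None for the root, exactly one node otherwise.
        return self._parent

    @property
    def parents(self) -> tuple[Node, ...]:
        # Closest ancestor first, root last (same order as PurePath.parents).
        found = []
        node = self.parent
        while node is not None:
            found.append(node)
            node = node.parent
        return tuple(found)

    @property
    def root(self) -> Node:
        # The root is the last ancestor, or the node itself if it has none.
        ancestors = self.parents
        return ancestors[-1] if ancestors else self


top = Node("/")
child = Node("a", parent=top)
grandchild = Node("b", parent=child)
```

With this ordering, `grandchild.parent` is `grandchild.parents[0]` and `grandchild.root` is `grandchild.parents[-1]`, mirroring `pathlib`.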
Closing in favour of https://github.com/pydata/xarray/issues/9448 upstream.
@eschalkargans suggested in #281 that the API of `DataTree` objects could more closely follow that of `pathlib.PurePath` objects. I think this aligning of APIs/nomenclature is a good idea. In general I think it's conceptually useful to think of a `DataTree` object as if it were an instance of `pathlib.PurePosixPath` (even though the actual implementation should not work like that).

There are various methods we might want to add/change to make them more compatible:
Inspired by `pathlib.PurePath`:
- `DataTree.match` should be renamed to `DataTree.glob`
- Add a `DataTree.match` that returns a boolean like `PurePath.match` does
- `DataTree.lineage` should be renamed to `.parents`
- Add an `.is_relative_to` method (this is deprecated in `pathlib`)
- A `.joinpath` method could be useful
- `DataTree.relative_to` should possibly have a `walk_up` parameter (see https://github.com/xarray-contrib/datatree/issues/258)
- A `.with_name` method might be useful
- A `.with_segments` method might be useful

Inspired by `pathlib.Path` (i.e. concrete paths):
- A `DataTree.walk` method might be a better way to expose the logic in iterators.py
- A `.rename` method might be useful
- A `.replace` method might be useful
- A `.rglob` method (though having this and `.glob` seems overkill)

Several of these might be useful abstractions internally, especially `.joinpath`, `.walk`, and `.replace`.

EDIT: Let's also document this similarity:
- Reorganise `api.rst` to have a section `Path-like methods`
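As a rough illustration of what a `walk`-style method could yield, here is a minimal sketch over a nested `dict` standing in for a tree. The function name `walk`, the tuple layout (modelled on `os.walk`), and the use of a `dict` instead of a real `DataTree` are all assumptions made for this example.

```python
from pathlib import PurePosixPath


def walk(tree, path=PurePosixPath("/")):
    """Yield (path, group_names, leaf_names) top-down, in the style of os.walk.

    `tree` is assumed to be a nested dict: dict values are groups,
    any other value is treated as a leaf.
    """
    groups = [key for key, value in tree.items() if isinstance(value, dict)]
    leaves = [key for key, value in tree.items() if not isinstance(value, dict)]
    yield path, groups, leaves
    for name in groups:
        # joinpath builds the child path, one of the abstractions listed above.
        yield from walk(tree[name], path.joinpath(name))


# Hypothetical toy tree: two groups under the root, one nested group.
toy = {"a": {"b": {"temperature": 1}, "pressure": 2}, "d": {}}

visited = [str(p) for p, _, _ in walk(toy)]
per_node = {str(p): (groups, leaves) for p, groups, leaves in walk(toy)}
```

A sketch like this also shows why `.joinpath` and `.walk` could be useful internal abstractions: the traversal only needs a way to list children and a way to extend a path.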