Closed observingClouds closed 1 month ago
I like this idea: dt['*/lowRes']
I have an extract_leaf
utility function that I like so far: dt.dc.extract_leaf('lowRes')
This returns a new tree with two nodes'control'
and 'plus4K'
.
I've also wondered if a reorganize_nodes
would be useful. This would take input in symbolic form so dt.reorganize_nodes('a/b -> b/a')
would reorganize so
Thanks for your comments! I also thought about the restructuring of the datatree similar to your point 3 and think it would be a nice addition. For this particular case I like 1 a bit more because it also allows for data trees with different depths.
I have been thinking about adding similar functions for a while, thanks to both of you for laying out your ideas clearly! In fact your two-level-combinatoric tree structure @observingClouds is the same thing @jbusecke and I discussed in #186.
I like this idea: dt['*/lowRes']
I agree this could be useful, but it's a shortcut for the more obvious dt.search(regex)
, so I think I should add that method first and see how people get on with it.
I have an extract_leaf utility function that I like so far:
dt.dc.extract_leaf('lowRes')
This returns a new tree with two nodes 'control' and 'plus4K'.
I can also see this being useful, but I'm not totally sure what exact behavior you're after from your description. Does it return {/control/lowRes/, /plus4k/lowRes/}
? Or just a shallower tree like {/control/, /plus4k/}
? The idea is conceptually similar to using dt.leaves.search('lowRes')
, but the types wouldn't currently work out if you tried that.
take input in symbolic form so dt.reorganize_nodes('a/b -> b/a')
@dcherian I think this suggestion of taking input in symbolic form might solve the ambuigities @jbusecke and I discussed in #186. The main questions I would have is what to do with nodes that don't have exactly 2 levels, i.e. a/
or a/b/c
?
@observingClouds your DatasetView._path
idea is a really interesting suggestion - I've opened a new issue for it in #266.
I agree this could be useful, but it's a shortcut for the more obvious dt.search(regex), so I think I should add that method first and see how people get on with it.
Actually the simplest "searching" utility would be use glob, not regex. Added in #267.
I would have is what to do with nodes that don't have exactly 2 levels, i.e. a/ or a/b/c?
Raise and error! there's an implied minimum depth necessary to do this?
Raise and error! there's an implied minimum depth necessary to do this?
Right, but what if some nodes have this depth and others don't? I'm thinking of something heterogenous like:
demographics/
weather/highres/model1
weather/highres/model2
weather/lowres/model1
weather/lowres/model1
Actually I guess in this particular case the user would just need to apply the operation to the weather/*
subtree... Is there a scenario in which some leaves are deeper than others? :thinking:
Thinking aloud more, @jbusecke 's CMIP6 case would look like:
model1/historical
model1/scenario1
model1/scenario2
model2/historical
model2/scenario1
model2/scenario2
which after dt.reorganize_nodes('a/b -> b/a')
becomes
historical/model1
historical/model2
scenario1/model1
scenario1/model2
scenario2/model1
scenario2/model2
which seems fine
Is there a scenario in which some leaves are deeper than others?
deeper than the symbolic depth is fine AFAICT. shallower than the symbolic depth should raise an error.
deeper than the symbolic depth is fine AFAICT.
Is it? If I have /1/2/3/
, and I ask for 'a/b -> b/a'
, isn't it ambiguous whether the result should be /1/3/2/
or /2/1/3/
?
You could imagine maybe supporting something like '*/a/b -> */b/a'
, but it's getting more complex.
EDIT: Or maybe you call the method .reorganize_leaves
and have it always give /1/3/2
EDIT2: Or maybe use a flag: dt.reorganize('a/b -> b/a', from_root=True)
I have /1/2/3/, and I ask for 'a/b -> b/a', isn't it ambiguous whether the result should be /1/3/2/ or /2/1/3/?
Personally i'd just expect /2/1/3
Personally i'd just expect /2/1/3
Well I personally would have expected the opposite :rofl:
Or maybe use a flag:
dt.reorganize('a/b -> b/a', from_root=True)
I think this could be a good solution:
1) It will do something consistent by default, and if you don't like that default you can change it.
2) It's the minimum extra complexity to remove the root-first/leaves-first ambiguity.
3) It avoids using asterisks at all, which is good because they suggest string matching, which we are not doing here.
4) Avoiding asterisks prevents you from even expressing ambiguous requests, such as '*/a/b/* -> */b/a/*'
5) The fail cases are obvious: it will work for anything with a long enough path.
Another idea: use preceding slashes to distinguish the two cases, i.e.
'/a/b -> /b/a'
would result in 2/1/3
, but
'a/b -> b/a'
would result in 1/3/2
.
This has all the advatages of the the previous suggestion, but is more filesystem-like, doesn't require an extra kwarg, and allows moving nodes from root to leaves (e.g. '/a/b -> b/a'
).
I would suggest '...a/b -> ...b/a'
Otherwise it's really hard to see at first glance.
I can also see this being useful, but I'm not totally sure what exact behavior you're after from your description.
dt.dc.extract_leaf('lowRes')
returns a tree with two nodes Control
and plus4K
. This new tree is shallower, which then lets me pass it to another function that can't handle a deeper tree.
Here's my list of helper utilities: https://github.com/dcherian/dcpy/blob/master/dcpy/datatree.py
I found them all quite useful, but obviously there's little error checking :) and it does assume trees with a nicely balanced structure,.
Here's my list of helper utilities
Thanks @dcherian !
dt.dc.extract_leaf('lowRes') returns a tree with two nodes Control and plus4K. This new tree is shallower, which then lets me pass it to another function that can't handle a deeper tree.
I think I get it now - that does seem potentially useful!
I found them all quite useful, but obviously there's little error checking
What's the input type for order
is to reorder_nodes
? Is it a list of integer indices?
it does assume trees with a nicely balanced structure,.
I suspect you might be talking about a concept I've been referring to as a "hollow tree". I'll raise a separate issue for that.
What's the input type for order is to reorder_nodes? Is it a list of integer indices?
it expects a list of node names : e.g. tree.reorder(["TAO", ...])
reorders so that "TAO"
is the first node.
https://github.com/dcherian/dcpy/blob/7fb24b50410422ab9fc3257721c2727a075a34cb/dcpy/datatree.py#L24-L27
closed in favor of pydata/xarray#9346
I'm looking for a good way to apply a function to a subset of nodes that share some common characteristics encoded in the subtree path.
Imagine the following data tree
To apply a function to all
control
or allplus4K
nodes is straight forward by just selecting the specific subtree, e.g.dt['control']
. However, in case alllowRes
dataset should be manipulated this becomes more elaborative and I wonder what the best approach would be.dt['control/lowRes','plus4K/lowRes']
is not yet implemented and would also be complex for large data treesdt['*/lowRes']
could be one idea to make the subtree selection more straight forward, where*
is a wildcarddt.search(regex)
could make this even more generalCurrently, I use the @map_over_subtree decorator, which also has some limitations as the function does not know its tree origin (as noted in the code) and it needs to be inferred from the dataset itself, which is sometimes possible (here the length of the dataset) but does not need to be always the case.
I do not know how the tree information could be passed through the decorator, but maybe it is okay if the
DatasetView
class has an additional property (e.g._path
) that could be filled withdt.path
during the call of DatasetView._from_node()?. This would lead toand would allow for tree-aware manipulation of the datasets.
What do you think? Happy to open a PR if this makes sense.