Closed GoogleCodeExporter closed 9 years ago
Maybe it would be useful to have a Tree interface
Tree<T> {
T getParent();
Collection<T> getChildren();
}
also see issue 187
Original comment by zeitlin...@gmail.com
on 10 Jun 2009 at 7:46
Tree's are inherently recursive by definition. Each child of a node a Tree
itself,
with or without children nodes. As a result, the above interface wouldn't
really be
expressive enough (especially as the "key" value for the is not considered).
The next point to consider is that tree traversal can consist of pre-order,
in-order
and post-order traversals, not to mention breadth-first.
Original comment by gpampara@gmail.com
on 22 Jul 2009 at 4:02
I have always very much wanted to have a standard TreeNode<T> interface. I will
give
it more thought when I can. I think that is a far better approach than the
Function-
based one.
The interface _might_ end up needing to be something like this
interface TreeNode<T extends TreeNode<T, C>, C extends Iterable<T>> {
// optional; returns null if root
TreeNode<T, C> parent();
C children();
}
This way, different types of trees can use different types of collections
(list, set,
multiset, what-have-you). It's a little confusing to grasp at first though.
We'll
see what shakes out.
Original comment by kevin...@gmail.com
on 17 Sep 2009 at 5:41
Original comment by kevin...@gmail.com
on 17 Sep 2009 at 5:45
Original comment by kevin...@gmail.com
on 17 Sep 2009 at 5:57
You don't really need a defined data structure to implement the most common
traversals (breadth, pre/post-
order, and full depth-first - tracking start/finish times), as the original
poster points out. I would implement
it differently (see below), but the principle holds.
That's not to say there isn't a good use case for a tree data structure, just
that those traversals don't require
it. If adding both things, it would be a good idea for some static utils
methods that simplify the common use
case when you do want to traverse an explicit tree structure instead of the
functional way. There's no reason
the API can't be expressive enough for both strategies.
Additionally, if going down that data structure road, you may seriously want to
consider keeping it consistent
with the Collections model of data structures being containers for
user-supplied objects. In that light, I think
TreeNode is inappropriate for the same reason that ListNode would be. You don't
ask the elements of a List
for previous/next elements, you ask the List data structure itself.
One drawback to this is that it can be very difficult to maintain the invariant
of being a Tree while building
one. My mutable structures tend to be either general graphs or forests, with an
ability to get tree views (tree
rooted at a given forest node, e.g.).
As an aside, I'm in the process of a complete overhaul of my own graph library,
having been inspired by your
excellent work with GCL - in particular the realization of how much readability
can improve an API (and
having written it pre-generics). This is a working concept for how traversals
would look (not compilable,
poorly formatted, no method bodies - keeping it terse). For everyone out there,
feel free to use any ideas you
like.
public interface Step< E >
{
E from();
E to();
}
public interface PruningIterator< E > extends Iterator< E >
{
// Don't explore beyond the last object returned by next()
public void prune();
}
public class Steps
{
// convenience methods, so users don't have to explicitly create Steps
public static Iterator< Step< E > > wrap
( E from,
Iterator< E > toIterator )
public static Function< E, Iterator< Step< E > > > wrap
( Function< E, Iterator< E > > adjacencyFunction )
}
public class Traversals
{
public static PruningIterator< Step< E > > preOrder
( E start,
Function< E, Iterator< Step< E > > > adjacencyFunction )
// can't prune() a post-order traversal
public static Iterator< Step< E > > postOrder
( E start,
Function< E, Iterator< Step< E > > > adjacencyFunction )
public static Iterator< Step< E > > topologicalSort
( Iterator< E > nodeIterator,
Function< E, Iterator< Step< E > > > adjacencyFunction )
// etc. plus similar methods taking just Function< E, Iterator< E > > and doing the wrapping for you.
}
All of these can be implemented using stacks, no recursion, without too much
difficulty. The primary reason
for the Step interface is that you often want to know from which node you
traversed, and this information
would not otherwise be available.
I can use this same API to traverse the file system, up a tree instead of down,
or any other graph-like
structure, without creating an explicit data structure to do it. All I need to
define is that function which takes
a node and returns an iterator of adjacent nodes. Not creating an actual data
structure (like a Collection) is
useful when the backing data is huge, and perhaps not even queryable in a
global way. Having to implement
things like size() can really be a pain sometimes, and at work we've often just
decided to throw an
UnsupportedOperationException instead.
At any rate, a work in progress, but that's the general flavor. The edges in my
data structures also contain
user-defined objects, so I need to allow for that without polluting the rest of
the API. Still working through
the implications of that.
Original comment by ray.a.co...@gmail.com
on 2 Oct 2009 at 11:35
I recently needed a version of my walk iterable from above with a possibility
to get
the path of a node to the root. So I needed a TreeNode interface, without
needing the
children (although it could easily be added).
I'm attaching the relevant bits (thrown together so it would compile, but i did
not
spend too much time on it so please excuse for errors). Scroll down for an
example in
printWalkWithParentsTest()
Original comment by jvdne...@gmail.com
on 10 Dec 2009 at 5:34
Attachments:
Original comment by kevinb@google.com
on 30 Jul 2010 at 3:53
Original comment by kevinb@google.com
on 13 Jul 2011 at 6:18
Original comment by kevinb@google.com
on 16 Jul 2011 at 8:37
Original comment by wasserman.louis
on 1 Sep 2011 at 5:35
Not sure if I pointed out that we have stopped thinking about the "TreeNode
interface" approach.
I believe it is common to come into contact with objects that are
tree-structured, but which you don't control, so you can't get to implement the
TreeNode interface you want them to. Examples: java.io.File, or the containment
tree of Java program elements (i.e. AnnotatedElement).
If we provided a TreeNode interface you'd have to create an adapter instance
for every single node. And your adapter would, every time it grabs its list of
children, have to wrap each of them, and then you'd start to want to hold on to
these adapters and in the end, you've just created a whole entire parallel tree
is all you've done.
This sucks. What Louis and I propose is this:
interface TreeViewer<T> {
Iterable<T> children(T node);
}
abstract class BinaryTreeViewer implements TreeViewer<T> {
Optional<T> leftChild();
Optional<T> rightChild();
// implements children() to return those that are "present"
}
class Trees {
static <T> Iterable<T> breadthFirstView(T root, TreeViewer<T> viewer) {...}
static <T> Iterable<T> preOrderView(T root, TreeViewer<T> viewer) {...}
static <T> Iterable<T> postOrderView(T root, TreeViewer<T> viewer) {...}
static <T> Iterable<T> inOrderView(T root, BinaryTreeViewer<T> viewer) {...}
// perhaps also with void "visit" versions as well
}
Any thoughts out there?
Original comment by kevinb@google.com
on 2 Sep 2011 at 11:27
I believe this is the right direction. I still maintain that a more general
interface will be more useful. I don't mean more general than TreeViewer<T> in
syntax, but in implied semantics. The language "children" and "TreeViewer" have
clear implications that I think are unnecessary. It's just a name change from
that to this:
interface AdjacencyFunction<T> {
Iterable<T> adjacent( T node );
}
And that interface works just as well for breadthFirst/preOrder/postOrder
traversals. The util class then becomes "Traversals". The only difference is
that you would have to clearly document the traversal behavior when given a
Adjacency that produces cycles, although you should probably do that anyway
(think soft links in the file system).
Original comment by ray.a.co...@gmail.com
on 3 Sep 2011 at 9:06
Wasn't quite done, hit the "Save changes" button by accident.
With class name of Traversals or similar, the method names can be shortened. It
just becomes Traversals.breadthFirst( T, Function< T, Iterable< T > > ).
Note that both "children" and "parent" are just specific adjacency
relationships. Pre-order for the parent relationship goes from the given node
to the root, post-order goes from the root to the given node. You get a lot of
flexibility like this for free.
Given that this gets more graph-like, static factory methods for walks (follows
the first node returned from the adjacency Iterable, ideal for parent
traversal) and a topological sort are appropriate.
This doesn't directly address the special case of a binary tree, and in-order
traversal. We've never had that use case where I work, so I honestly haven't
given much thought to how that fits in to this strategy.
I haven't found a clean way to handle cycles within the library. I ended up
with a factory method which turns a given adjacency function into an acyclic
one by keeping track of what it has hit, but I really don't like that solution
because it turns a stateless function into a stateful one that is not reusable.
There must be a better way without resorting to adding distinct acyclic
traversal factory methods, but I haven't found it.
Original comment by ray.a.co...@gmail.com
on 3 Sep 2011 at 9:32
Oh yeah, a couple more things. Sorry, should have composed my thoughts first.
Using just Function<T, Iterable<T>> allows using the rest of Guava's Functions,
Predicates, etc. with these adjacency functions.
We've also found it useful to think of traversals as, themselves, adjacency
functions. You can then get some interesting behavior by composing them. For
example (forgive the names, they need changing):
class SomeUtilClass {
static <T> Function<T, Iterable<T>> breadthFirst( Function<T, Iterable<T>> );
// etc.
}
That method accepts one adjacency function and produces another. The resulting
function will accept a node and return, wrt the argument function, all
reachable nodes in a breadth-first order.
That and some filtering allows you to simply do things like "Find all
references to all declared variables" when working with a linked AST. Or
complex traversals of social graphs.
Original comment by ray.a.co...@gmail.com
on 3 Sep 2011 at 9:47
I'm pretty strongly against supporting graphs or anything that is not strictly
a tree. That's trying to "do too much."
At the moment, we have a nice, simple abstraction that matches a clearly
defined set of use cases, and does it extremely well. Guava isn't currently in
the business of manipulating graphs, and changing that is beyond the scope of
this issue.
Original comment by wasserman.louis
on 5 Sep 2011 at 11:01
Point taken, and I don't really disagree. Just saying that there isn't a
logical code change involved, only names. The code for breadth- and depth-first
traversals is exactly the same. But again, binary trees are another thing
entirely.
Sorry for going off the deep end there. I just got a little excited that the
brain trust behind Guava would be brought to bear on abstractions I deal with
daily. It would be nice to have something 100x better tested than what I've
been able to cobble together during the calm moments between putting out fires.
I never seem to have enough time to go back and fix the abstractions, no
sellable business case for it.
Original comment by ray.a.co...@gmail.com
on 6 Sep 2011 at 1:44
There is such a thing as "trying to do too much" and a thing as "too simple",
I'm not sure which is which here. Even a filesystem is a tree only if you
ignore symlinks/treat as leaves.
I'm surprized that we have a "clearly defined set of use cases" here (that's a
rarity!), is it in copy-pastable form?
PS: Instead of Iterable<T>, it is richer to return Iterable<Stack<T>> (for some
persistent Stack), for basically the same cost, and which can be viewed as
Iterable<T>, but that would indeed be in the "too much" direction.
Original comment by jim.andreou
on 6 Sep 2011 at 2:13
My goal with this approach was to be dumb, simple, and utterly straightforward.
I think I've accomplished this goal.
(And while filesystems are convoluted and complicated, XML documents might be a
good example of genuine trees.)
Original comment by wasserman.louis
on 6 Sep 2011 at 2:25
Heh, I considered mentioning XML but thought that you wouldn't be interested.
An XML document is only a tree if you ignore ID/IDREF(S) or treat them as
leaves.
Original comment by jim.andreou
on 6 Sep 2011 at 4:06
Maybe this is irrelevant because of some previous posting, but for me the
following worked well:
public interface TreeClassAdapter<T> {
Iterable<T> children(T node);
T parent(T node);
Class<T> nodeClass();
boolean isLeaf(T node); // useful to distinguish between nodes which could possible have children but happen to have none (like empty directories) and proper leaves (like files), cf. javax.swing.tree.TreeModel
}
This way one can easily organize arbitrary objects in trees, without having to
create a wrapper for each object (which is quite a PITA, since you need to
switch back and forth often). Adapting objects to this interface is easy, for
example for java.io.File it takes just a few lines.
Original comment by Maaarti...@gmail.com
on 14 Sep 2011 at 4:00
I have to admit that my reaction to that is, ewwwwwwwwwww.
I think the current interface serves most conceivable needs without
overspecifying the tree format.
Original comment by wasserman.louis
on 14 Sep 2011 at 4:10
Interface TreeViewer is essentially just a Function<T,Iterable<T>>
Original comment by jvdne...@gmail.com
on 14 Sep 2011 at 5:10
Correct! But calling it that would make the API much less clear and more
awkward to use. Doing it *properly,* like this, makes it more natural and
extensible -- for example, the BinaryTreeViewer abstract class would be
difficult or impossible to provide.
Original comment by wasserman.louis
on 14 Sep 2011 at 5:49
Fair enough. I understand the benefits of a specific interface. I'm just easily
tempted to reduce plumbing because I know I might need a Function that does the
same sooner or later anyhow. (For instance to do a concat(transform(...)) in a
different context)
Also, occasionally, I find it useful/necessary to build a parallel tree.
Original comment by jvdne...@gmail.com
on 14 Sep 2011 at 10:38
Explain please what you mean by "parallel tree"? (I can imagine several
different meanings, so I'm interested in specific use cases.)
Original comment by wasserman.louis
on 15 Sep 2011 at 2:07
For instance, you have a tree of nodes in which each node only has references
to its children. What if you need to process this tree in a way you need parent
references? You could map every node of the tree to a new object with a
reference to a parent of the same type and a reference to the original node.
That would be a parallel tree.
I have needed this in the past and wrote the following code (which you might
reasonably call ugly/bad code):
clarification: walk is preorder traversal; map is transform; mapcat is
concat(transform(...))
{{{
public static <T> Iterable<T> walk(final T root,
final Function<? super T, ? extends Iterable<? extends T>> fnGetChildren) {
final Function<T, Iterable<? extends T>> fnWalk = new Function<T, Iterable<? extends T>>() {
@Override
public Iterable<? extends T> apply(final T from) {
return walk(from, fnGetChildren, this);
}
};
return walk(root, fnGetChildren, fnWalk);
}
private static <T> Iterable<T> walk(final T root,
final Function<? super T, ? extends Iterable<? extends T>> fnGetChildren,
final Function<? super T, ? extends Iterable<? extends T>> fnWalk) {
return concat(singleton(root), mapcat(fnWalk, fnGetChildren.apply(root)));
}
public interface TreeNode<T> {
public TreeNode<T> parent();
public T item();
}
public static <T, TreeNodeType extends TreeNode<T>> Iterable<TreeNodeType>
walkWithParents(final T root,
final Function<? super T, ? extends Iterable<? extends T>> fnGetChildren,
final Function2<? super TreeNodeType, ? super T, ? extends TreeNodeType> nodeFactory) {
final Function<TreeNodeType, Iterable<TreeNodeType>> fnPairChildren = new Function<TreeNodeType, Iterable<TreeNodeType>>() {
@Override
public Iterable<TreeNodeType> apply(final TreeNodeType parent) {
return map(Functions2.bind1st(nodeFactory, parent), fnGetChildren.apply(parent.item()));
}
};
final Function<TreeNodeType, Iterable<TreeNodeType>> fnWalk = new Function<TreeNodeType, Iterable<TreeNodeType>>() {
@Override
public Iterable<TreeNodeType> apply(final TreeNodeType from) {
return walk(from, fnPairChildren, this);
}
};
return walk(nodeFactory.apply(null, root), fnPairChildren, fnWalk);
}
}}}
Original comment by jvdne...@gmail.com
on 15 Sep 2011 at 3:56
Implementing a traversal based on the comments here I stumbled upon something
you might want to consider. I needed to keep track of the current depth during
the traversal, so I used an Iterator based (rather that Iterable based)
approach using a special TreeIterator:
public static <T> TreeIterator<T> postOrderView(final T root, final Function<?
super T, ? extends Iterator<? extends T>> childrenFunction);
public interface TreeIterator<T> extends Iterator<T> {
int depth();
}
Original comment by christop...@gmail.com
on 5 Oct 2011 at 4:27
Interesting. I'm not sure that's a sufficiently common use case to get special
treatment, but I'll look into it.
Original comment by wasserman.louis
on 5 Oct 2011 at 5:26
The most general approach is providing an iterator of paths (from root to each
node), which could be viewed as an iterator of values (much like Jared's
LinkedNodeList does). The best representation for these paths are persistent
stacks, the whole cost is one object per node, and they are immutable. Just in
case anyone really wants to generalize
Original comment by jim.andreou
on 6 Oct 2011 at 12:19
That could work.
Original comment by wasserman.louis
on 6 Oct 2011 at 12:52
Issue 703 has been merged into this issue.
Original comment by wasserman.louis
on 15 Nov 2011 at 3:50
Original comment by fry@google.com
on 10 Dec 2011 at 3:38
When can we expect TreeViewer and other Tree related Interfaces/Classes to be
available ? I was browsing the 11.0.1 javadocs and could not locate these.
Thank you,
Original comment by anjan.ba...@gmail.com
on 18 Jan 2012 at 4:30
I believe we're testing some ideas internally, but this is just such a broadly
applicable and common need that it's worth taking lots of time to experiment
with different designs.
Original comment by wasserman.louis
on 19 Jan 2012 at 1:38
[deleted comment]
Original comment by kevinb@google.com
on 30 May 2012 at 7:43
[deleted comment]
If you want to use Tree data structure and don't want to wait until it will be
added to Guava I suggest using my implementation which can be found on github:
https://github.com/serso/common/tree/master/collections-tree/src/main/java/org/s
olovyev/common/collections/tree
The interfaces are self explaining. One can start from the Trees.java class.
Original comment by se.solov...@gmail.com
on 2 Feb 2013 at 11:36
Initial versions of TreeTraverser and BinaryTreeTraverser are slated for 15.0:
https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/com
mon/collect/TreeTraverser.java
Original comment by lowas...@google.com
on 9 Mar 2013 at 6:13
FWIW, I think something like the EMF TreeIterator (see
http://download.eclipse.org/modeling/emf/emf/javadoc/2.4.3/org/eclipse/emf/commo
n/util/TreeIterator.html) which is basically an Iterator with an additional
prune() method is indeed very useful.
Original comment by knut.wan...@gmail.com
on 9 Mar 2013 at 6:46
If adding an Iterator subinterface with a prune() method, then I suggest naming
the interface more generically than "TreeIterator". Pruning is simply saying "I
don't want to traverse beyond the last node any further". I haven't delved into
the committed code, but I suspect that the traversal will work for any DAG just
fine. Actually, it looks like pre-order and breadth-first will work for any
graph, cyclic or not, provided you don't continue to call next() ad infinitum.
Original comment by ray.a.co...@gmail.com
on 12 Mar 2013 at 10:14
My model has changed from the last time I responded here. This isn't a
suggestion, merely throwing it out there. I have edges (user-defined, weights
e.g.) as well, but I've elided that here and changed a few things as a result.
This is minimal pseudo-code, for terseness.
interface Walk<T> {
T getFrom(); // convenience, first element of path
T getTo() // convenience, last element of path
Iterable<T> getPath();
}
interface Traverser<T> extends Function<T, Iterable<Walk<T>>>
class Traversers {
static <T> Traverser<T> empty()
static <T> Traverser<T> preOrder( Traverser<T> adjacency )
// similar postOrder, breadthFirst, leaves, a few others
}
In my use case, I need the entire path taken to reach each visited vertex.
Storage isn't bad if you're clever (similar to lisp cons cell, as Dimitris
Andreou mentioned). If you don't need that, replace Walk<T> with just T.
Traverser<T> is both how one defines an adjacency function (typically
user-defined), and how pre/post/... traversals are defined. An adjacency
function simply returns very short walks. In the language of your
implementation, both children(T) and preOrderTraveral(T) return Iterable<T>.
I'm pretty sure your reaction will be that this over-generalizes, and I can't
dispute that. It's in my nature. I could certainly see not using Function here,
and using a specific method name for Traverser instead.
Original comment by ray.a.co...@gmail.com
on 13 Mar 2013 at 4:10
@ray: have you looked at the current implementation linked above?
(I claim that it's general enough to let you optionally keep track of the paths
in the tree: you just write a TreeTraverser<Walk<T>> instead of a
TreeTraverser<T>.)
I'm not sure what, if any, features we'll include to prune the traversal, but
we might do something along the lines of doNotExpand(Predicate<T>).
Original comment by wasserman.louis
on 13 Mar 2013 at 7:11
@Louis
Thanks for pointing out the potential to use a TreeTraverser<Walk<T>>, I had
not seen that possibility. That would also work with my actual Walk including
edge objects. I'll have to take a closer look in light of that to see where the
gaps are between the two, as I'd rather use Guava than my own. One difference
is that a user would need to supply the root node as a trivial Walk, which is
slightly cumbersome. I still like the fact that adjacency (children) and
complex traversals are the same kind of thing, but I have to reexamine my use
cases to see if that feature is actually useful.
The other kinds of traversals I have needed in the past are: depth-first (both
out and back, each node visited twice), topological sort (taking an Iterable of
nodes, not just a single root node), and leaves. To be really useful, a
depth-first iterator has to be able to tell you which direction you're
traversing.
Re: pruning. In my experience, whether or not to prune usually depends on your
program state at that point in the traversal. For example, you might prune if
exploration beyond that node is pointless because you've already found a lower
cost result.
Original comment by ray.a.co...@gmail.com
on 14 Mar 2013 at 2:26
That is true, but we're not trying to be the be all and end all of tree
traversals: if we make a wide variety of use cases simpler with a minimum of
API complexity, we've done our job.
In particular, I don't want to track the full path of the traversal in every
case because that would incur prohibitive memory costs in cases where the tree
is implicit or not held in memory all at once, and I am uncertain that adding a
whole new iteration API to deal with pruning helps enough additional users to
merit the added complexity.
Original comment by lowas...@google.com
on 14 Mar 2013 at 4:03
To me, the useful thing about TreeTraverser is being able to iterate over the
elements of the tree as if it were flat, when that's what you want to do. If
you need to deal with the tree in a more tree-like way, do that. But I don't
really like the idea of trying to combine treating the tree as flat and as a
tree by introducing something cumbersome like prune() and requiring you to
iterate with a special kind of Iterator, etc. The idea of using a Predicate to
determine whether or not the subtree of a node should be iterated seems ok to
me, but nothing more than that.
Original comment by cgdec...@gmail.com
on 14 Mar 2013 at 6:40
re: pruning...
Extending Iterator is ugly. I've done that in the past, and it's not pretty,
and even uglier with Iterable. Returning an Iterable<Something<T>>, where
Something has a prune() method, would be better. Not good, just better. But
then iteration isn't nearly as clean for the common use case. That argues for
more methods, pruning vs. normal pre-order and breadth-first. And then your API
is uglier. I need pruning, and I'm not even sure that's worth it. I can always
write my own using a given TreeTraverser.
If using a stateless Predicate for limited pruning, I would suggest creating a
new TreeTraverser, similar to Joiner.skipNulls(). I think that's what you
suggested with doNotExpand(Predicate), but I'm not sure.
Original comment by ray.a.co...@gmail.com
on 15 Mar 2013 at 8:02
> If using a stateless Predicate for limited pruning, I would suggest creating
a new TreeTraverser, similar to Joiner.skipNulls(). I think that's what you
suggested with doNotExpand(Predicate), but I'm not sure.
Yes, exactly.
Original comment by cgdec...@gmail.com
on 15 Mar 2013 at 8:12
re: stuff other than pruning...
I didn't mean to imply that TreeTraverser should keep track of paths from root.
I agree that TreeTraverser<Walk<T>> would do that, and it's just my problem to
define children(Walk<T>) correctly.
I wasn't really pushing those other use cases (topo sort, ...), just bringing
them up because your project has a good history of trying to solve other
people's problems when it makes sense for Guava. Consider it a disclosure of
how one of your users uses traversals. Any of these can be added by users in
their own code easily enough.
Original comment by ray.a.co...@gmail.com
on 15 Mar 2013 at 8:19
Original issue reported on code.google.com by
jvdne...@gmail.com
on 25 May 2009 at 2:34