jrtom / jung

JUNG: Java Universal Network/Graph Framework
Other
400 stars 115 forks source link

specifying a new Tree API #83

Open jrtom opened 7 years ago

jrtom commented 7 years ago

In the common.graph branch (the precursor to JUNG 3.0) there are now some types for supporting tree structures: https://github.com/jrtom/jung/tree/common.graph/jung-api/src/main/java/edu/uci/ics/jung/graph

*CTree* are placeholder names.

In that branch, the Tree types and implementations have been removed, so those names are now available. (I'm not worried about breaking anyone by reusing those types, because JUNG 3.0 has already burned that barn behind us, as it were: moving from JUNG 2.x to JUNG 3.x will require a migration from existing users.)

Points to consider:

jbduncan commented 7 years ago

My current thoughts on this are as follows:

  • Are we going to try to also provide an API for binary trees? If so, what should that look like?

I believe we should refer to Guava's BinaryTreeTraverser for good (if not minimal) initial inspiration on this. For example, it seems obvious to me that we'd want specific methods for obtaining the left child and right child of a particular node.

However, it's not so clear to me what "child" would mean for edges in NetworkTrees/TreeNetworks; perhaps successors for directed TreeNetworks, but for undirected ones it's not so clear to me.

  • We already have a Network-based variant, i.e., "CTreeNetwork". Which would be better: "TreeNetwork" or "NetworkTree"? Is there some third alternative?
  • We don't yet have a ValueGraph analogue. That should probably be ValueTree (which perhaps militates in favor of "NetworkTree").

I cannot think of alternatives for "TreeNetwork"/"NetworkTree" and "ValueTree" ATM, so I'd personally be happy with "NetworkTree" to be consistent with "ValueTree".

jbduncan commented 7 years ago

Ah, it occurs to me that directed edges would only really have one child each, and undirected edges would potentially have two children or no children (depending on how we'd want to define "children"), since edges can only have 2 associated nodes IIUC - no more, no less.

jbduncan commented 7 years ago

And it also occurs to me that Trees are inherently directed, so having undirected edges in a NetworkTree wouldn't make much sense...

jrtom commented 7 years ago

So, funny story about BinaryTreeTraverser: while we were looking into incorporating the Traverser stuff into common.graph, I recently established that ~no one uses BTT either internally or externally, and so we're almost certainly going to be deprecating it.

That doesn't mean that we might not want a binary (or k-ary) tree implementation, but I'm not yet 100% convinced it's worth it. (It may be worth looking at the Tree types in the common.graph branch to see what the current state of things is.)

it's not so clear to me what "child" would mean for edges in Networks

I would not be inclined to define "child" for an edge. A child is just a particular kind of successor.

jbduncan commented 7 years ago

So, funny story about BinaryTreeTraverser: while we were looking into incorporating the Traverser stuff into common.graph, I recently established that ~no one uses BTT either internally or externally, and so we're almost certainly going to be deprecating it.

Sounds like a rather good reason to me then to leave binary or k-ary trees to one side for now and perhaps only work on them if someone requests them?

But having said that, I'm kind of curious how you came to the conclusion that no-one uses BTT externally. Would you be able to shed some light on this?

jrtom commented 7 years ago

I agree that while it may be useful to think about how binary/k-ary trees would fit into the type system, we probably don't need to actually write code to support them absent some requests to that effect.

How I came to the conclusion that noone uses BTT externally is that I did some analysis on this dataset: https://cloud.google.com/bigquery/public-data/github (Details on the specific queries I ran are available if you're curious.)

It's obviously not a complete picture, but I'm willing to take that data set as being a reasonable proxy for "everyone" given its size. There were only three distinct uses of BTT in that entire data set, and that none of them actually required BTT (TreeTraverser would have done just fine; the only thing that BTT gives you is the ability to do inorder traversal).

jrtom commented 7 years ago

Oh, and a brief placeholder thought for later: I'm not sure that there's much value in a Network variant of a Tree. If one wants an object associated with each edge, ValueGraph does that just fine. Might be worth waiting to see if someone asks for a TreeNetwork.

jrtom commented 7 years ago

Another issue we're going to need to address here : make sure that the Tree classes are tested as well as the Guava common.graph classes (and not just indirectly via TreeUtils, etc.).

jbduncan commented 6 years ago

@jrtom I'm happy to have a go at increasing test coverage for Tree et al, if you're not already working on this; I'm sure that I could use Guava's existing tests for [Value]Graph|Network as a good starting point. :)

jbduncan commented 6 years ago

@jrtom Actually it occurs to me again that Guava's tests are licensed under the Apache 2.0 license, whereas JUNG is licensed under one of the BSDs.

Would you prefer that I do the tests from scratch, or are you happy that I take ideas from Guava's tests?

jrtom commented 6 years ago

@jbduncan The licensing requirements of Apache 2.0 and the 3-clause BSD license (which JUNG uses) are compatible, as I understand it, so I don't think that taking inspiration from Guava's tests should be a problem (although we always meant to reconsider how they were put together given time).

jbduncan commented 6 years ago

@jrtom I admit that I'm struggling to understand what you mean when you say, "(although we always meant to reconsider how they were put together given time)". Would you mind clarifying for me what you meant by "they"?

...so I don't think that taking inspiration from Guava's tests should be a problem...

SGTM. :)

jrtom commented 6 years ago

I apparently missed the above comment. What I meant by "they" was the common.graph unit tests. I mean, they're not terrible, but I remember a discussion from a few years ago on how to refactor them to reduce redundancy (and make it easier for others to reuse the code).

jrtom commented 6 years ago

Not sure why GitHub thinks this issue should have been closed, there are still unresolved issues here, such as: (1) what tree-related classes do we need to hang onto (2) what are we doing to name these classes (they're not keeping the current names!).

Side note: there's been internal discussion of making public the ForwardingGraph implementation that common.graph has; if we do that, then these tree implementations become easier.