Refactor algorithms package

jrtom commented 7 years ago

Potential areas for improvement:

reduce the number of subpackages and classes
use the Builder pattern to avoid multiple overlapping constructors
generally unify the way in which algorithms are invoked

jbduncan commented 7 years ago

I wonder if Layout and all its implementing classes in the layouts/ subpackage should be moved to the visualization module, as I struggle to see currently what use they have beyond graph visualization and Swing.

jbduncan commented 7 years ago

I also wonder if the util/ subpackages should be merged into their parent subpackages, making all classes that were in those packages package-private so as to make them inaccessible to JUNG users (which I presume is a good thing in this case).

jbduncan commented 7 years ago

I've also noticed that quite a lot of algorithms classes are non-final and expose protected members for extension via inheritance. I wonder if there's a way we can make those algorithms extendable via composition and/or the Builder pattern instead. (This is something which I have rather limited experience with, so it's not clear to me yet how we'd investigate this.)

jrtom commented 7 years ago

The Layout algorithms can be (and in some cases are) used by some people who want a layout algorithm but want to do their own rendering. Layout and rendering can, and should, be kept separate.

The one fly in the ointment is that currently they use a Dimension instance to specify the size of the canvas, and we should create our own class so as to remove that dependency on Swing.

We could conceivably flatten the package hierarchy in that way as well. I suggest that we need a design doc for figuring out where things go; issue updates is not a good place for that sort of discussion.

The fact that most algorithms classes are non-final and either don't need to be (because subclassing them is not really appropriate or useful) or could be customized in another way is certainly something we can look into as part of this overall issue.

tomnelson commented 7 years ago

I'd like to see us remove from all Layouts, the dependencies on java.awt.Dimension and java.awt.geom.Point2D If we had a Tuple laying around in the API, that would work fine for both.

I'd also like to see the Layout implementations be only a Function<V,P> that holds no cached locations state. We can provide delegates that do caching and leave users the option of implementing their own preferred caching mechanisms if they choose.

jbduncan commented 7 years ago

Using delegate classes to do caching for Layouts sounds like a rather good idea to me.

I imagine we'd replace java.awt.Dimension with our own immutable Dimension class, with a built-in method to produce an equivalent java.awt.Dimension object like .toAwtDimension(). I also imagine something similar for Point2D.

tomnelson commented 7 years ago

Glad you like the idea about caching. I'd rather let the implementors of some visualization module (based on awt/swing/javafx/etc) provide their own code to convert our dimensions and points to what they need. My goal is to have no visualization library dependencies (awt) in our algorithms module.

jbduncan commented 7 years ago

Oh yeah, of course! Having methods like toAwtDimension() would mandate that we depend on AWT/Swing for our own Dimension classes. In that case, please feel free to disregard that particular idea. :)

jbduncan commented 7 years ago

Hmm, perhaps as an alternative to delegate classes, we could allow customisation through a Builder.

Or alternatively, we could have two specific Static Factory Methods for each Layout: one that returns a "normal" Layout, and another that returns a caching variant.

@tomnelson Do you have any thoughts about these two ideas?

tomnelson commented 7 years ago

I was thinking that it would be nice to have Layout implementations that contain only the functional algorithm code and no persisted state of the graph. Those functional classes could be used (by delegates... subclasses ....) that apply their own caching, including doing things like using persistent storage for very large graphs. For our own demos, we would want something that caches locations. Perhaps that one something could be used by any of our layouts. I'd like to have Factory methods or Builders be separate from the classes that just implement the layout algorithms.

jbduncan commented 7 years ago

Okay, that sounds reasonable to me. :+1:

jrtom commented 7 years ago

I agree that we should have our own internal versions of Dimension and Point.2D to break the algorithms' package dependency on AWT/Swing, and that it should be the visualization package's job to provide converters between those types and the AWT/Swing types (and to do the actual conversion when needed).

jrtom commented 7 years ago

I'm not entirely sure what you're proposing that a Layout would look like in this proposed architecture. I guess you're talking about splitting Layout into two types: one for the algorithm that calculates node positions, and one that stores and retrieves them.

I think you'd still want to have a single Layout type, though, unless we decide that we no longer want to support animating changes in layout. That is, if the visualization system only knows about the type that reports node location, then it has no way to tell the Layout to update positions, either in response to user input or in response to time passing/the visualization updating. It's been a while since I've looked at this aspect of the visualization code, but IIRC it's the visualization system's responsibility to call Layout.step(). On the other hand, the algorithm necessarily must be able to both read from and write to the node locations, so the type that includes the algorithm may as well also expose the node locations on request. (Otherwise the visualization system would have to keep track of both of these things, and would probably end up creating a wrapper type to keep them together anyway, for convenience.)

Note that Layout itself (that is, the interface) doesn't have any notion at all of where the position information is being stored. All it has is the apply() method that returns a position for a given node.

So perhaps what this proposal boils down to is the notion that the Layout instances that we provide should have the option of being given an instance of something like a (writeable) Function on creation (which would presumably default to an internal Map, as usual).

I don't think that there's much harm in doing something like that, and one could argue that it's a cleaner design in some ways, but I'm not sure that our users would derive much benefit from it, either, at least with our current visualization architecture. In practice, if you're trying to lay out a graph that's so big that you can't store the node locations in memory, you're already dead in the water, because there's no way that either the force-directed layouts or the rendering are going to tolerate you trying to visualize a few million nodes (they might not even if we had better (read: any) spatial data structures, but they sure won't now). There's also a real question in my mind of the utility of trying to render such a graph at all, but that's a different discussion.

Are there anticipated benefits that I'm missing?

I'm certainly open to using builders for creating layouts, and that would make such a change easier in future. But I'd like to hear more about the anticipated benefits that providing this flexibility would provide.

(Side point: I don't think that there's any particular point in Layout implementing Function any more, now that we're in a Java 8 world. The one place I can think of where it's at all relevant is that it makes it marginally easier to have a Layout whose initial positions depend on another's, and even that capability wouldn't go away, we'd just either need to use a Layout::getPosition reference or provide an overload of setInitializer that took a Layout. Am I missing anything?)

tomnelson commented 7 years ago

I'm starting with the notion that Layout is not part of the jung-visualization module. Maybe there are cases where someone wants to make a graph, apply a layout algorithm, and ask what nodes are within some distance from some point. I think that would work for extremely large graphs that have locations stored on disk. Whatever the possible applications are, we did not put Layout in jung-visualization, so let's not let visualization challenges limit the design of our Layouts.

You brought a lot of clarity to my initial (naive) ideas. I think that my real target is to separate the way we store locations from the way we apply an algorithm. We already mostly have this: AbstractLayout contains the code that stores locations while the subclass implementations have the various algorithms. Perhaps if AbstractLayout had a way to inject the thing that stores the locations.... maybe we are almost there (or close) by changing the old lazy maps to the CacheBuilder. I believe that your paragraph 4 says it perfectly.

As to the replacement of java.awt.Dimension and java.awt.geom.Point2D, We seem to be leaning towards writing new classes to add to jung. My preference is to use something 'good' that someone else wrote, unless it is kind of obscure. We went down that path with collections-generic way back when there we no better alternatives. Instead of writing our own Dimension and Point, we could use the Point2i and Point2d classes from the vecmath package. vecmath seems like a reasonable dependency (to me) and the 2 classes I mentioned seem to have equivalent features to those in the awt classes, without the awt baggage.

On Fri, Sep 15, 2017 at 2:10 PM, jrtom notifications@github.com wrote:

I'm not entirely sure what you're proposing that a Layout would look like in this proposed architecture. I guess you're talking about splitting Layout into two types: one for the algorithm that calculates node positions, and one that stores and retrieves them.

I think you'd still want to have a single Layout type, though, unless we decide that we no longer want to support animating changes in layout. That is, if the visualization system only knows about the type that reports node location, then it has no way to tell the Layout to update positions, either in response to user input or in response to time passing/the visualization updating. It's been a while since I've looked at this aspect of the visualization code, but IIRC it's the visualization system's responsibility to call Layout.step(). On the other hand, the algorithm necessarily must be able to both read from and write to the node locations, so the type that includes the algorithm may as well also expose the node locations on request. (Otherwise the visualization system would have to keep track of both of these things, and would probably end up creating a wrapper type to keep them together anyway, for convenience.)

Note that Layout itself (that is, the interface) doesn't have any notion at all of where the position information is being stored. All it has is the apply() method that returns a position for a given node.

So perhaps what this proposal boils down to is the notion that the Layout instances that we provide should have the option of being given an instance of something like a (writeable) Function on creation (which would presumably default to an internal Map, as usual).

I don't think that there's much harm in doing something like that, and one could argue that it's a cleaner design in some ways, but I'm not sure that our users would derive much benefit from it, either, at least with our current visualization architecture. In practice, if you're trying to lay out a graph that's so big that you can't store the node locations in memory, you're already dead in the water, because there's no way that either the force-directed layouts or the rendering are going to tolerate you trying to visualize a few million nodes (they might not even if we had better (read: any) spatial data structures, but they sure won't now). There's also a real question in my mind of the utility of trying to render such a graph at all, but that's a different discussion.

Are there anticipated benefits that I'm missing?

I'm certainly open to using builders for creating layouts, and that would make such a change easier in future. But I'd like to hear more about the anticipated benefits that providing this flexibility would provide.

(Side point: I don't think that there's any particular point in Layout implementing Function any more, now that we're in a Java 8 world. The one place I can think of where it's at all relevant is that it makes it marginally easier to have a Layout whose initial positions depend on another's, and even that capability wouldn't go away, we'd just either need to use a Layout::getPosition reference or provide an overload of setInitializer that took a Layout. Am I missing anything?)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jrtom/jung/issues/104#issuecomment-329857836, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGbB4alGE1J7ZOg_CdLxbCgtDidRU5Oks5sir2TgaJpZM4Ouyqr .

jbduncan commented 6 years ago

Hi @jrtom, as it's not terribly obvious to me which parts of your last comment were directed at who, I'll do my best to answer the part which I think you meant for me. If I've missed anything, please let me know. :)

I'm certainly open to using builders for creating layouts, and that would make such a change easier in future. But I'd like to hear more about the anticipated benefits that providing this flexibility would provide.

I admit that I was just throwing vague, potentially naive thoughts to the wind, in the hope that it would sprout some fruitful ideas. So I haven't given much thought about the benefits and drawbacks that a Builder would bring.

But as to what I have thought so far, here is what I'd imagine a Layout usage to look like (assuming it operates on Graphs as opposed to Networks or ValueGraphs):

Graph<V> graph = ...;
Dimension dimension = ...;
Layout<V> layout =
  Layout.builder()
    .graph(graph)
    .initializerFunction((V v, Point2D point) -> ...)
    .size(dimension)
    .resetRunnable(...)
    .locationFunction(...)
    .initializeRunnable(...)
    .build();

However, having written this idea up as code, I can see a few drawbacks: 1) The Builder is verbose. 2) It requires users to specify implementation details of layouts. 3) The function-accepting methods cannot accept functions which throw checked exceptions.

jbduncan commented 6 years ago

@tomnelson

As to the replacement of java.awt.Dimension and java.awt.geom.Point2D, We seem to be leaning towards writing new classes to add to jung. My preference is to use something 'good' that someone else wrote, unless it is kind of obscure. We went down that path with collections-generic way back when there we no better alternatives. Instead of writing our own Dimension and Point, we could use the Point2i and Point2d classes from the vecmath package. vecmath seems like a reasonable dependency (to me) and the 2 classes I mentioned seem to have equivalent features to those in the awt classes, without the awt baggage.

The problem I see with Point2d at least (which I found a javadoc for here) is that its x and y fields are mutable just like AWT's Point2D. Whereas it does indeed solve the AWT dependency issue, it doesn't prevent users from changing x and y fields of Point2d objects they pass to JUNG, unless JUNG defensively copies them.

IMO, immutable tuples would be better suited here.

tomnelson commented 6 years ago

@jbduncan That sounds reasonable to me, although I believe that we use the mutability in our Layout classes (or maybe just in the visualization code) to move stuff around without making and storing new objects.

jbduncan commented 6 years ago

@tomnelson Oh, I hadn't considered that.

I personally suggest we look for a way to make use of immutable tuples, nonetheless. But if we come to the conclusion that it's impractical to remove the mutability because it objectively worsens performance or it is just physically impossible, then I'll happily concede. :)

tomnelson commented 6 years ago

FWIW, I believe that the Javafx equivalent of Point is immutable. That is one of the few things I noticed when I glanced at how Javafx would interact with our Layout classes (that use mutable Point2D).

I like the idea of them being immutable in our algorithms module, and not letting visualization dictate how our Layouts are designed.

(I'm not suggesting using the javafx Point in Layout!)

jrtom commented 6 years ago

@jbduncan, I'm responding to your comment first because it's easiest. :)

The short version is: I should not have put those two sentences in the same paragraph, because I did not intend to conceptually link them. :) The "flexibility" that I was referring to was the notion of separately specifying the storage for the nodes. I'm pretty much on board with the notion of using builders for constructing most things, as I think they make the code more self-documenting (and eliminate the combinatorial explosion of constructors).

The major question to my mind in re: Layout builders would be whether we should have a single entry point for all builders, e.g.:

Layout<N> layout = Layout.forGraph(graph)
  .fruchtermanReingold() // or algorithm(FRUCHTERMAN_REINGOLD [1])
  .size(dimension)
  .build();

or an entry point for each algorithm:

Layout<N> layout = FRLayout.forGraph(graph).size(dimension).build();

Option (1) advantages:

provides a single point of entry, which makes it easy to use and document
maximizes discoverability of algorithm options disadvantages:

Option (2) advantages:

provides more flexibility; not all options make sense for all builders
- counterpoint: we could at least have the Option 1 algorithm() method return an algorithm-specific builder type, so errors could be caught at compile time
users can copy this pattern if they're creating their own Layout impl
- counterpoint: I suspect that very few users create their own Layout implementations.

[1] We might want to consider renaming our algorithms according to how they work, not after the names of the authors of the algorithm. Most people probably have no idea which one is which based on the existing names.

jbduncan commented 6 years ago

@jrtom Option (1) tempts me the most, because of the reasons you explained above and because of the fact that we'd be using an interface static method for Layout.forGraph(graph), which sounds very cool to me. :wink:

I'm very interested by your idea in Option (1) to allow the selected algorithm to affect the type of the returned builder. Do you have any thoughts yet on what the implementation would look like, especially for user-provided algorithms?

jrtom commented 6 years ago

Responding to the notion of using someone else's library for analogues to Dimension/Point2D:

(1) As a general statement, I would prefer to keep the number of dependencies that JUNG has to a minimum. It simplifies licensing weirdness (which we've had problems with in the past) and just generally reduces coordination costs. While I don't regret using collections-generic libraries, exactly, it definitely entailed a lot of cleanup (which we arguably still haven't finished, in the sense that a lot of our names (e.g. Transformer) still reflect the old libraries.)

In this case the surface area for these types inside our library would probably be fairly limited, i.e., they'd be used by the Layout types and by some part of the visualization stuff (exactly how much would depend on when we did the conversion).

(2) Arguably what we really want for these things are value types, which Java doesn't yet support. In lieu of those, I'd probably prefer to take a dependency on @AutoValue (for these and for other types, as discussed in #94 ) so that we can at least minimize the boilerplate code (and bugs), and then migrate to using value types later, perhaps.

I agree that these types should be immutable, and +100 on not letting the visualization model dictate our algorithm class designs.

jrtom commented 6 years ago

@jbduncan re: option (1), my basic idea is to have a top-level Builder class that you use to get the basic instance for the algorithm and then have specialized Builder classes that provide options that are specific to the algorithm, or at least the kind of algorithm.

Guava's MultimapBuilder is an example of this kind of pattern.

Whether we would have a separate Builder type for each algorithm, or have kinds, perhaps even a class hierarchy, of builders (e.g., static vs. dynamic) is an open question.

I wouldn't necessarily insist on using an interface static method, although we certainly could. (I'm so used to the idea that I can't use most of the cool Java 8 stuff in developing for Guava because it has to be backwards compatible that my reflex is "we can't use that feature", so by all means call out this sort of option where it makes sense; I was mostly using a shorthand. :)

jrtom commented 6 years ago

Responding to @tomnelson :

I'm starting with the notion that Layout is not part of the jung-visualization module. Maybe there are cases where someone wants to make a graph, apply a layout algorithm, and ask what nodes are within some distance from some point. I think that would work for extremely large graphs that have locations stored on disk. Whatever the possible applications are, we did not put Layout in jung-visualization, so let's not let visualization challenges limit the design of our Layouts.

I will point out that right now we can't efficiently support this, either, because we don't make use of spatial data structures that would make this feasible: in order to answer this question right now in JUNG, one has to iterate over all of the nodes and calculate their respective distances from the specified point.

Also, this representation would also implicitly presume that either location updates are infrequent or that we've got a pretty hefty back end that can handle high-frequency updates, such as what the force-directed layouts do.

So possibly this is something that we should only support for a static layout (not necessarily, just putting it out there). And in fact for static layouts, this makes sense: given StaticLayout, maybe we shouldn't be providing CircleLayout and TreeLayout and other deterministic layouts at all as Layout instances, but just as algorithms that assign positions that are used as the input to a StaticLayout. This is distinct from dynamic non-deterministic layouts (such as the various force-directed layouts).

Another more radical angle to consider: maybe we should take this even farther and not support automated (algorithm-driven) updates to a layout while it's being rendered. At that point the rendering code actually wouldn't need a Layout at all, just a function that maps nodes to positions. We'd sacrifice the pretty animations while the force-directed layouts did their thing, but it might make the visualization code easier; @tomnelson can probably comment on that better than I can without spending some serious time digging in the code.

tomnelson commented 6 years ago

I agree that spatial data structures would be a very good thing in some jung-algorithms Layout implementations. Is it enough of a priority to go before some of the other work being done?

For our currently single-threaded layout implementations, it will be interesting to see the performance difference between calling setLocation on every node's mutable Point in every 'step' of the algorithm compared with removing a Point from the locations Map, making a new Point, then storing it in the Map, along with the gc of all those abandoned Points.

jrtom commented 6 years ago

@tomnelson We've gotten along (slowly) without spatial data structures for this long, we can continue to do so for as long as we need to; it's a question of where we want to put our energies.
My main point about those is that if we don't have them, then it's not obvious that there's much benefit to storing node locations other than in-memory, because we can't feasibly support the use case you described (finding all nodes within radius R of a specific point) without them for large graphs that are stored in persistent storage. (Perhaps there are other use cases that we should also consider that wouldn't require spatial data structures in order to be feasible; feel free to bring some up.)

Side note: Layout itself is not what supports lookups like that; that's what types like RadiusNetworkElementAccessor are for. Certainly some of the force-directed Layout implementations would benefit from spatial data structures to do their own updates.

You make a good point about mutable data structures in the context of updates; it is almost certainly faster to update those objects than it is to create new ones. So perhaps focusing on immutability is not the best option here.

I'd be interested to hear your thoughts (and @jbduncan 's) on some of the more out-there suggestions I made yesterday.

tomnelson commented 6 years ago

I think that your statements about the static vs dynamic layouts comes down to whether we have them implementing IterativeContext or not. I see no reason to change that. It would be a shame to lose the pretty animations.

I'm not that concerned right now about making the visualization code easier. I mostly just want to know who's got the Graph. Speaking of which....Wow, there sure is a lot of duplication in the Graph and Network interfaces. Wouldn't it be great if there were some common interface? Our Layouts could work against that interface and then be more flexible. That would make it possible for the Netwraph (like the name?) to be gettable from the Layout, too.

I'm thinking of writing a (naive) version of something like SpringLayout that uses spatial data. That way, you will have to stop saying 'since we don't have spatial data structures' and can only say 'since we don't have any GOOD spatial data structures'. I would like to have a starting point to improve on. Maybe some design ideas will emerge.

jrtom commented 6 years ago

@tomnelson I'm not particularly advocating for getting rid of the capability to update visualizations while the layout is updating, I'm just trying to [encourage us to] explore the design space a bit more. :)

Regarding "who's got the graph", IMO the answer should be that the visualization model has the graph; that seems like the right place to put it. This implies that if the visualization model's graph is replaced, it's the model's job to create a new layout for the new graph; it also implies that the model must listen for updates to the graph, and dispatch updates to the layout (and the rendering piece) as appropriate.

The one fly in the ointment with that plan is that you want some way to be able to enable the scenario in which the user wants to visualize a new graph, but because the new graph has the same node set as the old one [or a subset], they want to use the same node positions. So we should think about how that should work.

Yes, there's a lot of duplication in the Graph and Network interfaces; it's been a long and winding road on the common.graph type hierarchy, but I've got a proposal I'm working on (for common.graph) that may resolve that issue, hopefully without having much of an effect on existing code. (That said, even if that happens, I still don't want the Layout to provide a graph instance, even if in some cases it needs to retain one.)

I would be happy to have someone provide an implementation of NetworkElementAccessor that has some kind of non-trivial spatial data structure. Go for it. :)

tomnelson commented 6 years ago

I'm saddened that we use the word 'Layout' instead of calling it 'TheThingThatHoldsTheGraphAndItsNodeLocationsAndCanComputeThemWithAnAlgorithm' because if we named it the latter, there would be less of a problem with getting the graph-like-thing from it.

I really don't like having the graph-like-thing stateful in 2 (or more!) places and requiring event dispatch to keep them in sync,

Oh, you want a Non trivial implementation..... Sorry, I'm starting easy! I have to go back to old versions where the visualization still works to enjoy testing it though.

jrtom commented 6 years ago

Something has to hold the graph instance for visualization purposes. I agree that, ideally, there would be only one such instance so that we don't have to worry about keeping the instances in sync.

So let's walk through this a bit.

The input requirements of layouts (in terms of what they need to know about the graph topology) differ: (1) StaticLayout and CircleLayout need no topology information at all and require only a set of nodes. (2) The tree layout algorithms require a root node and a SuccessorsFunction (it's not written like that now but it should be for maximal flexibility).

random note: there is currently no Forest type, which means that there is no native way to represent a collection of trees. (3) Some layouts need a graph instance, i.e., they need the complete topology (but don't need any edge information). (4) Some layouts need a graph instance plus a way to associate values with edges (this need not be a ValueGraph although that's an obvious option).

(Side note: I've wanted for years to have a better initialization state for the force-directed layouts than "random positions"; a relatively simple option would be to start with a node, traverse the graph, and place nodes at random positions that are a function of the positions of already-placed nodes that they're connected to. It should help force-directed layouts converge a lot faster.)

There is no (and will be no) single interface that all of these things implement. A ValueGraph would suffice for all cases but would be overkill for most of them; we wouldn't use that.

The visualization system, by contrast, presumes that it has a graph and can at least iterate over the node and edge sets, and recover the endpoints for the edges. (I don't recall a context at the moment in which the visualization system needs to know anything further about the topology.)

Note that some of this information (which node is the root? how are values associated with edges?) are not relevant to the visualization system.

Kinds of synchronization issues that we have to worry about:

The graph object itself changes.
The elements of the graph change: either a. the edges change but the nodes stay the same b. the node and edge sets change

I'd have to go digging through the visualization code, but I believe that currently, when paint() is called, everything gets re-rendered, even if only a small part of the graph has changed. I would imagine that if we had finer-grained information about such changes that we could make repainting way faster, but maybe I'm just terminally confused.

(Note that Layout doesn't have anything native that causes it to listen to changes in the graph, nor does it provide a way for listeners to be notified if the graph changes (JUNG 2.0 Layout.setGraph() is a thing).)

[More on this in a bit; I'm about to start my commute and switch computers.]

jrtom commented 6 years ago

We already have a trivial implementation of a spatial data structure, it's the one where you have to iterate through all the nodes and get their locations from the layout. :) A slightly less trivial version would be to divide the layout canvas into a grid, and then cause each grid section to know what nodes were in it (each node could "know" what section it was in via an appropriate transformation of its coordinates). At that point if you want to know what's within radius X of a point, you get its grid, figure out what grids comprise the area defined by the radius, and iterate over those grids to find out what's in them; you can even do so in an appropriate pattern if you're looking for the closest element.

(This is basically something like a single-level quadtree, I think.)

tomnelson commented 6 years ago

That's a lot to respond to. Here is my summary: The thing we call Layout needs to get and set node locations and it needs an algorithm to determine where things start out and where they end up.

I made a grid-based spatial data structure today, very much like you described.

jrtom commented 6 years ago

Agreed on the basic requirements of a Layout. In some cases the algorithm is trivial (StaticLayout) and in some cases the output is chaotic (all the force-directed IterativeContext layouts) but they all have those requirements. I note that "supplying a graph instance" is not one of those requirements. (Doesn't mean that it must not, but I think we agree that it's not an intrinsic part of being a layout.)

Going back to my mini-lecture...

Layouts and visualizations have different requirements for what kinds of information they need about the graph on which they're operating, although in some cases they do overlap.

Once you've figured out the positions for each node, the graph topology (and the algorithm by which the positions are determined) become irrelevant. I think this may be a key insight.

Put a different way: a layout algorithm is a function that takes node positions (which may be either explicitly specified or unspecified) and some information about the graph, and generates a new set of node positions.

In this formulation:

an iterative layout algorithm is just a process that applies a layout algorithm until a specified predicate (some measure of convergence, or a specified number of steps) is satisfied. Note that we only really need one class to perform this iteration.
the graph is only needed as long as you're applying the layout algorithm.

So at this point we can set it up like this:

Visualization system
- must have access to
- a Function<N, Location> that maps nodes to locations
- the graph (so that it can iterate over nodes and edges to render it)
- if it wants to update the node positions, then it needs a layout algorithm, but not all visualizations require this (note that one example of a layout algorithm, in this formulation, is just a trivial update of one node position in response to user input)
Layout algorithms could become (static) methods that look like this:

// this one needs input locations and a graph
Function<N, Location> nodeRepulsion(Function<N, Location> nodeLocations, Graph<N> graph)
// this one needs only a set of nodes
Function<N, Location> circleLayout(Set<N> nodes)
// this one needs only a root and a way of getting the successors (children) of a node
Function<N, Location> treeLayout)(N root, SuccessorsFunction<N> successorsFunction)

Note that force-directed algorithms that have separate alternating steps (e.g. node repulsion and edge length adjustment) can actually be made into separate algorithms and composed.

One thing that the above design is lacking is a unified interface for layout algorithms, so that the ones that need iteration can use a single interface, but I haven't figured one out, or even determined that one necessarily must exist. For now, we'll presume a GraphLayoutAlgorithm functional interface that will cover at least some of those algorithms:

@FunctionalInterface
interface GraphLayoutAlgorithm<N> {
  Function<N, Location> apply(Function<N, Location> locations, Graph<N> graph);
}

The iterative wrapper for such an algorithm might then look something like this:

class GraphLayoutIterator<N, E> {
  // these are initialized by a Builder that must specify at least one or the other
  int maxIterations;
  Predicate<Function<N, Location>> convergencePredicate;

  public static Function<N, Location> iterate(Graph<N>graph, GraphLayoutAlgorithm<N> algorithm) {
    // iterate until we've hit max iterations or the convergence predicate is satisfied
    // return the final node locations
  }
}

(Note that a similar thing can happen to things like PageRank, for that matter; the defining thing about PageRank is the update step, not the convergence check.)

At this point, I think we've got a design where the layout algorithms don't need to maintain a reference to a graph, because they are given that information as they need it; they're basically stateless. And that means that the visualization system--specifically, the VisualizationModel--can hang onto the one and only reference to the graph, and supply it (if needed) to the layout algorithm.

(I really need to start creating design docs for these discussions, doing it in the comments on an issue is kind of terrible. :P )

Anyway, I'd be very interested in getting reactions to this proposal, but so far I think I like it; it seems to make things simpler overall, by making the individual components simpler and composable.

jrtom commented 6 years ago

Awesome to hear about the grid-based spatial data structure--thanks!
BTW, where, architecturally speaking, did you put it?

tomnelson commented 6 years ago

I need to read this about 5 more times before I can give a better answer. For now, here is my gut reaction:

We have this thing that is like a mediator between positional storage, an algorithm, and a graph. I believe that if we conflate the idea of the 'mediator' thing with the 'algorithm', then our need for purity in the way we define items from the study of graph theory cause us to have thoughts like: 'Why should FRLayout be a source of the graph it uses?' If we just think of this thing as a place where all 3 of those meet, then there is no shame in using it as a vehicle for all of them.

For example, in some imaginary rendering toolkit, there could be a graphics context like thing that carries around the colors, pen-styles, fonts, etc to draw stuff. I could say something like: 'It is not the responsibility of the graphics context to tell you what color it is using internally. If you want to know the color, you need to store the color separately and do your own thing to keep it in sync with what the graphics context is using".

Please design Layout the way that feels best from a Graph Theory perspective. The visualization code will have to adapt to that, probably by making a new class that is a mediator between 'Layout' and Graph. :(

I will continue to play with spatial data structures. What I have now is better suited to detecting collisions in an animation than for graph layout, but it is a start. I'll push a branch with it when I have something that is not too embarrassing. It will likely be a pre-common.graph branch so we can 'see' it work.

tomnelson commented 6 years ago

Your example: class GraphLayoutIterator<N, E> { // these are initialized by a Builder that must specify at least one or the other int maxIterations; Predicate<Function<N, Location>> convergencePredicate;

public static Function<N, Location> iterate(Graphgraph, GraphLayoutAlgorithm algorithm) { // iterate until we've hit max iterations or the convergence predicate is satisfied // return the final node locations } }

// iterate until we've hit max iterations or the convergence predicate is satisfied The convergencePredicate and maxIterations had better be static.... Then it will be interesting to watch when different code in the same classloader calls this method after hijacking each other's maxIterations or convergencePredicate!

jrtom commented 6 years ago

re: static method of GraphLayoutIterator: the 'static' was left over from a previous iteration that I failed to clean up before hitting "comment". The intent was that the GraphLayoutIterator.Builder would create an instance of GraphLayoutIterator, which would have the specified instance method 'iterate'. (Those names could definitely be improved.)

jrtom / jung

Refactor algorithms package #104