twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0
1.05k stars 150 forks source link

Array Based Dynamic Graph #135

Closed plofgren closed 9 years ago

plofgren commented 9 years ago

I've implemented a dynamic directed graph, using an ArrayBuffer of IntArrayList (from the fastutil library). If n nodes are used, O(n) objects are created, independent of the number of edges.

Comparison to the current class: Note that the current dynamic graph class SynchronizedDynamicGraph uses a ConcurrentHashMap to store nodes, which has more overhead than an ArrayBuffer when the nodes are mostly sequential. Also, each node stores an ArrayBuffer[Int], which I belive boxes the ints into objects, creating overhead relative to fastutil's IntArrayList. It also has non-trivial synchronization overhead; in particular when iterating over the neighbors of a node, the neighbors are first copied into a new array, then an iterator to the new array is returned. The current class is better when the node Ids are very non-sequential or when automatic synchronization is needed, but otherwise I believe the new class is more efficient. In a very informal comparison on a graph with several hundred million edges, the currrent class wouldn't load using 30GB of heap, while the new class fit the graph in 7.3 GB of RAM.

@pankajgupta Could you take a look at this, or point me to someone else?

Thanks!

pankajgupta commented 9 years ago

Consider adding benchmark in cassovary-benchmark subproject as well to demo the performance of this.

plofgren commented 9 years ago

I probably won't get to the benchmark for a few days, but I'll post once I have it, and I'll make a call then about multi-threading.

pankajgupta commented 9 years ago

ok regarding benchmark

plofgren commented 9 years ago

I needed a mutable undirected graph, so I just added code to support undirected graphs (i.e. stored direction Mutual). @pankajgupta @szymonm Please comment on the new commit when you get a chance.

plofgren commented 9 years ago

I actually didn't realize until today that "Mutual" meant undirected and was starting to write my own UndirectedGraph wrapper class when I noticed it. Should I add a comment to StoredGraphDir along the lines of Mutual, // In a mutual graph, the outbound and inbound neighbors of each node are equal, and space is typically saved by only storing the outbound neighbors of each node ?

pankajgupta commented 9 years ago

LGTM. Yes, Mutual has that intention, but it will be better for there to be a barebones UndirectedGraphWrapper class for readability and to avoid the surprise you ran into. So let's close this PR and please feel free to take up that into a new PR.