neo4j-contrib / neo4j-mazerunner

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.
Apache License 2.0
381 stars 105 forks source link

Betweenness Centrality Tractability? #44

Open cookieups opened 8 years ago

cookieups commented 8 years ago

Running this algo on a directed graph with 30K nodes and 50K Edges is extremely slow. The output during betweennessCentralityMap phase is roughly 1 vertex completed per 3 minutes. I realize that this can be a O(n^3) scale problem depending on algorithm, but this seems prohibitively slow. Running this on a single VM with 10 GB of Memory and 3 cores. Is this behavior normal?

kbastani commented 8 years ago

I agree, this could certainly be improved. I will take a deeper look and see what I can do to improve it.

cookieups commented 8 years ago

Other betweenness centrality programs I've seen that are somewhat efficient have implemented the Brandes "Faster" Algorithm (usually single threaded). On the same set above, it completes in roughly 3 minutes (250 Edges per second), and on a larger set (120K V, 9 MM E) roughly 5 hours (500 Edges per Second) for both Vertex Betweenness Centrality and Edge Betweenness Centrality:

http://algo.uni-konstanz.de/publications/b-fabc-01.pdf