singnet / offernet

Offer Networks simulation framework written in Java/Groovy/Gremlin
https://singnet.github.io/offernet/public/offernet-documentation/index.html
MIT License
8 stars 5 forks source link

efficient calculation of graph geodesics metrics (with VertexProgram) #63

Open kabirkbr opened 6 years ago

kabirkbr commented 6 years ago

Certain graph geodesics metrics, especially graph diameter changes due to processes running on the graph would be informative for design of further experiments; also for better understanding of graph structures we are working with. Unfortunately, calculation of these metrics on bigger than small graphs seems prohibitively expensive, at least with the readily available means and algorithms. E.g. the time complexity of graph diameter algorithm is image...

I have tried to implement these with A) networkX in python and B) Gremlin on DSE Graph, where simulations are running anyway; B) is painfully slow, partly due to quick and non-efficient implementation, partly because... graph geodesics is expensive. A) is faster, but not enough to be practical...

B:

2018-10-21 17:36:38,315 [Test worker] DEBUG OfferNet.class  : Graph diameter: 9; calculation time: 1.504s (Graph size == 50 agents).
2018-10-21 17:56:35,158 [Test worker] DEBUG OfferNet.class  : Graph diameter: 5; calculation time: 56.759s (Graph size == 100 agents).
2018-10-21 18:18:37,843 [Test worker] DEBUG OfferNet.class  : Graph diameter: 6; calculation time: 843.498s (Graph size == 200 agents).

A:

python analysis/scripts/python/graph_diameter.py ~/offernet/analysis/experimentData/EXP10-17-09-54-A7Z97s/SIM10-17-11-55-gwhEL4--DV/graph.graphml 
('Number of all nodes: ', 3216)
('Diameter of the whole graph: ', '7')
Time of calculation: 
0:13:16.519228
('Number of agent nodes: ', 801)
('Diameter of agent-> knows-> agent graph: ', '8')
Time of calculation: 
0:00:05.891267

Computing graph geodesics of large networks is a known and complicated problem, which is solvable by applying 'tricks': Rodriguez, M. A. (2017, April 10). Reducing Computational Complexity with Correlate Traversals. Retrieved October 21, 2018, from https://www.datastax.com/2017/04/reducing-computational-complexity-with-correlate-traversals

Additionally to all above, we are dealing with the multi-labelled graphs, for which the usual graph geodesics algorithms (and measures) are not immediately suitable: Rodriguez, M. A., & Shinavier, J. (2010). Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms. Journal of Informetrics, 4(1), 29–41. https://doi.org/10.1016/j.joi.2009.06.004

However it seems that there is a way to calculate at graph diameter quite efficiently using vertex centric algorithms, as explained here: Pennycuff, C., & Weninger, T. (2015). Fast , Exact Graph Diameter Computation with Vertex Programming. Presented at the 1st High Performance Graph Mining workshop. Retrieved from https://www3.nd.edu/~tweninge/pubs/PW_HPGM.pdf

In order to do that we need to implement a VertexProgram in TinkerPop3 and run it with BSP graph computer, which magically optimizes traversal complexity by pruning duplicate traversals. Yet, making it work seems to be quite involved. From the first half-a-day-worth trial I was not able to configure and run Spark on DSE grpah, which is needed in order to run traversals with graph computer. It may be possible to run them on either TinkerGraph or plain Gremlin server, but configuring DSE for OLAP traversals would be nice.