snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.
Other
2.16k stars 795 forks source link

Speeding up BigClam implementation #210

Open liuchbryan opened 3 years ago

liuchbryan commented 3 years ago

This change speeds up the C++ implementation for BigClam (Yang & Leskovec, 2013). It parallelises the computation in the stage where one reads off the community affiliation from the factorised affiliation weights matrix, which currently dominates the runtime and was yet unparallelised.

Detailed context and running instructions are available on contrib/ICL-bigclam_speedup/README.md

The actual changes to the main SNAP library is only a few lines - in snap-adv/agmfast.cpp. The bulk of the work is to

  1. Measure the speed up achieved by parallelising the operation, and
  2. Verify the sped up version produces the same set of communities as the original version.

The latter requires a custom utility contrib/ICL-bigclam_speedup/src/cmtycompare.cpp as the community affiliations are stored in SNAP (C++) as vectors of vectors instead of sets of sets.

To prevent interference with the main SNAP library, we have copied out the header and cpp files in examples/bigclam and the original snap-adv/agmfast.cpp. These files can be found in contrib/ICL-bigclam_speedup/src/. We also use a separate Makefile largely similar to that in the main project when we build our executables.

Please let me know if you have any questions!

Bryan

P.S.: Apologies for sending in the pull request late - I know Node2Vec or GNNs are now the dominating tools, though given BigClam's accumulated significance I will be doing it a disservice for not reducing the runtime when I can!