mapequation / infomap

Multi-level network clustering based on the Map Equation
https://mapequation.org/infomap
GNU General Public License v3.0

Why does Infomap use only one core? #332

Closed (Ishitori closed this issue 1 year ago)

Ishitori commented 1 year ago

Hi team,

I have a 200M-node graph on an Ubuntu machine with multiple cores. It takes about 2.5 hours to load the graph, but to my surprise things are still slow even when running im.run(): according to htop, only one core of my machine is used. I thought Infomap was written with OpenMP support. I installed it with a regular pip install.

What should I do to run infomap in parallel?

antoneri commented 1 year ago

Hi!

  1. What is the output of infomap --version? If it doesn't say that it is compiled with OpenMP support, you might be missing the libomp-dev package.

  2. How big is the network in terms of file size? Is it an ordinary or multilayer network? Does it include node names? If you are adding the network in Python by looping over the links, this is significantly slower than reading a file directly with im.read_file("network.net").

  3. How much memory does your machine have? If you're running out of memory, you might be using swap instead of RAM, which will be much slower.
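A minimal sketch of the file-based route from point 2, assuming Infomap accepts a plain link-list file (one `source target weight` line per link, with `#` comment lines ignored). The helper names `write_link_list` and `run_from_file`, the file name `links.txt`, and the three-link graph are made up for illustration:

```python
import os
import tempfile


def write_link_list(edges, path):
    """Write (source, target, weight) triples, one link per line."""
    with open(path, "w") as f:
        f.write("# source target weight\n")
        for source, target, weight in edges:
            f.write(f"{source} {target} {weight}\n")
    return path


def run_from_file(path):
    """Let Infomap parse the file itself instead of adding links in a Python loop."""
    # Imported here so the writer above can be used without infomap installed.
    from infomap import Infomap

    im = Infomap()
    im.read_file(path)
    im.run()
    return im


# Tiny made-up graph, only to show the file layout.
edges = [(1, 2, 1.0), (2, 3, 0.5), (3, 1, 2.0)]
net_path = write_link_list(edges, os.path.join(tempfile.mkdtemp(), "links.txt"))
```

Calling `run_from_file(net_path)` would then hand the parsing to Infomap. Whether this helps at your scale depends on how long it takes to write the file in the first place.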

Ishitori commented 1 year ago

  1. The output of infomap --version is: "Infomap version 2.6.1 compiled with OpenMP". So it looks like it has OpenMP support.

  2. My network has about 200M nodes and 1B edges. It is an ordinary, directed, weighted network. It doesn't include node names; all nodes are encoded as consecutive integers from 0 to N. It takes about 13 minutes to add all the links from parquet files to Infomap, which is slow but not terrible.

  3. I am running it on a huge machine with 512 GB of RAM. According to top, the process uses only 63.3% of RAM.
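For reference, checks 1 and 3 can be scripted roughly as below. This is only a sketch: it assumes the version line is worded as quoted in point 1, and that swap usage is read from Linux's /proc/meminfo. The helper names `has_openmp` and `swap_used_kb` and the sample meminfo numbers are made up for illustration:

```python
def has_openmp(version_output):
    """True if the `infomap --version` text says it was compiled with OpenMP."""
    return "with openmp" in version_output.lower()


def swap_used_kb(meminfo_text):
    """Swap currently in use, in kB, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Version line as reported in point 1.
version_line = "Infomap version 2.6.1 compiled with OpenMP"
# Made-up sample; on Linux, read the real text from /proc/meminfo instead.
sample_meminfo = (
    "MemTotal:       536870912 kB\n"
    "SwapTotal:        2097148 kB\n"
    "SwapFree:         2097148 kB\n"
)

openmp_ok = has_openmp(version_line)
swap_kb = swap_used_kb(sample_meminfo)
```

A non-zero `swap_kb` on the real /proc/meminfo contents would hint that part of the process has been pushed out of RAM.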

It has now been running for four days. I thought it might be doing some single-threaded preprocessing before starting to work at full speed, but I don't see any switch to multi-core usage.

Any recommendations?

antoneri commented 1 year ago

Alright!

Yes, Infomap runs the PageRank algorithm single-threaded, as this step is usually significantly faster than the optimization step, where we use OpenMP.

Sounds like you have done everything right, so my suggestion is to check out these alternative Infomap implementations:

  1. Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

  2. Multi-level Graph Drawing using Infomap Clustering

I'm also converting this to a discussion for now!