vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
566 stars 76 forks

Can't finish running on data with a million nodes and one hundred million edges #162

Closed: NYcleaner closed this issue 6 months ago

NYcleaner commented 6 months ago

First, thanks for providing such a good graph computation library. Now I am facing a problem. With data of tens of thousands of nodes and millions of edges, the Leiden algorithm completes community detection in a few minutes on a Spark cluster (--spark.session.driverMemory=10g --spark.session.driverCores=1 --spark.session.executorCores=8 --spark.session.executorMemory=8G). But when the data reaches one million nodes and 100 million edges, the algorithm takes 2 hours or more. Is there any optimization method you can suggest for data at this scale? Looking forward to your reply, thank you.
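
For reference, a typical leidenalg invocation for this step (a minimal sketch; the partition type, weights, and graph construction actually used may differ):

```python
import igraph as ig
import leidenalg as la

# Hypothetical edge list; in the real pipeline the ~100M edges are
# collected from the Spark cluster before being handed to igraph.
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
g = ig.Graph(edges=edges, directed=False)

# Run the Leiden algorithm; ModularityVertexPartition is one common
# choice of quality function. n_iterations=2 is the default.
partition = la.find_partition(g, la.ModularityVertexPartition, n_iterations=2)
print(partition.membership)
```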

vtraag commented 6 months ago

Thanks! Indeed, larger graphs will obviously take more time. The Leiden algorithm is one of the fastest algorithms available, so it won't be easy to find an alternative. However, you might be interested in using the implementation in igraph itself: https://python.igraph.org/en/stable/api/igraph.Graph.html#community_leiden. It has more limited capabilities, but should be faster.
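
For example (a minimal sketch using a random placeholder graph; assumes a recent python-igraph, where community_leiden supports the "modularity" and "CPM" objective functions):

```python
import igraph as ig

# Placeholder graph; substitute your own million-node graph here.
g = ig.Graph.Erdos_Renyi(n=10000, m=100000)

# igraph's built-in Leiden implementation runs entirely in C without
# the leidenalg Python layer, so it is typically faster.
clustering = g.community_leiden(objective_function="modularity", n_iterations=2)
print(len(clustering))  # number of communities found
```

Passing a negative n_iterations makes the algorithm iterate until the membership vector no longer changes, which may be worth the extra time on large graphs.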