vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
566 stars 76 forks

Can't finish running on data with a million nodes and one hundred million edges #162

Closed: NYcleaner closed this issue 6 months ago

NYcleaner commented 6 months ago

First, thanks for providing such a good graph computation library. Now I am facing a problem. With data of tens of thousands of nodes and millions of edges, the Leiden algorithm completes community detection in a few minutes on a Spark cluster (--spark.session.driverMemory=10g --spark.session.driverCores=1 --spark.session.executorCores=8 --spark.session.executorMemory=8G). But when the data reaches one million nodes and 100 million edges, the algorithm takes 2 hours or more. Is there any optimization method you can suggest for data at this scale? Looking forward to your reply, thank you.
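
For reference, a typical leidenalg invocation for this step (a minimal sketch; the partition type, weights, and graph construction actually used may differ):

```python
import igraph as ig
import leidenalg as la

# Hypothetical edge list; in the real pipeline the ~100M edges are
# collected from the Spark cluster before being handed to igraph.
edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
g = ig.Graph(edges=edges, directed=False)

# Run the Leiden algorithm; ModularityVertexPartition is one common
# choice of quality function. n_iterations=2 is the default.
partition = la.find_partition(g, la.ModularityVertexPartition, n_iterations=2)
print(partition.membership)
```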

vtraag commented 6 months ago

Thanks! Indeed, larger graphs will obviously take more time. The Leiden algorithm is one of the fastest algorithms available, so it won't be easy to find an alternative. However, you might be interested in using the implementation in igraph itself: https://python.igraph.org/en/stable/api/igraph.Graph.html#community_leiden. It has more limited capabilities, but should be faster.
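
For example (a minimal sketch using a random placeholder graph; assumes a recent python-igraph, where community_leiden supports the "modularity" and "CPM" objective functions):

```python
import igraph as ig

# Placeholder graph; substitute your own million-node graph here.
g = ig.Graph.Erdos_Renyi(n=10000, m=100000)

# igraph's built-in Leiden implementation runs entirely in C without
# the leidenalg Python layer, so it is typically faster.
clustering = g.community_leiden(objective_function="modularity", n_iterations=2)
print(len(clustering))  # number of communities found
```

Passing a negative n_iterations makes the algorithm iterate until the membership vector no longer changes, which may be worth the extra time on large graphs.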