Weighted CPM clustering takes much longer when scale of weights is higher

vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.

GNU General Public License v3.0

Hi,

I am using Leiden CPM to cluster a weighted, directed network with ~14 million nodes and ~52 million edges. Initially, the weights were in the interval [0, 0.5], and the clustering finished in about 10 minutes. I then rescaled the weights to the interval [1, 10] to see the effect.

For some combinations of weights, the run does not finish: I let it run for 7+ hours and it never completed. For other combinations of weights (as well as the unweighted case), it finishes in about 10 minutes.

        import leidenalg as la

        kwargs = {'resolution_parameter': resolution_parameter}
        part = la.find_partition(iGraph, la.CPMVertexPartition, seed=seed, weights=pandas_df['weight'], **kwargs)

While it is still running, the top command shows a Python process using more than 90% of a CPU and 20 GB of memory.

Thanks.

Note that scaling the weights affects the resulting partition in much the same way as scaling the resolution parameter. That is, if you scale the weights by some factor $c$, you should also scale the resolution parameter by the same factor $c$ in order to get the same results (at least for CPM). So scaling the weights up (multiplying by $c$) while keeping the resolution fixed amounts to the same thing as keeping the weights fixed and scaling the resolution down (dividing by $c$). In this case, you are multiplying by quite a lot, so you will probably get much coarser clusters if you keep the same resolution parameter. Additionally, you are adding 1 to the weights, and I don't completely understand the reason for that.
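
To spell out the scaling argument, here is a short derivation. It assumes the pairwise form of the CPM quality function with unit node sizes, where $A_{ij}$ is the edge weight, $\gamma$ is the resolution parameter, and $\delta(\sigma_i, \sigma_j)$ is 1 when nodes $i$ and $j$ are in the same community and 0 otherwise. This notation is an assumption for illustration and is not taken from the thread.

```latex
\begin{align*}
Q(A, \gamma)   &= \sum_{ij} \left( A_{ij} - \gamma \right) \delta(\sigma_i, \sigma_j) \\
Q(cA, c\gamma) &= \sum_{ij} \left( c A_{ij} - c \gamma \right) \delta(\sigma_i, \sigma_j)
                = c \, Q(A, \gamma) \\
Q(cA, \gamma)  &= \sum_{ij} \left( c A_{ij} - \gamma \right) \delta(\sigma_i, \sigma_j)
                = c \, Q\!\left(A, \tfrac{\gamma}{c}\right)
\end{align*}
```

For $c > 0$, multiplying the quality by $c$ does not change which partition maximizes it. Under this assumed form, scaling both the weights and the resolution by $c$ leaves the optimum unchanged, while scaling only the weights is equivalent to running the original weights at resolution $\gamma / c$.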

In short: the resulting partition will be different, and it might be much more difficult for the algorithm to converge to a sensible partition. Hope this makes sense!
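
The effect can be checked on a toy example. The sketch below is not leidenalg code: it brute-forces the best partition of a tiny undirected graph under an assumed pairwise CPM quality (internal weight minus resolution times the number of same-community node pairs). The graph, the helper names `cpm_quality` and `best_partition`, and the values of `gamma` and `c` are all hypothetical and chosen only for illustration.

```python
from itertools import product


def cpm_quality(edges, membership, resolution):
    """Toy CPM quality (assumed form): total internal edge weight minus
    resolution times the number of unordered same-community node pairs."""
    # Total weight of edges whose endpoints share a community.
    internal = sum(w for u, v, w in edges if membership[u] == membership[v])
    # Count nodes per community, then unordered pairs within each community.
    sizes = {}
    for community in membership:
        sizes[community] = sizes.get(community, 0) + 1
    pairs = sum(n * (n - 1) // 2 for n in sizes.values())
    return internal - resolution * pairs


def best_partition(n, edges, resolution):
    """Exhaustive search over all labelings of n nodes (tiny n only)."""
    best_q, best = float("-inf"), None
    for membership in product(range(n), repeat=n):
        q = cpm_quality(edges, membership, resolution)
        if q > best_q:
            best_q, best = q, membership
    # Canonical form: a set of communities, independent of label names.
    return frozenset(
        frozenset(i for i in range(n) if best[i] == c) for c in set(best)
    )


# Hypothetical toy graph: two triangles joined by one weaker bridge edge.
triangle_a = [(0, 1, 0.4), (1, 2, 0.4), (0, 2, 0.4)]
triangle_b = [(3, 4, 0.4), (4, 5, 0.4), (3, 5, 0.4)]
edges = triangle_a + triangle_b + [(2, 3, 0.2)]

gamma, c = 0.3, 20.0  # illustrative resolution and scale factor
scaled_edges = [(u, v, c * w) for u, v, w in edges]

base = best_partition(6, edges, gamma)                   # original weights
both_scaled = best_partition(6, scaled_edges, c * gamma)  # weights and resolution scaled
weights_only = best_partition(6, scaled_edges, gamma)     # only weights scaled

print("original:          ", sorted(map(sorted, base)))
print("both scaled:       ", sorted(map(sorted, both_scaled)))
print("weights only:      ", sorted(map(sorted, weights_only)))
```

On this toy graph, the first two searches both recover the two triangles, while scaling only the weights merges all six nodes into a single community. This matches the coarser clustering described above, though it says nothing about running time on a real network.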

Weighted CPM clustering takes much longer when scale of weights is higher #170