vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0

node_sizes parameter is giving weird results #95

Closed kalthwaini closed 2 years ago

kalthwaini commented 2 years ago

I'm examining a couple of the leidenalg algorithms, but I really don't understand the role of the node_sizes parameter. What is interesting is the noticeable impact the parameter has on the results. Could you please explain how node_sizes is implemented?

vtraag commented 2 years ago

Hi @kalthwaini, the node_sizes argument to CPMVertexPartition sets the size of each node, see also https://leidenalg.readthedocs.io/en/stable/reference.html#cpmvertexpartition.

CPM takes into account the overall size of a community. In terms of a sum over all communities, this is

sum_c [ m_c - gamma * n_c * (n_c - 1) / 2 ]

where m_c is the number of edges within community c, gamma the resolution parameter and n_c the size of community c. In general, n_c is the sum of the sizes of the nodes in community c. Normally, the size of the community is simply the total number of nodes in community c. Indeed, when each individual node size is set to 1, which is the default, n_c counts exactly the number of nodes in community c. If the graph itself already consists of an aggregation of nodes (e.g. nodes represent organisations, each of which contains a number of individuals), you might want to pass the aggregate node size as the node size (e.g. the total number of individuals in each organisation).
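To make the effect of the node sizes concrete, here is a minimal pure-Python sketch that evaluates the sum above for a fixed partition. It is only an illustration of the formula, not the leidenalg implementation: the function name `cpm_quality` and the toy graph are made up for this example, and the value leidenalg itself reports may be normalised differently.

```python
from collections import defaultdict

def cpm_quality(edges, membership, node_sizes=None, gamma=1.0):
    """Evaluate sum_c [ m_c - gamma * n_c * (n_c - 1) / 2 ] for a partition.

    Hypothetical helper for illustration only; not part of leidenalg.
    """
    if node_sizes is None:
        # Default: every node has size 1, so n_c is just the node count.
        node_sizes = [1] * len(membership)

    m = defaultdict(int)  # m_c: number of edges inside community c
    n = defaultdict(int)  # n_c: summed node sizes of community c

    for v, c in enumerate(membership):
        n[c] += node_sizes[v]
    for u, v in edges:
        if membership[u] == membership[v]:
            m[membership[u]] += 1

    return sum(m[c] - gamma * n[c] * (n[c] - 1) / 2 for c in n)

# Toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = [0, 0, 0, 1, 1, 1]

# Unit node sizes: each community has m_c = 3 and n_c = 3.
unit_q = cpm_quality(edges, membership, gamma=0.5)
print(unit_q)   # 3.0

# Nodes 3..5 now each stand for 10 individuals, so n_1 = 30.
sized_q = cpm_quality(edges, membership,
                      node_sizes=[1, 1, 1, 10, 10, 10], gamma=0.5)
print(sized_q)  # -213.0
```

The edges are identical in both calls; only n_c changes. Because the penalty term grows roughly quadratically in n_c, larger node sizes make it much more costly to keep those nodes in one community, which is why the parameter can change the resulting partition so noticeably.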

vtraag commented 2 years ago

I've slightly improved the documentation to clarify this further. Moreover, it seems that node_sizes was added to ModularityVertexPartition and RBConfigurationVertexPartition in order to solve a problem previously noted in #60. These arguments are redundant, however, and so the current fix in 4e108cd7a9acf530930106d7212a7f576eba0516 is better.