vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
606 stars 78 forks source link

Well-connected nodes #104

Closed mongo1979 closed 2 years ago

mongo1979 commented 2 years ago

Hi,

Can you please explain how to find the well-connected nodes when the algorithm does "MergeNodesSubset" in the Leiden algorithm paper ? The equation (line 34, Algorithm A.2) in the appendix A of the paper has some symbols, E(v, S-v), r, ||S||, and ||v||. Can you please explain these symbols?

From the paper, what I understand is they are defined as E(v, S-v) : the number of edges connects v and the rest of node in S (S-v). r : resolution ||S|| : recursive size ||v|| : degree of node v (or kv).

However, I read the following paper.
Leiden-Based Parallel Community Detection It explains them as E(v, S-v): the weighted cut of all edges between v and (S-v) ||S|| : the weighted volume of a set of nodes, or the sum of the nodes' weighted degrees ||v|| : the sum of the weight of all incident edges

It seems the definitions are different. Besides, the r (resolution) in the line 34, Algorithm A.2 has been rescaled by 2m or not? If it has been rescale, what is m? It the number of edge in the sebset S or the number of edge in the entire graph, G? Thank you for your explanation.

vtraag commented 2 years ago

The two definitions from our paper and the thesis you mention seem to be congruent to me.

The symbols in line 32, Algorithm A.2 are defined earlier in the paper. See just after Eq. (A4) for the definition of $E(C, D)$. See definition 1 for the recursive size $\Vert S \Vert$. It simply measures the total number of nodes in a community, independent of whether that is measured in an aggregate graph or in the original graph. A single node $v$ in line 34 may refer to a node in an aggregated graph, so that $v = \{v_1, v_2, \ldots\}$, meaning it essentially comprises multiple nodes in the original graph, hence the reason it also uses this recursive size. The thesis seems to be working only on modularity, in which case we set the node size is equal to the degree of the node, in which case the definitions are congruent. They are not in general however, so some of the results of the thesis may only hold for modularity, and not for CPM (but I have not read the thesis in sufficient detail).

In general, the routine for the refinement can be set to use move_nodes or merge_nodes, see https://leidenalg.readthedocs.io/en/latest/reference.html#leidenalg.Optimiser.refine_routine for more details. Do note that the implementation of merge_nodes does not include checks for well-connected communities / nodes, since this is dependent on the exact quality function used. This is also noted in the documentation here: https://leidenalg.readthedocs.io/en/latest/reference.html#optimiser with some additional explanation.