vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
596 stars 78 forks source link

When to use merge_nodes_constrained vs. move_nodes_constrained #61

Closed AvantiShri closed 3 years ago

AvantiShri commented 3 years ago

I have an initial community assignment that I'd like to further subcluster using modularity optimization - analogous to the "refinement" step in Leiden. I noticed there were two functions to support this: optimiser.move_nodes_constrained and optimiser.merge_nodes_constrained. The documentation for merge_nodes says "Merging in this case implies that a node will never be removed from a community, only merged with other communities". I'd like to get feedback on whether I'm understanding the differences correctly. On reading the code, it seems that the main difference between "move_nodes" and "merge_nodes" is that "merge_nodes" does a single pass over the nodes: https://github.com/vtraag/leidenalg/blob/17eceefdb7eacc7c17dbf5cc16d44f385b6c15c0/src/Optimiser.cpp#L1276 and only operates on nodes that are currently NOT in a community with any other nodes: https://github.com/vtraag/leidenalg/blob/17eceefdb7eacc7c17dbf5cc16d44f385b6c15c0/src/Optimiser.cpp#L1281 By contrast, it appears that "move_nodes" does not have that if-statement and can thus operate on nodes that ARE in communities with other nodes. It also looks like "move_nodes" maintains a queue of nodes to iterate over: https://github.com/vtraag/leidenalg/blob/17eceefdb7eacc7c17dbf5cc16d44f385b6c15c0/src/Optimiser.cpp#L1044 and nodes can be re-added to the queue when their community assignment is detected to be potentially unstable: https://github.com/vtraag/leidenalg/blob/17eceefdb7eacc7c17dbf5cc16d44f385b6c15c0/src/Optimiser.cpp#L731-L736

Am I understanding these differences correctly? If so, I am curious why one would prefer to use merge_nodes_constrained rather than move_nodes_constrained, given that move_nodes_constrained seems like it would consider more possible moves and might thus lead to a better value for the objective function. I understand that Leiden relies on "merge_nodes_constrained" for its refinement step based on the documentation at https://leidenalg.readthedocs.io/en/stable/advanced.html#optimiser. Is this simply for the speed? Or perhaps because "move_nodes_constrained" can produce poorly-connected communities in the same way the original Louvain algorithm can, as discussed in the Leiden paper?

If someone wants to further subcluster ("refine") an initial community assignment, would you recommend running BOTH move_nodes_constrained AND merge_nodes_constrained, and using the partition that has the best modularity? And are there any other considerations that you'd recommend keeping in mind?

Thanks, Avanti

vtraag commented 3 years ago

I am curious why one would prefer to use merge_nodes_constrained rather than move_nodes_constrained

There are two reasons why we use merge_nodes_constrained instead of move_nodes_constrained, as you already suggested: (1) the speed; (2) the connectivity guarantee. For the refinement, we typically are not so much worried about getting good clusters, because they do not affect the partition directly, it only affects how we aggregate the graph. This essentially determines the move sets that we consider in the algorithm. Hence, a quick and reasonable way of exploring different possible move sets is most effective. In addition, the guarantee that the refinement procedure always returns connected clusters provides the overall guarantee of connectivity of communities in a single iteration of the Leiden algorithm.

If someone wants to further subcluster ("refine") an initial community assignment, would you recommend running BOTH move_nodes_constrained AND merge_nodes_constrained, and using the partition that has the best modularity? And are there any other considerations that you'd recommend keeping in mind?

It depends a bit on what your goal is.

If your goal is to simply improve the initial community assignment, I would simply suggest to run a few iterations of optimise_partition. If refining the partition would increase the quality of a partition (modularity if you are using that, but the same thing also applies to other quality functions), that is something that will be done while running optimise_partition. However, the improvement may consist not only of a refinement of the original partition, it can also consists of all others moves of nodes from one community to another.

If you are only interested in a strict refinement, then presumably move_nodes_constrained might be the best choice. However, it is not said that the refinement itself will be a partition of higher quality compared to the initial community assignment that you would be interested in.

Hopefully this is sufficient to go on? If you have any additional question, let me know!