vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
596 stars 78 forks source link

Minimum community size #53

Open brilyantconsulting opened 3 years ago

brilyantconsulting commented 3 years ago

The maximum cluster size feature is a great feature! Would it be possible to add something for a minimum cluster size constraint? In particular, high resolution clustering often results in singletons, which are not particularly useful for downstream analysis.

vtraag commented 3 years ago

There might be some possibilities in this direction. I will try to see what I can do, but it might take some time.

brilyantconsulting commented 3 years ago

Thanks for looking into this! I can't imagine it is a trivial feature to add.

vtraag commented 3 years ago

Well, we already do have some solutions in another context. The question is how to best adapt it to the more general framework of this package.

davidfrosch commented 3 years ago

Hello vincent, i just wanted to add, that I was as well looking for that feature! It would be great if it could be added at some point ! Thanks for the great work.

deklanw commented 3 years ago

This would also be greatly helpful for me.

As a dirty workaround I was considering just merging singletons greedily into whatever group but that doesn't resolve the duotons and tripletons, etc which can be equally annoying.

Are there any other workarounds possible at this time?

Well, we already do have some solutions in another context.

@vtraag Can you expand on this bit? Is there another algorithm which does have minimum size restrictions?

vtraag commented 2 years ago

@vtraag Can you expand on this bit? Is there another algorithm which does have minimum size restrictions?

In the Java version of the Leiden algorithm, we are using an approach that progressively merges smaller communities. The approach used there may not generalise directly to other quality functions, which are also supported by this Python package. If you want to use the Java version, you can simply supply this argument on the command line when you run it (see the documentation for more information).

For the details of how to do this see the source code. The idea is that you progressively assign small communities to other communities, based on the relative density. The relative density here refers to w_ij / n_i n_j where w_ij is the total number of edges (or weight) between community i and j and where n_i is the total number of nodes in community i. You can then assign any community i that is too small to any other community to which it has the highest relative density. There is actually some argument for doing so in the context of CPM. If you do this starting with the smallest community, you can iteratively assign the smallest cluster that is too small to larger clusters.

girwin1 commented 3 months ago

Did this progress?