Closed johandahlberg closed 8 months ago
Hi @johandahlberg - unfortunately, this is a known issue with Leiden as well as other modularity maximization methods. These methods aren't doing model selection, and aren't good at "knowing" when the community structure they find is significant or would be expected under a random model as you point out. See for instance in this image of an adjacency matrix after sorting by the partition inferred by Leiden (from https://bdpedigo.github.io/networks-course/community_detection.html#overfitting):
You'll notice that those "communities" actually do have more within-community connections than without, so these structures can actually pop up just by chance.
One option is to explicitly form a null distribution (say, sampling a few hundred ER graphs, running Leiden, and computing the modularity) and then comparing your results to such a distribution.
Zhang and Peixoto also wrote this nice article about this problem https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.2.043271 and provide an alternative formulation that relies on a variant of the stochastic block model. Their implementation is in graph-tool here. The tradeoff is I'm guessing it's slower, might be harder to optimize, etc. but could be helpful for your application so I thought I'd mention it.
Thank you for your quick and useful answer @bdpedigo! I'll look further into the methods you suggest.
Absolutely - if it's alright with you I'm going to close this issue since I don't think there's anything to be done in terms of graspologic, but please feel free to post any further questions here.
Hi!
I'm trying to run Leiden community detection on a randomly generated graph (as part of a test suite for a larger project). I would expect that running community detection on this graph would give me a single community back, but that does not seem to be the case. My expectation in this case is based on previously having done the same thing with igraph and getting a single community back. I can get it to return a single community by adjusting the
resolution
parameter, but I would prefer not to change that since I need this to work automatically later for the use-case I'm working on.My question is if this is a reasonable expectation? And are there any parameters other than
resolution
that I should look into in order to get a single community back for this case?I have attached the code I've written to explore this below to make it clearer what I'm thinking of.
The result of this is:
To try to make sense of this I also plotted the graph and the resulting communities, and it's difficult for me to understand why the communities were assigned the way they are in this graph.