igraph / rigraph

igraph R package
https://r.igraph.org
543 stars 200 forks source link

how can I access the HIERARCHICAL structure identified by Louvain clustering? #291

Open antonkratz opened 6 years ago

antonkratz commented 6 years ago

I perform Louvain clustering on a graph like this:

library(igraph)
df <- read.table("INT.TS.YeastNet.v3.1101gene.3510link.txt")
myg <- graph.data.frame(df, directed=F)
myl <- cluster_louvain(myg)

How can I extract the hierarchical community structure identified by Louvain? I am aware of the functions communities and membership which I can apply on a communities object myl, however these return flat, disjoint partionings, no hierarchies. Louvain identifies a hierarchical tree structure, not just a flat partioning, but how can I access it?

-- igraph 1.2.1

The dataset can be found at: https://www.inetbio.org/yeastnet/download.php?type=3&name=INT.TS.YeastNet.v3.1101gene.3510link.txt

Vivianstats commented 5 years ago

Hi antonkratz,

I wonder if you have found a solution?

iosonofabio commented 5 years ago

Hi guys, Louvain treats each temporary community as a node in its recursive scheme, then refines to give a final partitioning.

To obtain the reduced graph in the final partitioning you can just construct a graph with nodes equal to each community and with undirected, weighted edges with weight equal to the sum of the weights from all members of that community to all members of the other community. That generates massive self-loops of course, that's how the communities come to be in the first place.

If you want access to the whole stack of temporary partitionings that are generated by Louvain on its way towards a decent result, I'm afraid you would need to modify the function by adding a side effect that writes the current progress to file before or after each merging step. Not sure how useful that would be though, except for educational purposes.

Btw, Louvain is fairly sensitive to creating almost unconnected communities (local maxima of the objective function) and is not even very fast compared to later variations on the theme, latest of which is called "Leiden algorithm".

antonkratz commented 5 years ago

I wonder if you have found a solution?

@Vivianstats

Try this:

> karate <- make_graph("Zachary")
> cl <- cluster_louvain(karate)
> cl$memberships
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
[1,]    4    4    2    2    1    3    3    2    6     2     1     4     2     2
[2,]    2    2    2    2    1    1    1    2    4     2     1     2     2     2
     [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26]
[1,]     6     6     3     4     6     4     6     4     6     5     5     5
[2,]     4     4     1     2     4     2     4     2     4     3     3     3
     [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34]
[1,]     6     5     5     6     6     5     6     6
[2,]     4     3     3     4     4     3     4     4

The hierarchical clusterings go from low (coarse grained) to high (fine grained), highest index has the vector with the most fine grained clustering.

The hierarchical clusterings are valid clusterings in their own right. Naturally occurring networks often have an intrinsic hierarchical organization, hierarchical clustering is an attempt to uncover this structure, giving you the option to study the hierarchical organization itself, across levels of the hierarchy, or just a partition at a desired level of detail. This is clearly referred to in the original paper [1] and countless published research is building on this. It absolutely makes sense to want to have the hierarchical clustering and it is strange that there is no public, documented function to extract it.

[1] Blondel, Vincent D; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (9 October 2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment. 2008 (10): P10008. arXiv:0803.0476

vtraag commented 5 years ago

I do agree with @iosonofabio that only the highest level of the Louvain algorithm makes sense from the perspective of optimising modularity. The question of a hierarchical structure is separate from this. Nonetheless, some people may be interested in the separate levels. The separate levels are implemented in the C function igraph_community_multilevel, but apparently it is not exposed in R. I am not sure why.

vtraag commented 5 years ago

Oh, sorry, I did not read your @antonkratz reply good enough, apparently it is implemented in R. In principle it should be straightforward to reconstruct the actual tree from the memberships, but I am not familiar enough with R to explain how to best do this.

iosonofabio commented 5 years ago

Happy you figured it out. I agree that hierarchies appear sometimes in the data 😉

Vivianstats commented 5 years ago

Thanks a lot for the discussion!