vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
575 stars 77 forks source link

Introduce GeneralizedModularity #109

Closed arnaudon closed 1 year ago

arnaudon commented 1 year ago

We are attempting to introduce leiden solver into Markov Stability with generalized modularity in this other repository here: https://github.com/barahona-research-group/PyGenStability/pull/43

The aim of this PR is to have the option to either use the original generalized Louvain from https://github.com/michaelschaub/generalizedLouvain or the corresponding generalized Leiden.

I'm happy to discuss more if needed here, or by other means. I work with Michael Schaub, Mauricio Barahona and other collaborators to improve our PyGenStability package with more functionalities.

arnaudon commented 1 year ago

@vtraag , I'm not sure you receive notifications, so I ping you here, just in case ;)

vtraag commented 1 year ago

Hi @arnaudon! Thanks for this, great to see such a contribution coming in! I've been away on holidays :tent: and travelling for work :airplane: so I hadn't gotten around to my backlog of GitHub notifications yet.

I was planning on making a new release today to support the new igraph 0.10.x release series, hopefully we can get your contribution in as well. But if a bit of delay is necessary in order to do so, that's no problem of course.

I'll first let the CI run and see what the results are and then review the code later today.

vtraag commented 1 year ago

Could you perhaps point me to a paper or some other description of what the exact quality function is that you are implementing? That might help me to understand some of the code better.

arnaudon commented 1 year ago

Welcome back @vtraag ! Don't look too closely at this PR yet, I'm about to correct it quite substantially, I realised I didn't to it properly. There is one last issue I need to resolve before I send you a new commit. I'll give you more info as well, np!

vtraag commented 1 year ago

OK, sure, no problem! I'll make a new release then probably before, but we can easily make a new release shortly afterwards, once this is integrated.

arnaudon commented 1 year ago

Last push now works as expected, I will optimise the code a bit to move out some computations, and write the doc so we can understand what this is doing, almost there!

arnaudon commented 1 year ago

Hello @vtraag , I have added more doc in the python interface, with the equation of generalised modularity. It is 'simply' relaxing the null model to arbitrary vectors. The difficult part I encountered was to collapse the null models properly in the step where the graph is collapsed. I made an attempt in the code, but it may break other quality constructors, as I had to pass the partition used to collapse the graph to the create method of the partition, so I can collapse the null model properly. I'm not sure this is the best way to do it, I'm open to try any better suggestions!

arnaudon commented 1 year ago

Also, I didn't think too much about possible performance improvements, I'm not sure what would be most beneficial to try, if you have any ideas as well, I can try!

arnaudon commented 1 year ago

now the tests should pass on previous constructors, but not yet on the new one, I'll try to fix it asap

vtraag commented 1 year ago

Hi @arnaudon,

Thanks for the last additions!

I'm still trying to understand properly what you are trying to optimise. From the quality function it looks like you want to optimise something like the following:

$$ Q = \sum{ij} A{ij} - \sum{m} d{i,m} d_{j,m} $$

This does not require a separate new VertexPartition type I believe, as this can already be implemented as layers with CPM, see the documentation for some explanation. That is, with $m$ layers, the quality function with multiple layers $1, \ldots, m$ is $Q = \sum_m w_m Qm $. If we use CPM with node sizes $d{im}$ for $i = 1, \ldots, n$ for each layer $m$, and use an identical graph $G$ for each layer, we arrive at

$$ Q = \sum_m wm \left(\sum{ij} A_{ij} - \gammam d{i,m} d_{j,m} \right) $$

which, assuming $\sum w_m = 1$ simplifies to

$$ Q = \sum{ij} A{ij} - \sum_m \phim d{i,m} d_{j,m}$$

with $\phi_m = \gamma_m w_m$, which corresponds to some other resolution parameter for CPM for layer $m$. Obviously, setting $\phi_m = 1$ corresponds to the quality function mentioned first.

Could you perhaps explain how generalised modularity would differ from this and why a separate implementation would be required?

(For some reason, it seem that you are doing $\summ d{i,m} d_{j,m+1}$ in the actual quality function, but I don't really understand properly why. Perhaps you can also explain this?)

arnaudon commented 1 year ago

This is a very good suggestion indeed, I didn't think we could do it like that. I'll give it a try asap to see if it works. For the last question, I', not entirely sure that this is really needed, it is to be even more general and allow to have different d's in the same layer (so that the null model matrix is not symmetrical), but we could get rid of it, it is not used afaik.

arnaudon commented 1 year ago

I just tried, and I got

BaseException: Could not construct partition: Expected integer value for node sizes vector.

when trying to pass a float value to node_sizes, which is needed in our case, so that the null vectors remain arbitrary. Would you think it would be better to allow for non-int node_sizes instead of this implementation?

vtraag commented 1 year ago

Ah yes, this can be easily generalised. Could you open an issue for that, so that I won't forget? Happy to include this in a next release.

vtraag commented 1 year ago

(Or if you want, a PR changing this is of course also welcome)

arnaudon commented 1 year ago

Ok, I'll try a draft PR on that to see if it works later today or tomorrow.

arnaudon commented 1 year ago

I just tried, but my c++ knowledge is to limited to be able to do that in a reasonable time, there is an issue with Graph() overloaded definition, as there will be two vectors of double types to pass, and some other issues more internals to changing this type. Would you have an estimate on how you could implement this change? Thank you very much!

vtraag commented 1 year ago

Sure, I can probably take a look at it somewhere next week.

arnaudon commented 1 year ago

Great, thank you very much!

arnaudon commented 1 year ago

Solved by #115 and comment above