Open Splines opened 1 year ago
Hello Dominic,
Thank you so much for this issue. The way you presented it is impressive and we are deeply grateful for that!
We indeed think that you found a bug here. Hopefully this will not have had too much impact on the communities computed with our algorithm so far (even if there might be some side effects as you mention, this may not have affected the greedy choice made by the algorithm too significantly...)
We would hence really appreciate your pull request with this change. Please do not hesitate to open other issues if you find some other strange behaviors here, we probably made some mistakes at some other places!
And once again thank you for your contribution. Anthony
Hey Anthony, thanks for your answer. I will open a PR soon, but am a bit busy right now, so I might take one more week ;)
Hello Nicolas Dugué and Anthony Perez,
first of all thanks for making this great project open source, it really helps to understand the inner workings of the Louvain algorithm.
I just tried to put in one undirected graph into the C++ Louvain implementation by Blondel et al. (see the "newversion" tab on the left side of that website) and then the same graph into your directed Louvain algorithm. However, the initial modularity value $Q$ differs between the original Louvain and your directed Louvain implementation. While this may not be a big problem as the greedy optimization only needs to compare relative modularity values, it still might have some unwanted side effects.
In order explain what I mean exactly, let's consider the following graph I randomly came up with where the blue numbers denote the weights for the undirected edges.
Reproducing the issue
With the original Louvain code
We can represent the graph using this `graph.txt` file for the original Louvain algorithm: ``` 0 0 3.0 0 1 1.0 1 2 5.0 2 3 2.5 3 1 7.0 3 3 1.0 ``` Download the [source code](https://sites.google.com/site/findcommunities/newversion) and put the `graph.txt` file in the root of the project. Open the file `community.cpp` and inside the `double Community::modularity()` method, add a printout before the `return q` statement like so: ```c++ std::cout << q << std::endl; ``` Then run: ``` make ./convert -i ./graph.txt -o ./graph.bin -w graph.weights ./community ./graph.bin -w graph.weights -v ``` You should observe this modularity value in the console: **`-0.172653`**.With the DirectedLouvain code
Next, we will do the same thing with your directed Louvain algorithm. ``` git clone https://github.com/nicolasdugue/DirectedLouvain.git cd ./DirectedLouvain ``` I manually converted the undirected graph to a directed graph so that DirectedLouvain can handle it like so: ``` 0 0 3.0 0 1 1.0 1 0 1.0 1 2 5.0 2 1 5.0 2 3 2.5 3 2 2.5 3 1 7.0 1 3 7.0 3 3 1.0 ``` Put this into the file `graph/graph.txt`. Then go to the file `community.cpp` and inside the `double Community::modularity()` method, add a printout before the `return q` statement like so: ```c++ cerr << "Modularity is: " << q << std::endl; ``` Then run: ``` make ./bin/community -f graph/graph.txt -w -v ``` You should see (among others) this line on the console ``` Modularity is: -0.154286 ```We get a different modularity value (
-0.154286
vs.-0.172653
) when every node is in its own singleton community at the very beginning.Verifying the correct solution by hand
Let's see which of these solutions is correct by carrying out the math by hand.
See details
We have our adjacency matrix for this graph given by: ```math A = \begin{pmatrix} 3 & 1 & 0 & 0\\ 1 & 0 & 5 & 7\\ 0 & 5 & 0 & 2.5\\ 0 & 7 & 2.5 & 1 \end{pmatrix} ``` with the initial partition being: ```math \mathcal{C} = \{ c_1, c_2, c_3, c_4 \} = \Bigl\{ \{0\}, \{1\}, \{2\}, \{3\}, \{4\} \Bigr\} ``` For our total weighted number of edges, we get: ```math m = \frac{1}{2} \sum_{u,v} A_{uv} = \frac{1}{2} (3 + 2 + 10 + 14 + 5 + 1) = 17.5 ``` Now, we can calculate modularity using the definition for the vertex degree from [here](https://arxiv.org/pdf/cond-mat/0407503.pdf) (*equation 5*): ```math k_u = \sum_{v} A_{uv} ``` ```math \begin{align} Q(\mathcal{C}) &\coloneqq \frac{1}{2m} \sum_{c\in C} \sum_{u\in c} \sum_{v\in c} \biggl( \underbrace{A_{uv} - \frac{k_u k_v}{2m}}_{\eqqcolon \phi} \biggr)\\ &= \frac{1}{2m} \biggl[ \, \sum_{u,v \in c_1} \phi + \sum_{u,v \in c_2} \phi + \sum_{u,v \in c_3} \phi + \sum_{u,v \in c_4} \phi \biggr]\\ % &= \frac{1}{2m} \biggl[ \Bigl( A_{00} - \frac{k_0 k_0}{2m} \Bigr) + \Bigl( A_{11} - \frac{k_1 k_1}{2m} \Bigr) + \Bigl( A_{22} - \frac{k_2 k_2}{2m} \Bigr) + \Bigl( A_{33} - \frac{k_3 k_3}{2m} \Bigr) \biggr]\\ % &= \frac{1}{35} \cdot \biggl[ \Bigl( 3 - \frac{4^2}{35} \Bigr) + \Bigl( 0 - \frac{13^2}{35} \Bigr) + \Bigl( 0 - \frac{7.5^2}{35} \Bigr) + \Bigl( 1 - \frac{10.5^2}{35} \Bigr) \biggr]\\ % &= - \frac{423}{2450} \approx -0.17265 \end{align} ```This shows that indeed
-0.17265
is the correct modularity value in the beginning.So what's the "problem" with
DirectedLouvain
?In your paper Direction matters..., you give this definition for the directed modularity:
with
This definition can be broadened for weighted graphs by replacing the size of the set with the sum of the weights of the elements, here their respective edge weights.
In this definition, you did not exclude the node itself, e.g. for node $u$ when we calculate $d_u^{in}$ we also consider $v = u\in V$ in the set. This means, self-loops should be considered in both $d_u^{in}$ as well as in $d_u^{out}$. However, in the code it seems that it is only considered for outgoing degrees.
For this, see the method
double Community::modularity()
. Inside the for loop, add adisplay()
call and changecout
tocerr
indisplay()
:Show display-method
```c++ void Community::display() { for (unsigned int i = 0; i < size; ++i) { cerr << " " << (this->g)->correspondance[i] << "/" << node_to_community[i] << "/" << this->communities_arcs[i].total_arcs_inside << "/" << this->communities_arcs[i].total_outcoming_arcs << "/" << this->communities_arcs[i].total_incoming_arcs; cerr << endl; } } ```You will see this output in the first call to
Community::modularity()
:So, then in
modularity()
, we calculate (with the simplified formula of modularity):But as we can see in the output,
total_incoming_arcs
is just1
and not the expected3+1=4
like thetotal_outcoming_arcs
. This is where the different modularity values come from.Fix
We can easily fix this by explicitly considering the self-loops for $d_u^{in}$:
Note these two lines
Now we also initially get:
If this is really a "bug", I am happy to create a pull request with this change. If this is intended behavior, I'm curious to know the reasoning behind it. I'd also be happy if you corrected some errors in thought I may have made during the writing of this.
Thanks in advance, Dominic