Closed Fengdan92 closed 6 years ago
I read the description and it seems like weird to me that they still use A[i,j]
when they really mean A[j,i]
. If this is the case, do you think networkx
's function directed_modularity_matrix
is also problematic (I used this one to generate the modularity matrix B
) since their A[i,j]
is for edges i -> j
instead of the weird definition mentioned in the paper?
BTW, I am not a physicist so can you help improve any bugs/misunderstanding of the concepts by reviewing the code I have? If you would like to make this code better, I am happy to have you as a collaborator, especially on making this package correct first.
I copied the docstring as follows:
directed_modularity_matrix(G, nodelist=None, weight=None)
Return the directed modularity matrix of G.
The modularity matrix is the matrix B = A - <A>, where A is the adjacency
matrix and <A> is the expected adjacency matrix, assuming that the graph
is described by the configuration model.
More specifically, the element B_ij of B is defined as
B_ij = A_ij - k_i(out) k_j(in) / m
where k_i(in) is the in degree of node i, and k_j(out) is the out degree
of node j, with m the number of edges in the graph. When weight is set
to a name of an attribute edge, Aij, k_i, k_j and m are computed using
its value.
Parameters
----------
G : DiGraph
A NetworkX DiGraph
nodelist : list, optional
The rows and columns are ordered according to the nodes in nodelist.
If nodelist is None, then the ordering is produced by G.nodes().
weight : string or None, optional (default=None)
The edge attribute that holds the numerical value used for
the edge weight. If None then all edge weights are 1.
Returns
-------
B : Numpy matrix
The modularity matrix of G.
Examples
--------
>>> import networkx as nx
>>> G = nx.DiGraph()
>>> G.add_edges_from(((1,2), (1,3), (3,1), (3,2), (3,5), (4,5), (4,6),
... (5,4), (5,6), (6,4)))
>>> B = nx.directed_modularity_matrix(G)
Notes
-----
NetworkX defines the element A_ij of the adjacency matrix as 1 if there
is a link going from node i to node j. Leicht and Newman use the opposite
definition. This explains the different expression for B_ij.
See Also
--------
to_numpy_matrix
adjacency_matrix
laplacian_matrix
modularity_matrix
References
----------
.. [1] E. A. Leicht, M. E. J. Newman,
"Community structure in directed networks",
Phys. Rev Lett., vol. 100, no. 11, p. 118703, 2008.
So I looked at the code and also tested the code with a simple example. I believe your code is working correctly!
First of all, the definition in your get_modularity
function
Q = \frac{1}{m}\sum_{i,j} \(A_ij - \frac{k_i^{in} k_j^{out}}{m}\) * \delta_{c_i, c_j}
is correct, as it is completely equivalent to the definition proposed by Leicht, E. A., and Newman, M. E. J (2008):
Q = \frac{1}{m}\sum_{i,j} \(A_ji - \frac{k_i^{in} k_j^{out}}{m}\) * \delta_{c_i, c_j}
. This is because of the (i,j) symmetry. For example, assuming delta_{c_1, c_2} = 1, we have
(A_12 - \frac{k_1^{in} k_2^{out}) + (A_21 - \frac{k_2^{in} k_1^{out}) = (A_12 - \frac{k_1^{out} k_2^{in}) + (A_21 - \frac{k_2^{out} k_1^{in})
.
If you look at newer papers that cited Leicht, E. A., and Newman, M. E. J (2008), you can see that some people take A_ij
(your code), while others take A_ji
(Leicht and Newman) in the definition of directed modularity. For example, Kim et al. (2010) takes Leicht and Newman's form, whereas Dugue and Perez (2015) take your code's form.
Second, I tested your code with a simple example:
G = nx.DiGraph([(1,2),(2,1),(2,4),(4,3),(3,4),(3,2)])
The best partition should give 2 modules, one has node 1 and 2, the other has node 3 and 4. And the modularity of this partition, as calculated by Newman's definition, should be 1/6. Your code gave the expected partition and modularity value (something like 0.1666666667). So it should be working correctly!
Thus, directed_modularity_matrix
should also be working correctly, as either B_ij = A_ij - k_i(out) k_j(in) / m
or B_ij = A_ij - k_i(in) k_j(out) / m
shouldn't matter.
Great. I'll close this then. Thanks for the effort!
Hi,
I was looking at the get_modularity function from modularitymaximization.utils. Directed modularity is defined as: `Q = \frac{1}{m}\sum{i,j} (A_ij - \frac{k_i^{in} kj^{out}}{m}) * \detal{c_i, c_j}` Strictly speaking A_ij should be A_ji, which is 1 when there is an edge from j to i, and 0 otherwise. This is because of the concept that edges j->i should make larger contributions to Q if k_i^{in} or k_j^{out} is small. See Leicht, E. A., & Newman, M. E. J. (2008). Community Structure in Directed Networks. Physical Review Letters, 100(11), 118703, the paragraph following equ.(3).
However, I also believe this is more of a conceptual point, since using either A_ij or A_ji gives the same Q thanks to the (i,j) symmetry in the definition. Please let me know what you think, as I'm still new to the field and am not 100% sure about my 'beliefs' :)