zhiyzuo / python-modularity-maximization

Python implementation of Newman's spectral methods to maximize modularity.
MIT License
45 stars 25 forks source link

Regarding the definition of directed modularity #1

Closed Fengdan92 closed 6 years ago

Fengdan92 commented 6 years ago

Hi,

I was looking at the get_modularity function from modularitymaximization.utils. Directed modularity is defined as: `Q = \frac{1}{m}\sum{i,j} (A_ij - \frac{k_i^{in} kj^{out}}{m}) * \detal{c_i, c_j}` Strictly speaking A_ij should be A_ji, which is 1 when there is an edge from j to i, and 0 otherwise. This is because of the concept that edges j->i should make larger contributions to Q if k_i^{in} or k_j^{out} is small. See Leicht, E. A., & Newman, M. E. J. (2008). Community Structure in Directed Networks. Physical Review Letters, 100(11), 118703, the paragraph following equ.(3).

However, I also believe this is more of a conceptual point, since using either A_ij or A_ji gives the same Q thanks to the (i,j) symmetry in the definition. Please let me know what you think, as I'm still new to the field and am not 100% sure about my 'beliefs' :)

zhiyzuo commented 6 years ago

I read the description and it seems like weird to me that they still use A[i,j] when they really mean A[j,i]. If this is the case, do you think networkx's function directed_modularity_matrix is also problematic (I used this one to generate the modularity matrix B) since their A[i,j] is for edges i -> j instead of the weird definition mentioned in the paper?

BTW, I am not a physicist so can you help improve any bugs/misunderstanding of the concepts by reviewing the code I have? If you would like to make this code better, I am happy to have you as a collaborator, especially on making this package correct first.

I copied the docstring as follows:

directed_modularity_matrix(G, nodelist=None, weight=None)
    Return the directed modularity matrix of G.

    The modularity matrix is the matrix B = A - <A>, where A is the adjacency
    matrix and <A> is the expected adjacency matrix, assuming that the graph
    is described by the configuration model.

    More specifically, the element B_ij of B is defined as
        B_ij = A_ij - k_i(out) k_j(in) / m
    where k_i(in) is the in degree of node i, and k_j(out) is the out degree
    of node j, with m the number of edges in the graph. When weight is set
    to a name of an attribute edge, Aij, k_i, k_j and m are computed using
    its value.

    Parameters
    ----------
    G : DiGraph
       A NetworkX DiGraph

    nodelist : list, optional
       The rows and columns are ordered according to the nodes in nodelist.
       If nodelist is None, then the ordering is produced by G.nodes().

    weight : string or None, optional (default=None)
       The edge attribute that holds the numerical value used for
       the edge weight.  If None then all edge weights are 1.

    Returns
    -------
    B : Numpy matrix
      The modularity matrix of G.

    Examples
    --------
    >>> import networkx as nx
    >>> G = nx.DiGraph()
    >>> G.add_edges_from(((1,2), (1,3), (3,1), (3,2), (3,5), (4,5), (4,6),
    ...                   (5,4), (5,6), (6,4)))
    >>> B = nx.directed_modularity_matrix(G)

    Notes
    -----
    NetworkX defines the element A_ij of the adjacency matrix as 1 if there
    is a link going from node i to node j. Leicht and Newman use the opposite
    definition. This explains the different expression for B_ij.

    See Also
    --------
    to_numpy_matrix
    adjacency_matrix
    laplacian_matrix
    modularity_matrix

    References
    ----------
    .. [1] E. A. Leicht, M. E. J. Newman,
       "Community structure in directed networks",
        Phys. Rev Lett., vol. 100, no. 11, p. 118703, 2008.
Fengdan92 commented 6 years ago

So I looked at the code and also tested the code with a simple example. I believe your code is working correctly!

First of all, the definition in your get_modularity function Q = \frac{1}{m}\sum_{i,j} \(A_ij - \frac{k_i^{in} k_j^{out}}{m}\) * \delta_{c_i, c_j} is correct, as it is completely equivalent to the definition proposed by Leicht, E. A., and Newman, M. E. J (2008): Q = \frac{1}{m}\sum_{i,j} \(A_ji - \frac{k_i^{in} k_j^{out}}{m}\) * \delta_{c_i, c_j}. This is because of the (i,j) symmetry. For example, assuming delta_{c_1, c_2} = 1, we have (A_12 - \frac{k_1^{in} k_2^{out}) + (A_21 - \frac{k_2^{in} k_1^{out}) = (A_12 - \frac{k_1^{out} k_2^{in}) + (A_21 - \frac{k_2^{out} k_1^{in}). If you look at newer papers that cited Leicht, E. A., and Newman, M. E. J (2008), you can see that some people take A_ij (your code), while others take A_ji (Leicht and Newman) in the definition of directed modularity. For example, Kim et al. (2010) takes Leicht and Newman's form, whereas Dugue and Perez (2015) take your code's form.

Second, I tested your code with a simple example: G = nx.DiGraph([(1,2),(2,1),(2,4),(4,3),(3,4),(3,2)]) The best partition should give 2 modules, one has node 1 and 2, the other has node 3 and 4. And the modularity of this partition, as calculated by Newman's definition, should be 1/6. Your code gave the expected partition and modularity value (something like 0.1666666667). So it should be working correctly!

Thus, directed_modularity_matrix should also be working correctly, as either B_ij = A_ij - k_i(out) k_j(in) / m or B_ij = A_ij - k_i(in) k_j(out) / m shouldn't matter.

zhiyzuo commented 6 years ago

Great. I'll close this then. Thanks for the effort!