taynaud / python-louvain

Louvain Community Detection
BSD 3-Clause "New" or "Revised" License
964 stars 200 forks source link

miscalculation of modularity #12

Closed mainulquraishi closed 7 years ago

mainulquraishi commented 7 years ago

https://stackoverflow.com/questions/44789566/bug-in-community-package

taynaud commented 7 years ago

You are right that second part is k_i * k_j / (4 sqr(m)), with k_i the degree of the community i. But you are only considering case where i==j, and thus it become:

k_i * k_i /4 sqr(m) == (k_i / (2 m)) ** 2

With your equation, I think you were only considering k_i and forget the k_j.

mainulquraishi commented 7 years ago

Thank you very much for the clarification. I actually misunderstood your code. However, i still have a question which is really bugging me. As i mentioned in my original question in stackoverflow, i am getting high modularity score for the best partition(manually detected), in my toy graph, if I only consider k_i . Obviously, you did not created the equation, but as you have better understanding, may be you can shed some light. In case you wan to test, here is is my code- `import community import networkx as nx import pandas as pd import matplotlib.pyplot as plt

G = nx.Graph() nodeDf=pd.read_csv("https://raw.githubusercontent.com/mainulquraishi/test/master/testGraph_1.csv") numberOfEdge=int(list(nodeDf.shape)[0]) edgeNumber=0 while edgeNumber<=numberOfEdge-1: edge=nodeDf.iloc[edgeNumber,].tolist() G.add_nodes_from(edge[0:2]) G.add_edge(edge[0],edge[1],capacity=1.0) edgeNumber=edgeNumber+1 part = community.best_partition(G) mod = community.modularity(part,G) print("modularity: "+str(mod)) values = [part.get(node) for node in G.nodes()] nx.draw_spring(G, cmap=plt.get_cmap('jet'), node_color = values, node_size=30, with_labels=True)

If you run this code, you can easily see that, partition which is originally partitioned by algorithm, is wrong. It should have been 1,2,3,4,5,6,7,8 in one cluster and 9,10,11,12,13,14 in another cluster. The following value of part giving the highest modularity if i don't use k_j. part={'one': 0, 'six': 0, 'eight': 0, 'seven': 0, 'five': 0, 'eleven': 1, 'ten': 1, 'twelve': 1, 'nine': 1, 'two': 0, 'fourteen': 1, 'thirteen': 1, 'four': 0, 'three': 0}`

Another thing is, I am thinking, the resolution argument of best_partition() function might be usefull for me. can you give me some idea in layman's term what is the difference between resolution 1.0 and 2.0. and when should I use this? sorry for this long boring massage.

taynaud commented 7 years ago

For the first part, I think you are facing issues described in this paper: https://arxiv.org/abs/physics/0607100

Basically, the best partition in term of modularity can be an aggregation or a division from a big cluster. For instance if you have several connected cliques, the partition maximizing the modularity may merge some cliques even if it has no true meaning.

For the second part, it is complicated. Reference is "Laplacian Dynamics and Multiscale Modular Structure in Networks", Lambiotte, J.-C. Delvenne, M. Barahona

A way to explain it is through random walks. A community can be seen as a trap for a random walker on the graph if it moves for a given distance. If you increase this distance, the walker may go further, and thus traps/communities are bigger. If you decrease it, the walker may go closer, and communities are smaller. I do not know the relationship between this "distance" and the resolution parameter. You should read their articles and the following ones if you need more details.

Best

taynaud commented 7 years ago

Can I close this issue ?

mainulquraishi commented 7 years ago

yes, you can. Thank you very much for your co-operation.