vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
575 stars 77 forks source link

Internally disconnected communities with ModularityVertexPartition #98

Closed KubaWernerowski closed 2 years ago

KubaWernerowski commented 2 years ago

Hi @vtraag I think this is a possible bug, since Leiden is supposed to guarantee that communities are not internally disconnected.

python version: 3.8.2 leidenalg version: 0.8.8 igraph version: 0.9.9 networkx version: 2.5.1

Code to reproduce:

G = nx.barbell_graph(5, 1)

weights = {}

for u, v in G.edges:
    if u == 4 and v == 5:
        weights[(u,v)] = 1000
    else:
        weights[(u,v)] = 10

nx.set_edge_attributes(G, weights, 'weight')

G_ig = ig.Graph.from_networkx(G)

partition = la.find_partition(G_ig, la.ModularityVertexPartition)
partition2 = la.find_partition(G_ig, la.ModularityVertexPartition, weights='weight')

result = {}
result2 = {}

for comm_id, comm_members in enumerate(partition):
    for igraph_id in comm_members:
        result[igraph_id] = comm_id

for comm_id, comm_members in enumerate(partition2):
    for igraph_id in comm_members:
        result2[igraph_id] = comm_id

print('no weights\n', result)
print('weights\n', result2)
no weights
 {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 10: 0, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1}

weights
 {5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 0: 1, 1: 1, 2: 1, 3: 1, 4: 2, 10: 2}
Screen Shot 2022-03-14 at 11 55 41 AM

Given this graph, it seems that when using ModularityVertexPartition, nodes 4 and 10 are in the same community, but their community is internally disconnected.

vtraag commented 2 years ago

Hi @KubaWernerowski , thanks for your question. Please note that igraph uses integer representations for graphs, while networkx just uses any object. In this case, the networkx graph G happens to use integers as well, but this is not necessary. During the conversion in ig.Graph.from_networkx(G) every node from G is assigned an integer index, which in this case, does not coincide with the integer label from G itself. Unfortunately, it is not possible for igraph to guarantee that the integer index matches the integer label from networkx, because in general, it can simply be any Python object in networkx. The original labels from networkx are stored as a vertex attribute called _nx_name in igraph:

>>> G_ig.vs['_nx_name']
[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 5]

Note that in this case the nodes are indeed not ordered according to the original networkx labels.

You can see the original labels of the first community as

>>> G_ig.vs[partition[0]]['_nx_name']
[0, 1, 2, 3, 4, 5]

Simply plotting both partition and partition2 shows that indeed the clusters are as expected (and connected):

ig.plot(partition) image

ig.plot(partition2) image

I assume this solves your problem, so I'll close the issue for now. If you do have another question, feel free to re-open this issue (or open a new issue if it's about something else).

KubaWernerowski commented 2 years ago

Hi @vtraag, thank you for the quick response, and the detailed explanation of what was going on; I really appreciate it, thank you!