vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
596 stars 78 forks source link

[Feature Request] NetworkX compatibility #65

Closed jolespin closed 3 years ago

jolespin commented 3 years ago

Would it be possible to add networkx support?

vtraag commented 3 years ago

You can easily convert any networkx graph to an igraph graph, see https://igraph.org/python/doc/api/igraph.Graph.html#from_networkx. In any case, the package uses the C library of igraph, networkx is Python only, so igraph would still be a dependency. So, in short, I don't intend to offer networkx support explicitly. Is there something that you are missing with the from_networkx conversion function?

jolespin commented 3 years ago

Ok cool that works too. I can just make a wrapper that has this in the beginning:

if isinstance(graph, (nx.Graph, nx.OrderedDiGraph, nx.DiGraph, nx.OrderedGraph)):
    graph = ig.Graph.from_networkx(graph)

Just had a few followup questions:

import leidenalg
import igraph as ig

g = ig.Graph.from_networkx(graph)
nodes_list = np.asarray(g.vs["_nx_name"])
# array(['MAG_883.12', 'MAG_883.1', 'MAG_883.14', 'MAG_883.20', 'MAG_883.23', 'MAG_883.3', 'MAG_883.5', 'MAG_883.7', 'MAG_883.8', 'MAG_883.11', 'MAG_883.13', 'MAG_883.6', 'MAG_883.15', 'MAG_883.17', 'MAG_883.18', 'MAG_883.21', 'MAG_883.22', 'MAG_883.24', 'MAG_883.9'], dtype='<U10')

# Unprocessed results
# list(leidenalg.find_partition(g, partition_type=leidenalg.ModularityVertexPartition, seed=0))
# # [[7, 9, 12, 13, 14, 15, 16, 17, 18], [5, 6, 8, 10, 11], [0, 1, 2, 3, 4]]

partitions = dict()
for partition, nodes in enumerate(leidenalg.find_partition(g, partition_type=leidenalg.ModularityVertexPartition, seed=0)):
    mapping = dict(zip(nodes_list[nodes], [partition]*len(nodes)))
    partitions.update(mapping)

partitions
# {'MAG_883.7': 0,
#  'MAG_883.11': 0,
#  'MAG_883.15': 0,
#  'MAG_883.17': 0,
#  'MAG_883.18': 0,
#  'MAG_883.21': 0,
#  'MAG_883.22': 0,
#  'MAG_883.24': 0,
#  'MAG_883.9': 0,
#  'MAG_883.3': 1,
#  'MAG_883.5': 1,
#  'MAG_883.8': 1,
#  'MAG_883.13': 1,
#  'MAG_883.6': 1,
#  'MAG_883.12': 2,
#  'MAG_883.1': 2,
#  'MAG_883.14': 2,
#  'MAG_883.20': 2,
#  'MAG_883.23': 2}
nodes_list
  1. Each sublist inside the larger list is the partition and each integer value is the index in the igraph graph (shown above as g.vs["_nx_name"])?
  2. My graph has a "weight" attribute where I have w ≥ 0 weights. Is this being used both by: 2a. leidenalg.ModularityVertexPartition function? 2b. find_partition function?
vtraag commented 3 years ago
  1. Each sublist inside the larger list is the partition and each integer value is the index in the igraph graph (shown above as g.vs["_nx_name"])?

Yes, when you simply iterate over the partition, each element is a list of nodes that belong to a community. That is, when you do

partition = leidenalg.find_partition(g, partition_type=leidenalg.ModularityVertexPartition, seed=0)

then partition[c] gives you the list of nodes in community c.

However, for your purpose, it would probably be easier to simply take the membership of the returned partition. That is, partition.membership gives a list with for each node the numeric community identifier. The order corresponds to the node order of the graph, so you can set an additional vertex attribute as g.vs['community'] = partition.membership if you want, and convert back to networkx. See https://igraph.org/python/doc/api/igraph.clustering.VertexClustering.html for more details regarding the VertexClustering object of igraph.

2. My graph has a "weight" attribute where I have w ≥ 0 weights. Is this being used both by: 2a. leidenalg.ModularityVertexPartition function? 2b. find_partition function?

No, not by default, you have to specify the weight edge attribute. You can either specify the attribute name or the list of weights explicitly (i.e. G.es['weight']). See https://leidenalg.readthedocs.io/en/stable/reference.html#leidenalg.find_partition for more details. Note that the partition.modularity value is a bit deceiving, it comes from igraph but its value does not reflect the quality of the partition necessarily correctly. Please use partition.quality() to get the correct quality value.

jolespin commented 3 years ago

Awesome, looking forward to implementing this on 2 upcoming publications I'm working on; will make sure to cite your paper. Its much much faster than Louvain as well as I've ran them side by side.

Just to be clear, leidenalg.ModularityVertexPartition weight argument does not need to be explicit if given in the find_partition function, is this correct? When I tried specifying the weight here it would not let me actually provide leidenalg.ModularityVertexPartition with arguments.

Lastly, is leidenalg.ModularityVertexPartition a good "default"? Most of my networks are fully connected and weighted.

I admit, I haven't given the initial publication a proper read but looking for to doing so this week.

Thanks again for your help.

vtraag commented 3 years ago

Awesome, looking forward to implementing this on 2 upcoming publications I'm working on

Great, good to hear it's useful for you!

Just to be clear, leidenalg.ModularityVertexPartition weight argument does not need to be explicit if given in the find_partition function,

Yes, you can simply specify the weight only in the find_partition function. If you want to specify it in a ModularityVertexPartition you first have to define the partition, and then run the Optimiser on it, i.e.

partition = la.ModularityVertexPartition(g, weights='weight')
opt = la.Optimiser()
opt.optimise_partition(partition)

Lastly, is leidenalg.ModularityVertexPartition a good "default"? Most of my networks are fully connected and weighted.

This answer probably depends a bit on whom you are asking, but I would say no. I would think that CPMVertexPartition is actually more meaningful, especially if your network is fully connected and weighted. You do have to specify a resolution parameter, but it has a quite intuitive meaning: the average weight between nodes within a cluster is higher than this resolution parameter, while the average weight between nodes in two different clusters is lower than this resolution parameter.

I admit, I haven't given the initial publication a proper read but looking for to doing so this week.

Enjoy reading! :wink: It also includes some more details with regards to the CPMVertexPartition and the interpretation that I provide above. Some additional properties are also provided in https://doi.org/10.1103/PhysRevE.84.016114.

jolespin commented 3 years ago

For CPMVertexPartition, is the resolution_parameter expecting a specific range or is this (0,infinity) and dependent on the range of edge weights?

vtraag commented 3 years ago

The resolution parameter in principle can take on any value. However, the resolution limit is in principle bounded below by the minimum weight of an edge (including a weight of 0 if the edge is missing), below which it is guaranteed to provide a solution of a single cluster. It is bounded above by the maximum weight of an edge, above which it will always return a singleton partition. For unweighted graphs, that means that the relevant range is between 0 and 1. If there are edges with negative weights for example, the resolution parameter can in principle be below 0.