vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
566 stars 76 forks source link

Assigning new nodes to existing communities #168

Closed girwin1 closed 4 months ago

girwin1 commented 5 months ago

Not an issue, but looking for a pointer.

I have a graph consisting of five layers for a set of nodes then use Leiden for the communities.

I then have a far larger set of nodes but only have the edges for one of the layers (it’s not possible to get the other 4). The missing edges from the other layers are much higher quality so give good communities.

I want to assign the new nodes to these communities based off the available data and reject those nodes that don’t sufficiently connect to the communities.

What is the best way to go about this?

Is it to create a new single layer graph constructed on the edges available for all data points, re cluster and then assign the old partition value nodes new to the cluster?

vtraag commented 5 months ago

Have you already considered using "Fixed Nodes"?

girwin1 commented 5 months ago

Thank you for responding. That's what I'm trying.

For a multiplex graph, I guess it doesn't matter which graph you choose to be G (in the "Fixed Nodes" example), as the partitions are the same across the layers?

As for new nodes, are completely unconnected nodes disregarded? And then you can control the sensitivity for assigning new nodes with the resolution parameter?

Or would you assign all new (connected) nodes to a partition then exclude those below a certain modularity as a separate step?

vtraag commented 4 months ago

For a multiplex graph, I guess it doesn't matter which graph you choose to be G (in the "Fixed Nodes" example), as the partitions are the same across the layers?

Yes, indeed. Note that the argument is_membership_fixed is an argument of optimise_partition_multiplex, not of any particular partition of any layer/graph.

As for new nodes, are completely unconnected nodes disregarded? And then you can control the sensitivity for assigning new nodes with the resolution parameter?

New nodes are not disregarded, they are simply not fixed. You should make sure that they are assigned some cluster when you create a new graph and initial partition of that new graph.

Or would you assign all new (connected) nodes to a partition then exclude those below a certain modularity as a separate step?

I am not entirely sure if I understand what you mean here. Personally, I would just assign each new node simply to a new community, such that each new node is a singleton (and unrelated to any existing community). Then by keeping the existing ones fixed, the algorithm will automatically find a way to assign the new nodes to the existing communities (or keep them separate if that is better).

girwin1 commented 4 months ago

Thank you. This is what I did in the end. I think I initially misunderstood how this worked. My concern was that new nodes might be forced into partitions where they did not really belong. But now i realise new nodes will only form part of an existing cluster if that is better than creating a new one.

This has been a really neat module.