vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0
596 stars 78 forks source link

Isolates in slices_to_layers? #84

Closed malkamont closed 2 years ago

malkamont commented 2 years ago

Hello,

I'm using leidenalg to optimise a multilayer partition. I have nine layers and the same 100 nodes in every layer, yet quite a few nodes are isolates in some of the layers. I set up a coupling graph and pull everything through slices_to_layers. However, when running optimise_partition_multiplex, the results make me think that everything is not functioning properly:

At first sight, I receive communities that couple neatly across layers. But after adjusting the coupling weights from high (4) to extremely low (1/billion), the communities still remain the same and keep on coupling tightly across all layers. I have run the algorithm successfully many times without the layer-specific isolates, where relatively small adjustments to the coupling weights are usually immediately visible in the results - could it be that the optimiser and/or quality function (tested both CPM and RBC) is not handling the isolates correctly? I would expect the isolates to simply 'filter' influence through from the other layers.

Setting the coupling weights to zero makes everything work as expected. This suggests that there must be a 'transition' from 0% coupling to 100% coupling somewhere. I just wanted to ask if there's a reason that would explain this behaviour and to get a confirmation whether to keep the isolates or not.

Many thanks, Arttu

vtraag commented 2 years ago

I'm not completely sure I understand the question, I'm sorry. It's a bit difficult to understand without a specific example. Could you perhaps post some data and code that reproduces your results, so that it's a bit clearer what the problem is?

In principle, the intercoupling weight can indeed show a big effect exactly when setting it from zero to any non-zero weight. For example, if there are essentially identical communities across different slices (which are properly detected by some quality function), then any non-zero weight should make the results identical across the slices, while a zero weight results in the communities being different across the slices. I have the impression that this is not the case for your question, but as said, I don't have a good understanding of your case yet.

In particular, one thing that is not clear to me is what you mean with "isolates". Are these nodes that are otherwise not connected to any other nodes in the same slice? Or do you mean these are nodes that are in singleton communities, i.e. the only member of a particular community?

malkamont commented 2 years ago

Hi,

I'm sorry for the lack of clarity, the issue is indeed a bit challenging to explain in full. I can't share the data directly, unfortunately, but maybe looking at the full graph as returned by slices_to_layers helps in rephrasing my question:

image

There are three types of relationships ("y"-axis) over three time points ("x"-axis). There are several nodes that are otherwise not connected to any other nodes in the same slice - all the smallest nodes in the figure. Plus, there's one slice (top-middle) that only contains such nodes (that is, missing data).

When I test optimise_partition_multiplex with different sets of coupling weights and compare the resuls, I notice that even with very weak couplings the communities that form tend to couple very strongly across all the nine layers:

image

If I drop the nodes that are not otherwise connected to other nodes from all slices (thus dropping also the 'missing' slice), the communities couple in this way only with rather strong coupling parameters.

It is possible that this issue is unrelated to the Leidenalg as such, and has more to do with Mucha et al. 2010, but I'd very much appreciate your expertise in understanding what might be causing this behaviour - how would one expect the disconnected nodes in every slice to influence the resulting multi-slice partition?

I was first thinking that this would be due to something happening to the resolution parameter or to changes in layer weights (fixed at 1), but I'm not sure.

All help and guidance are much appreciated, Arttu

vtraag commented 2 years ago

Thanks for the additional explanation. I have a somewhat better understanding of your question, although I am not 100% sure I fully grasp it. I'll try to answer, in the hopes that I understand it sufficiently well.

First of all: if you have missing nodes, I would not include them in the graph. If you do include them in the graph, this now essentially amounts to saying that they are not connected to any of the other nodes in that particular slice. Presumably, since the data is missing, you don't know whether they are connected or not.

I am not 100% sure if I understand the coupling correctly. You have nine slices in total across three different categories and three different times. By the looks of it, you are coupling every category among each other (i.e. categories 1 and 2, 2 and 3, and 3 and 1 are coupled), and coupling only sequential times (i.e. times 1 and 2, and 2 and 3)? This looks reasonable to me.

The question now is how you couple things when you remove the missing nodes. You might need to think what you would do if you remove the missing nodes. Should a node from time 1 be coupled to a node from time 3 if there is no such node in time 2? Or is it fine to not have them coupled in any way then?

If they are not coupled in any way (which is what I suspect), then especially for the top middle slice, there will be many fewer coupling links. This implies there will be less gain in putting nodes in the same community across the two remaining top slices, except for their coupling to the other two categories. You hence need a stronger coupling to resemble the situation in which the missing nodes are present.

It is possible that this issue is unrelated to the Leidenalg as such, and has more to do with Mucha et al. 2010

In a sense, yes, the algorithm performs as expected I believe. Whether the results are of interest to you is a separate question.

I was first thinking that this would be due to something happening to the resolution parameter

Yes, it might be related to the resolution parameter. Let's consider the case where you only include the top three slices, and forget about the other categories. For the top three slices, there is a benefit to putting nodes in the same community for time 1 and time 3. For time 2, there is no such benefit, since the nodes are all disconnected. The only reason that nodes in time 2 are the same as in time 1 and time 3 is that they are coupled. That is, the benefit of putting nodes in time 2 in the same communities as in time 1 and 3 is outweighed by the fact that they are not connected among each other. Hence, if you increase the resolution for the slice for time 2, or decrease the coupling strength sufficiently, you should see that the time slices will no longer retain the same community structure.

In general, it might be an idea to play around with the temporal and categorical dimensions separately to gain a better understand of the effects.

malkamont commented 2 years ago

Many thanks for your detailed response - very helpful! I had a hunch that the answer would align along these lines, but it helps me a lot to get a confirmation. I'll proceed as suggested.

Best, Arttu