vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0

ValueError: vector::reserve #79

Closed ilpizzo84 closed 2 years ago

ilpizzo84 commented 3 years ago

I obtain the following error when executing find_partition_temporal with CPMVertexPartition:

~/anaconda3/lib/python3.8/site-packages/leidenalg/functions.py in find_partition_temporal(graphs, partition_type, interslice_weight, slice_attr, vertex_id_attr, edge_type_attr, weight_attr, n_iterations, max_comm_size, seed, **kwargs)
    289     optimiser.set_rng_seed(seed)
    290 
--> 291   improvement = optimiser.optimise_partition_multiplex(partitions + [partition_interslice], n_iterations=n_iterations)
    292 
    293   # Transform results back into original form.

~/anaconda3/lib/python3.8/site-packages/leidenalg/Optimiser.py in optimise_partition_multiplex(self, partitions, layer_weights, n_iterations, is_membership_fixed)
    392     continue_iteration = itr < n_iterations or n_iterations < 0
    393     while continue_iteration:
--> 394       diff_inc = _c_leiden._Optimiser_optimise_partition_multiplex(
    395         self._optimiser,
    396         [partition._partition for partition in partitions],

ValueError: vector::reserve
vtraag commented 3 years ago

That clearly should not happen. Could you please provide a minimal working example that reproduces the problem so that I can debug it?

ilpizzo84 commented 3 years ago

This bug is quite mysterious to me. I was experimenting with temporal clustering on 29 snapshots while varying the resolution parameter. I started with 0.001, and it worked properly. Then 0.01 also worked. Finally, with 0.1 I obtained this error. From that point on it never worked again, even with the previously successful values. I did not change anything in the code or the data, so I cannot figure out what the cause is. By the way, if I use fewer snapshots it works properly, so the problem seems to be strongly data-dependent, and unfortunately I cannot share the dataset :(

I call the function in this way:

slice_membership, improvement = la.find_partition_temporal(slices,
                                                               la.CPMVertexPartition,
                                                               vertex_id_attr="_nx_name",
                                                               interslice_weight=1.0,
                                                               resolution_parameter=0.01,
                                                               n_iterations=5,
                                                               seed=0)
ilpizzo84 commented 3 years ago

This is my environment:

Please let me know if you need more details.

ilpizzo84 commented 3 years ago

New update: as I said, once I hit that strange error, I keep getting it on the same data even with parameter values that previously succeeded. If I uninstall and reinstall the leidenalg library with pip, I still get the error for configurations that previously ran correctly. But if I reinstall the library with pip using the --no-cache-dir option, then I can replicate those successful experiments with the previously used configurations. So I guess that when I obtain the ValueError: vector::reserve, something cached by pip prevents the library from functioning correctly.

vtraag commented 3 years ago

Hmm, that sounds quite mysterious indeed! I will try to look into what could go wrong. There is some caching going on, but that should in principle not carry over between different runs.

vtraag commented 3 years ago

To be sure I understand the problem: you keep hitting this error even if you restart a completely new Python session and run everything from the beginning? And the error disappears only after you uninstall and reinstall?

From the error report it seems you are running Anaconda Python. Is there any reason why you are using the pip package instead of the package from conda-forge?

malkamont commented 3 years ago

Hello,

I keep on getting the same error message when looping optimiser.optimise_partition_multiplex(partitions+[interslice_partition], n_iterations=10) 500 times. I suspect the bug is related to this particular function.

The error is not returned after every single run of the algorithm, and sometimes when I let the seed vary randomly my loop passes through. Opening a new Python session and/or reinstalling the package (with either 0.8.4 or 0.8.7) doesn't make a difference in my case.

~/.conda/envs/la/lib/python3.9/site-packages/leidenalg/Optimiser.py in optimise_partition_multiplex(self, partitions, layer_weights, n_iterations, is_membership_fixed)
    392     continue_iteration = itr < n_iterations or n_iterations < 0
    393     while continue_iteration:
--> 394       diff_inc = _c_leiden._Optimiser_optimise_partition_multiplex(
    395         self._optimiser,
    396         [partition._partition for partition in partitions],

ValueError: vector::reserve

I'm using the latest version of the package:

Name: leidenalg Version: 0.8.7 Build: py39he80948d_0 Channel: conda-forge

Best, Arttu

vtraag commented 3 years ago

Thanks for reporting. Would you be able to share the dataset and provide reproducible code so that I can reproduce the problem locally?

malkamont commented 3 years ago

I'm using a set of 12 networks and a coupling graph.

Everything goes smoothly until the run with optimiser.set_rng_seed(225).

I just added you @vtraag to the private repo from which you'll get the data as GraphML files.

layers = [ig.Graph.Read_GraphML(f"layers{i}.xml") for i in range(12)]

interslice_layer=ig.Graph.Read_GraphML("interslice_layer.xml")

for i in range(len(layers)):
    layers[i].vs["node_size"] = [int(item) for item in layers[i].vs["node_size"]] # Graph I/O apparently changed the value type

interslice_layer.vs["node_size"]=[int(item) for item in interslice_layer.vs["node_size"]]

def detector():
    partitions = [la.RBConfigurationVertexPartition(layer, node_sizes="node_size", weights=None, resolution_parameter=0.8)
                  for layer in layers]
    interslice_partition = la.RBConfigurationVertexPartition(interslice_layer, node_sizes="node_size", weights="weight", resolution_parameter=0)
    diff = optimiser.optimise_partition_multiplex(partitions + [interslice_partition], n_iterations=10)
    return partitions[0]

detections=[]
seeds=np.arange(500)

for i in seeds:
    optimiser.set_rng_seed(i)
    partitions=detector()
    print("Run with seed", i, "completed")
    detections.append(partitions)
vtraag commented 3 years ago

Great, thanks! I'll try to take a look in the coming days.

vtraag commented 3 years ago

I believe I have identified and solved the problem; it should be fixed in #82. @malkamont and @ilpizzo84, could you please try the code in #82 and confirm that it solves your problem? You should be able to install it by cloning the repo locally and running python setup.py install. If you run into any problems, let me know.

vtraag commented 3 years ago

FYI, you can also get the binary wheels from the GitHub Action (see Artifacts); that should make installation easier, @malkamont and @ilpizzo84. If you need help, let me know.

malkamont commented 3 years ago

There was some trouble cloning the repo and using python setup.py install, apparently due to some incompatibility with igraph, but the wheel solution worked fine. The bug is fixed and my code runs smoothly. Thank you so much!

vtraag commented 3 years ago

Great, thanks for the feedback and good to hear! I will try to make a new release shortly.

Compiling the package from source may indeed be more complicated, so no worries if that doesn't work for you.