rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0
1.67k stars 299 forks source link

[QST]: Leiden clustering before and after 23.02 #4529

Open wdnlotm opened 1 month ago

wdnlotm commented 1 month ago

What is your question?

Hi, I use the Leiden clustering a lot. Louvain as well, sometimes. I am experiencing big differences in the results when I use 23.02 and newer versions. My dataset has 80 mil edges and 6 mil nodes. The biggest notable difference is older than 23.02, the Leiden clustering produces about 20 clusters, but newer than 23.02 produces, sometimes, 200K clusters for the same dataset and settings. Now I need to use Dask cugraph but this big difference prevents me from moving on to the new versions. Any idea? Thank you.

Code of Conduct

alexbarghi-nv commented 1 month ago

23.02 is a fairly old version of cuGraph. I would start by moving to a newer version of cuGraph (the latest is 24.06) and seeing if that resolves your issues.

Also @ChuckHastings I think we've had a similar issue reported in the past? What was the resolution there?

ChuckHastings commented 1 month ago

Our original implementation of Leiden had a number of functionality issues (it was not much more than Louvain in actuality) and had significant scaling issues. Our implementation of Leiden was completely rewritten as of version 23.04, so you have probably observed that major shift. This version was much better, although there continued to be some minor issues with it through earlier this year. As of 24.06 we have no known outstanding issues with Leiden. So I would suggest using 24.06 or later.

If you have examples where our Leiden implementation in 24.06 generates answers that you are concerned about, please provide some details (ideally a sample data set, parameters and what you might expect the result to be) and we will be happy to investigate.

wdnlotm commented 1 month ago

I am testing a few versions. I have an edge list from a knn (82 mil edges, 5.5 mil vertices). I load it to G=cugraph.MultiGraph(directed=False). I use leiden or louvain by running parts, modularity_score = cugraph.louvain(G, max_level=500, resolution = 0.65) or parts, modularity_score = cugraph.leiden(G, max_iter=500, resolution = 0.65, random_state=123)

I use the Apptainer by the way. 24.06 Leiden produced 1741523 clusters 24.06 Louvain produced 11 (eleven) clusters

24.02 Leiden produced 10 clusters. Great!! But if I rerun the same code, I got 11 clusters. It's good too but not consistent. After one more run, I got 141 clusters. 24.02 Louvain produced 11 (eleven) clusters

23.02 Leiden produced 19 clusters. 23.02 Louvain produced 20 clusters

In fact, I like the results from 23.02. Those results are comparable to what I could get from the leidenalg library which is s l o w.

Thanks.

ChuckHastings commented 1 month ago

Leiden circa 23.02 was really Louvain with just a tiny bit of extra logic. It didn't actually implement much of the Leiden algorithm at all.

The big issue with 23.06 through 24.02 was inconsistencies. We identified and corrected several bugs that made things more consistent. Because we are operating in parallel, and Louvain/Leiden are greedy approximation algorithms, we can still be affected by numerical instability. Sums of values occur in parallel in a non-deterministic order and we can see minor variations in the results on larger graphs. We have seen several cases where the best choice in the greedy algorithm is really a tie, and due to the variability in the numerical stability we might see different vertices in the tie actually be a clear winner. This is a known source of variability in the results that we haven't tried to address. But we have addressed many of the inconsistencies.

Producing 1.7 million clusters seems like a bug of some kind. Is the graph you are using something you can share? I can try to find comparable size graphs and see if I can get this type of behavior, but starting from your graph will be the easiest path.

wdnlotm commented 1 month ago

I can share my graph and code. Give me an hour or two. Thanks.

wdnlotm commented 1 month ago

I put files here. Thank you for your help!! https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing

ChuckHastings commented 1 month ago

It may take some time to investigate, but we will keep you informed on progress.

ChuckHastings commented 1 month ago

Just added this to our development plan for the 24.10 release. We should at least get some analysis done soon.

wdnlotm commented 2 weeks ago

If you don't mind, I can add another edge list, much smaller, showing similar results. https://drive.google.com/drive/folders/10lziXlXut-ZGcXDlwbk6SxqPB9OIunSS?usp=sharing go into smaller_knn

Results using 24.06 were like with Leiden 35387 clusters with Louvain 15 clusters.

Thanks.

ChuckHastings commented 2 weeks ago

Smaller graphs are always easier to debug. Thanks!