neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Set random seed in Louvain/(any stochastic) algorithms #225

Closed fanavarro closed 1 year ago

fanavarro commented 1 year ago

Is your feature request related to a problem? Please describe.

I am using the Louvain algorithm to discover communities in my graph; however, each execution of the algorithm returns different results. As I am writing a scientific article, it is very important that my results are reproducible. I have not seen any way to establish a random seed when executing the algorithm so that it returns the same result on every call. Is there any way to set this?

Describe the solution you would like

If not, I would like to have a parameter to set a random seed to be used by the Louvain algorithm in particular, and by any stochastic algorithm in general.

Other information

Related issue: #221

Thanks for your time, and kind regards. Fran.

IoannisPanagiotas commented 1 year ago

Hello Fran,

Thank you for your interest in the GDS library. Actually, the feature you are looking for already exists within GDS: we have a 'randomSeed' parameter available for randomized algorithms.

In this case, however, your issue is probably not caused by the lack of this parameter in Louvain, as Louvain is not randomised.

Your issue sounds like it's related to concurrency. Parallelism leads to different flows of execution each time the code is run. Therefore during the local move phase of an iteration, nodes might make different decisions and form different communities.
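To make the point about the local move phase concrete, here is a minimal single-threaded sketch. It is not the GDS implementation: the visiting order of the nodes stands in for the thread interleaving, the four-node weighted path graph is a made-up example, and the function name `local_move` is hypothetical. It applies the usual modularity-gain rule of the Louvain local move phase and shows that two visiting orders can settle on different communities for the same graph.

```python
def local_move(adj, order):
    """Toy Louvain local-move phase: visit nodes in `order` until no node moves.

    adj maps node -> {neighbour: weight} for an undirected graph.
    Returns the partition as a frozenset of frozensets.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    two_m = sum(degree.values())          # twice the total edge weight
    community = {v: v for v in adj}       # every node starts alone
    total = dict(degree)                  # summed degree per community label

    moved = True
    while moved:
        moved = False
        for v in order:
            current = community[v]
            total[current] -= degree[v]   # take v out of its community
            # Edge weight from v into each neighbouring community.
            links = {current: 0.0}
            for u, w in adj[v].items():
                links[community[u]] = links.get(community[u], 0.0) + w
            # Modularity gain (up to a constant factor) of joining community c.
            best = current
            best_gain = links[current] - total[current] * degree[v] / two_m
            for c in sorted(links):
                gain = links[c] - total[c] * degree[v] / two_m
                if gain > best_gain:      # move only on a strict improvement
                    best, best_gain = c, gain
            total[best] += degree[v]
            community[v] = best
            moved = moved or best != current

    groups = {}
    for v, c in community.items():
        groups.setdefault(c, set()).add(v)
    return frozenset(frozenset(g) for g in groups.values())


# Hypothetical path graph a -1- b -2- c -1- d (edge weights on the links).
GRAPH = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 1.0},
    "d": {"c": 1.0},
}

inner_first = local_move(GRAPH, ["b", "c", "a", "d"])  # one community of four nodes
outer_first = local_move(GRAPH, ["a", "d", "b", "c"])  # two communities: {a, b} and {c, d}
print(inner_first != outer_first)
```

With several threads, the effective order in which nodes make their decisions can change between runs, which is the effect described above.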

Once you have projected your graph, running Louvain with concurrency equal to one should normally give you the same result. If not, you could let us know and we can look into it.
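As a rough illustration of this suggestion, the sketch below runs `gds.louvain.stream` with `concurrency: 1` and compares repeated runs. It assumes a `session` object that behaves like a session of the official Neo4j Python driver (a `run(query, **parameters)` method whose records can be indexed by column name) and a graph that has already been projected; the helper names `louvain_communities` and `is_reproducible` are made up for this example.

```python
# Cypher for a single-threaded Louvain run on an already projected graph.
LOUVAIN_QUERY = """
CALL gds.louvain.stream($graph, {concurrency: 1})
YIELD nodeId, communityId
RETURN nodeId, communityId
"""


def louvain_communities(session, graph_name):
    """Return {nodeId: communityId} from one single-threaded Louvain run."""
    result = session.run(LOUVAIN_QUERY, graph=graph_name)
    return {record["nodeId"]: record["communityId"] for record in result}


def is_reproducible(session, graph_name, runs=3):
    """Run Louvain `runs` times and check that every run matches the first."""
    first = louvain_communities(session, graph_name)
    return all(
        louvain_communities(session, graph_name) == first
        for _ in range(runs - 1)
    )
```

If `is_reproducible` returns `False` on the same projected graph, that would be the situation worth reporting back.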

You should note, however, that gds.graph.project might order the nodes differently from execution to execution, which can also affect the decisions made in Louvain. So if you need to run "gds.graph.project + gds.louvain" every time, it might be problematic.

For this, setting concurrency for gds.graph.project to one might help, but it may still be the case that the ordering differs.
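One way to act on both remarks is to project with a single thread and then reuse that projection instead of re-projecting before each Louvain run. The sketch below is a non-authoritative illustration: it assumes that `readConcurrency` is the configuration key controlling projection concurrency (as in GDS 2.x native projections), uses the wildcard `'*'` projections only as placeholders, and again assumes a `session` shaped like the Neo4j Python driver's. The helper name `ensure_projection` is hypothetical.

```python
# Cypher to check for an existing in-memory graph and to project a new one.
EXISTS_QUERY = "CALL gds.graph.exists($graph) YIELD exists RETURN exists"

PROJECT_QUERY = """
CALL gds.graph.project($graph, '*', '*', {readConcurrency: 1})
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
"""


def ensure_projection(session, graph_name):
    """Project the graph with one thread unless it already exists.

    Returns True if a new projection was created, False if one was reused.
    """
    records = list(session.run(EXISTS_QUERY, graph=graph_name))
    if records and records[0]["exists"]:
        return False  # reuse the projection, so the node ordering stays fixed
    list(session.run(PROJECT_QUERY, graph=graph_name))  # consume the result
    return True
```

Reusing one projection sidesteps the ordering question for as long as that in-memory graph exists; it does not guarantee the same ordering after the graph is dropped and projected again.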

Please let us know if you have any other questions or issues.

Best regards, Ioannis.

fanavarro commented 1 year ago

Hi @IoannisPanagiotas, thanks for your answer!

Indeed, I am now getting the same results from the Louvain algorithm by setting the concurrency to 1. I've tried to make my example as reproducible as possible by also setting the concurrency to 1 when I create the projected graphs. Nonetheless, I wonder whether it would be possible to keep the stochastic nature of the algorithm while using concurrency, because that "random" behavior could be confusing. This is only a wish; feel free to close this issue, as we have found a way to address it!

Thanks again, Fran.

IoannisPanagiotas commented 1 year ago

Hello again,

Great to hear that you can get some deterministic behaviour now!

It seems to me that it would be hard to guarantee determinism under concurrency in approximation algorithms like Louvain without making compromises on performance. We will nonetheless try to consider the situation and see if anything can be done.

I will close this now. Do not hesitate to come back if you have any more questions!