Open Ni-Ar opened 2 years ago
Yes, the iGraph algorithm does involve some randomness - see the Python docstring at https://github.com/igraph/python-igraph/blob/950d61a1c4dec3d0793c3b5327f154d64009f536/src/igraph/__init__.py#L1587 and the underlying C function at https://github.com/igraph/igraph/blob/6798f825df7712f1351aa7ec1a6e56ecdf1bde26/src/community/leiden.c#L906. Unfortunately it doesn't provide an option to specify the random seed, so some slight variation from run to run will be expected. The NetworkX algorithm is (to my knowledge) deterministic, albeit quite a bit slower due to being pure Python.
Hi @tristanic,
thanks a lot for explaining. Probably the easiest way to control for this is to run both the iGraph and NetworkX algorithms and use the overlap. Have you tried that? Does that work in your opinion?
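One way to read "use the overlap" is to intersect every cluster from one method with every cluster from the other and keep the non-empty intersections. The sketch below is only an illustration of that idea: the function name `cluster_overlap`, the `min_size` parameter, and the example data are all made up for this purpose, and it assumes each method returns its clusters as lists of residue indices.

```python
from itertools import product


def cluster_overlap(clusters_a, clusters_b, min_size=1):
    """Return the non-empty pairwise intersections of two clusterings.

    Each clustering is an iterable of clusters, and each cluster is an
    iterable of residue indices. Residues that the two methods group
    differently end up in separate, smaller overlap clusters.
    """
    overlaps = []
    for a, b in product(clusters_a, clusters_b):
        common = set(a) & set(b)
        # Skip empty intersections and those below the size threshold
        if common and len(common) >= min_size:
            overlaps.append(sorted(common))
    # Largest clusters first, ties broken by the first residue index
    overlaps.sort(key=lambda c: (-len(c), c[0]))
    return overlaps


# Made-up example data, not real output from either library
igraph_clusters = [[0, 1, 2, 3], [4, 5, 6]]
networkx_clusters = [[0, 1, 2], [3, 4, 5, 6]]

consensus = cluster_overlap(igraph_clusters, networkx_clusters)
print(consensus)
```

Raising `min_size` drops small leftover fragments, such as single residues that the two methods assign to different domains.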
Thanks, Nicco
Could do. To be honest I think there's quite a lot more that could be done here, but my original intent for this was as a quick proof-of-principle that others could pick up and develop further (it's interesting, but a bit tangential to my day-to-day work). So feel free to explore!
This is also tangential to my day-to-day work :D I really like the idea, but yeah, a bit more exploration is needed on my side to work out what I'd like to use it for. Thanks for sharing this. Let's see how much exploration time I can dedicate to this :)
Hi guys, it's simple to set a seed for the script so that repeated runs give the same output. Just add the following lines before the clustering step:
import random
random.seed(1234)
Have you tried that or are you just guessing it would make the results reproducible?
Yeah, I got the same results, you can have a try.
if lib == 'igraph':
    f = domains_from_pae_matrix_igraph
else:
    f = domains_from_pae_matrix_networkx
import random
random.seed(1234)
clusters = f(pae, pae_power=args.pae_power, pae_cutoff=args.pae_cutoff, graph_resolution=args.resolution)
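To check a claim like this rather than take it on trust, one option is to reseed before each of several runs and compare the results. The sketch below is a minimal, hedged illustration: `is_reproducible` and `toy_clustering` are hypothetical names, and `toy_clustering` is only a stand-in that draws from Python's `random` module so the example runs without igraph or NetworkX installed.

```python
import random


def is_reproducible(cluster_fn, seed, n_runs=5):
    """Call cluster_fn n_runs times, reseeding Python's random module
    before each call, and report whether every run gave the same result."""
    results = []
    for _ in range(n_runs):
        random.seed(seed)
        clusters = cluster_fn()
        # Normalise so that cluster order and member order do not matter
        results.append(sorted(tuple(sorted(c)) for c in clusters))
    return all(r == results[0] for r in results)


def toy_clustering(n_items=20, n_clusters=3):
    """Hypothetical stand-in for the real clustering call: assigns items
    to clusters at random using the `random` module."""
    clusters = [[] for _ in range(n_clusters)]
    for item in range(n_items):
        clusters[random.randrange(n_clusters)].append(item)
    return [c for c in clusters if c]


print(is_reproducible(toy_clustering, seed=1234))
```

In the actual script, the stand-in could be replaced by a zero-argument wrapper around the real call, for example `lambda: f(pae, pae_power=args.pae_power, pae_cutoff=args.pae_cutoff, graph_resolution=args.resolution)`. The check is only informative if the clustering library draws its random numbers from Python's `random` module, which the report above suggests is the case for the igraph path.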
Thanks! I'll give it a try
Hi,
I was playing around with the different threshold parameters and realised that the same run can yield a different number of clusters.
Is this something you have also observed on the same input PAE JSON file? I suspect this comes from the igraph community_leiden step, but I might be wrong.