jacoblevine / PhenoGraph

Subpopulation detection in high-dimensional single-cell data
http://www.c2b2.columbia.edu/danapeerlab/html/phenograph.html
MIT License

Reproducibility problem (cluster number) #16

Open sinnamone opened 5 years ago

sinnamone commented 5 years ago

Hi,

I ran PhenoGraph 4 times on the same input matrix and noticed that the results differ in the number of output clusters.

How can I set the parameters (or fix the seed) to reproduce the PhenoGraph clustering results, or at least get similar results?
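The observation above can be made concrete with a small helper that reruns a clustering function on the same input and records how many clusters each run produces. This is only a sketch: `cluster_counts` and `cluster_fn` are hypothetical names, and the helper assumes that the label -1 marks unassigned cells and should not be counted as a cluster.

```python
def cluster_counts(cluster_fn, data, n_runs=4):
    """Run cluster_fn(data) n_runs times and return the cluster count of each run.

    cluster_fn is any callable returning one integer label per row of data.
    Assumption: the label -1 marks unassigned cells and is not a cluster.
    """
    counts = []
    for _ in range(n_runs):
        labels = cluster_fn(data)
        distinct = {int(label) for label in labels}
        distinct.discard(-1)  # ignore the assumed "unassigned" label
        counts.append(len(distinct))
    return counts

# Possible usage, assuming phenograph.cluster returns (communities, graph, Q):
#   counts = cluster_counts(lambda d: phenograph.cluster(d)[0], data)
```

If the returned counts differ between runs, the clustering is not deterministic for that input.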

SamGG commented 5 years ago

I think your question is about setting the random seed so that you get the same results even though the computation is stochastic. There is a related issue: https://github.com/jacoblevine/PhenoGraph/issues/14. The Louvain code is written in C, so setting a random seed in the Python code that calls the Louvain implementation has no effect. The seed must be passed to the C code. If you have programming skills, you could try to implement this. What you observe so far is expected, and it still tells you how robustly your data aggregate into clusters. Best
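To put a number on that robustness, one option is to compare the label vectors from two runs with the adjusted Rand index (ARI), which ignores how clusters are numbered. The function below is a plain-Python sketch of the standard ARI formula, not part of PhenoGraph; the name `adjusted_rand_index` is chosen here for illustration.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to renaming of clusters);
    values near 0 mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # Contingency counts: how many cells share each (cluster_a, cluster_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two cells: trivially identical

    expected = sum_a * sum_b / total
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (sum_pairs - expected) / (max_index - expected)
```

Computing this for every pair of runs shows whether the runs differ only in a few borderline cells (ARI close to 1) or disagree substantially.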

DenisSch commented 5 years ago

We have implemented a random seed selection in the most recent histoCAT version. Here is the detailed commit: https://github.com/BodenmillerGroup/histoCAT/commit/8d74a318764e83408b648a61465c5140749ca835

Otherwise, feel free to use the histoCAT Phenograph version.

Best

Denis

sinnamone commented 5 years ago

Dear @SamGG and @DenisSch, thank you for the suggestions.

I was able to fix the seed by using the "community" file from the histoCAT tool, modifying core.py, and reinstalling phenograph. I first tested these changes on my Mac with good results. I am now testing the same changes on a Unix server, but I noticed that the file in histoCAT/histoCAT/3rdParty/PhenoGraph/Louvain_d/community does not have the -s flag.

Is it possible to fix the seed on a Linux server as well, or only on macOS?

Best regards,

Simone

DenisSch commented 5 years ago

I have created a new issue for that on the histoCAT repository. I will do my best to include it in the next update.

levrex commented 2 years ago

I ran into the same problem.

Here is a link to a fork of the PhenoGraph repo (DeterministicPhenoGraph), where you can set a seed for the Louvain step. This should work on all platforms and lets you get reproducible results.