odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

Question about reproducibility #25

Closed koido closed 1 year ago

koido commented 1 year ago

Hi,

Thanks for developing such an excellent tool.

I noticed that to make the phase_common results reproducible, I have to use a single thread (and keep the same seed, although the default seed is fixed). This is described in the SHAPEIT4 documentation as follows:

  1. Reproducibility. Making reproducible runs can sometimes be useful. To do so, you need to specify the random generator seed using --seed and to use a single thread. Using multi-threading prevents reproducibility.

Regarding ligate and phase_rare, do I also need to use a single thread to make their results reproducible?

Although I want to use multiple threads to speed things up, I am also concerned about reproducibility and its potential effects on my downstream analysis.

Best,

odelaneau commented 1 year ago

Hi Masaru,

I think ligate and phase_rare are safe: there is no random sampling in either, so multi-threaded runs should still be reproducible.

But as you mention, this is not the case for phase_common.

Cheers,

Olivier Delaneau

koido commented 1 year ago

Hi Olivier,

Thank you very much for your helpful response.

Best, Masaru
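
For anyone who wants to check reproducibility empirically rather than rely on the discussion above, one option is to run the same step twice with identical inputs and settings and compare the phased records. The following is a minimal sketch, not part of SHAPEIT5: the function names `records_digest` and `same_phasing` are hypothetical, and it assumes the outputs are available as plain or gzip/bgzip-compressed VCF text (a BCF would first need converting to VCF with a separate tool). It skips the `##` header lines on the assumption that they may record run-specific details such as the command line, which would differ between runs even when the haplotypes are identical.

```python
import gzip
import hashlib
from pathlib import Path


def records_digest(vcf_path):
    """Return a SHA-256 digest of a VCF file, ignoring '##' meta-header lines.

    The '#CHROM' column line is kept so that a change in sample order
    is also detected.
    """
    path = Path(vcf_path)
    # Assumption: a '.gz' suffix means gzip/bgzip-compressed VCF text.
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.sha256()
    with opener(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"##"):
                continue  # meta-header lines may differ between runs
            digest.update(line)
    return digest.hexdigest()


def same_phasing(run_a, run_b):
    """True if two VCF outputs contain byte-identical records."""
    return records_digest(run_a) == records_digest(run_b)
```

Calling `same_phasing("run1.vcf.gz", "run2.vcf.gz")` on two outputs of the same step (the file names here are placeholders) returns `True` only when every record line matches, so a `False` result for a multi-threaded phase_common run would be consistent with the behaviour described in this thread.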