mikolmogorov / Flye

De novo assembler for single molecule sequencing reads using repeat graphs

`--deterministic` flag should set a random seed for random processes #640

Open kenibrewer opened 1 year ago

kenibrewer commented 1 year ago

As discussed in #509, Flye still produces different assembly outputs when run with the --deterministic flag. Multi-threading is not the only source of non-determinism: the assembly process also uses random variables.

An improved --deterministic flag should set a random seed for these other processes to allow the entire assembly process to occur deterministically.
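To illustrate the request, here is a minimal Python sketch of a flag that selects a fixed seed. It is not Flye's code: `FIXED_SEED`, `make_rng`, and `sample_reads` are hypothetical names, and the sampling step only stands in for whichever random processes the assembler actually runs.

```python
import argparse
import random

FIXED_SEED = 42  # hypothetical constant, not taken from Flye


def parse_args(argv):
    """Parse a command line that accepts a --deterministic switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--deterministic", action="store_true")
    return parser.parse_args(argv)


def make_rng(deterministic):
    """Return a seeded generator when determinism is requested."""
    # A dedicated Random instance avoids relying on global random state.
    if deterministic:
        return random.Random(FIXED_SEED)
    return random.Random()


def sample_reads(read_ids, k, rng):
    """Stand-in for a random step: pick k read IDs using the given generator."""
    return rng.sample(read_ids, k)
```

Passing the generator explicitly to each random step, rather than seeding global state once, would make it easier to check that no stage bypasses the seed.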

A fully deterministic Flye would be very helpful for integrating Flye into bioinformatic pipelines that are tested against reference datasets.
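As a rough sketch of the kind of pipeline test this would enable, the snippet below compares two output files by checksum. The helper names `file_sha256` and `outputs_match` are hypothetical, and the file paths are supplied by the caller; byte-identical output is assumed to be the pass criterion.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(path_a, path_b):
    """True when both files have identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

A test of this kind only passes reliably if every stage of the assembler produces identical output across runs.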

mikolmogorov commented 1 year ago

Makes sense, will add this to the list of TODOs for the next release, thanks!

riyasj327 commented 2 weeks ago

@mikolmogorov thank you so much for this amazing tool! Just wondering if this has been fixed? As of now, does using --deterministic give the most reproducible assembly?

mikolmogorov commented 1 week ago

Not at the moment, unfortunately. True determinism is very hard to achieve and maintain in an assembler, since the algorithm is complex and involves multiple completely different stages. Every stage has to be completely deterministic to make the pipeline deterministic. Given that our resources are limited, this is not a priority right now.