prophyle / prophasm2

Parallelism #10

Closed by OndrejSladky 6 months ago

OndrejSladky commented 6 months ago

Currently blocked by #9

simultaneous reading from multiple files

Done. Intersection removal and assembly now also run simultaneously.
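For readers who want a picture of what per-set parallelism looks like, here is a rough Python sketch (not the actual C++ code in this repository). It assumes each input set is already available as a list of sequences, and the names `kmers_of`, `load_sets_parallel` and `remove_intersection_parallel` are hypothetical. The point is that the sets are independent, so each worker fills its own table and nothing has to be merged.

```python
from concurrent.futures import ThreadPoolExecutor

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer[::-1].translate(_COMPLEMENT)
    return min(kmer, rc)


def kmers_of(sequences, k):
    """Collect the canonical k-mers of one input set (hypothetical helper)."""
    out = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                out.add(canonical(kmer))
    return out


def load_sets_parallel(inputs, k, workers=4):
    """One task per input set; each task builds its own independent table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda seqs: kmers_of(seqs, k), inputs))


def remove_intersection_parallel(sets, workers=4):
    """Compute the shared k-mers once, then subtract them from every set in parallel."""
    common = set.intersection(*sets) if sets else set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(lambda s: s - common, sets))
    return common, reduced


if __name__ == "__main__":
    sets = load_sets_parallel([["ACGTA"], ["GTAAA"]], k=3)
    common, reduced = remove_intersection_parallel(sets)
    print(common, reduced)
```

CPython threads share the interpreter lock, so this only illustrates the task structure; real speedups need native threads as in the C++ implementation.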

parallelized reading of individual files (e.g. individual sequences)

Not done. My guess is that the gains in our case would not be particularly high: unlike in kmer-cnt, we care not only about the frequencies but also about the individual k-mers, so we would still need to merge the tables afterwards, which I believe would cancel out most of the benefit.

Hence, parallelism currently makes no sense for a single set. We might want to improve on this in the future, but right now I don't believe it has a good work/gain ratio.
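To make the merge argument concrete, here is a rough Python sketch of what splitting a single set across workers would involve. It is only an illustration under simple assumptions (plain string k-mers, no canonicalization, round-robin chunking), and `kmers_single_set_parallel` is a hypothetical name, not something in the code base. Each worker fills a private table, and a sequential merge then has to visit every entry of every partial table again.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers_of_chunk(sequences, k):
    """Build a private k-mer table for one chunk of sequences."""
    out = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            out.add(seq[i:i + k])
    return out


def kmers_single_set_parallel(sequences, k, workers=4):
    """Split one set into chunks, hash them in parallel, then merge.

    Returns the merged table and the number of entries the merge step
    had to visit, which is the extra work caused by the split.
    """
    chunks = [sequences[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: kmers_of_chunk(c, k), chunks))

    merged = set()
    for table in partial:  # sequential merge of the per-worker tables
        merged |= table
    visited = sum(len(table) for table in partial)
    return merged, visited


if __name__ == "__main__":
    merged, visited = kmers_single_set_parallel(["ACGT", "ACGT", "CGTT"], k=3, workers=2)
    print(len(merged), visited)
```

When the sequences in a set are very similar, the partial tables overlap heavily, so `visited` grows with the number of workers while `len(merged)` stays the same. Whether that cost outweighs the parallel hashing is exactly what would need measuring.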

karel-brinda commented 6 months ago

Great, thanks for the update! I'll merge it.

> Hence, parallelism currently makes no sense for a single set. We might want to improve on this in the future, but right now I don't believe it has a good work/gain ratio.

I believe we will need to base this on benchmarking data. Usually, the slowest part is k-mer hashing. A typical use case where ProphAsm currently fails is computing simplitigs from thousands of very similar genomes. Here, parallelism should help a lot.
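As a starting point for such a benchmark, something like the following Python sketch could separate k-mer enumeration from hash-table insertion on synthetic, very similar genomes. Everything here is an assumption made for illustration: the generator `make_similar_genomes`, the stage split in `time_stages`, and the parameter values. Real numbers would have to come from the C++ code on real data.

```python
import random
import time


def make_similar_genomes(n, length, mutations, seed=0):
    """Generate n genomes that differ from a random base by a few substitutions."""
    rng = random.Random(seed)
    base = [rng.choice("ACGT") for _ in range(length)]
    genomes = []
    for _ in range(n):
        genome = base[:]
        for _ in range(mutations):
            genome[rng.randrange(length)] = rng.choice("ACGT")
        genomes.append("".join(genome))
    return genomes


def time_stages(sequences, k):
    """Time k-mer enumeration and hash-table insertion separately."""
    t0 = time.perf_counter()
    kmers = [seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)]
    t1 = time.perf_counter()
    table = set(kmers)  # hashing and insertion happen here
    t2 = time.perf_counter()
    return {
        "enumerate_s": t1 - t0,
        "hash_insert_s": t2 - t1,
        "total_kmers": len(kmers),
        "distinct_kmers": len(table),
    }


if __name__ == "__main__":
    genomes = make_similar_genomes(n=100, length=10_000, mutations=5)
    print(time_stages(genomes, k=31))
```

The ratio of `total_kmers` to `distinct_kmers` gives a feel for how redundant the hashing work is in the many-similar-genomes case.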