prophyle / prophasm2

Feature request: Parallelism #4

Closed karel-brinda closed 6 months ago

karel-brinda commented 8 months ago

Add a -t {threads} parameter to set the number of threads.

PavelVesely commented 8 months ago

Parallelizing ProPhasm, whether local greedy or global greedy, will be quite challenging IMHO and will definitely require a lot of care. When extending several local paths (simplitigs or pseudosimplitigs) in parallel, one needs to lock individual k-mers so that no two threads add the same k-mer at the same time. The program may also need more memory.

Likewise, global greedy in the hash-table implementation may be parallelized by searching for length-d overlaps in parallel, again using locking on individual k-mers that are merged.
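To make the per-k-mer locking above a bit more concrete, here is a minimal C++ sketch, not ProPhasm's actual code. It assumes each k-mer maps to a slot index (e.g. its position in a hash table) and keeps one atomic flag per slot; `KmerClaims`, `try_claim` and `claim_all_in_parallel` are hypothetical names made up for illustration. A thread may only append a k-mer to its path if it wins the claim, so every k-mer is used by exactly one thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical claim table: one atomic flag per k-mer slot.
// A slot index stands in for a k-mer's position in a hash table.
class KmerClaims {
public:
    explicit KmerClaims(std::size_t n) : flags_(n) {
        // Explicit reset, since atomics are not zero-initialized before C++20.
        for (auto &f : flags_) f.store(false, std::memory_order_relaxed);
    }

    // Returns true for exactly one caller per slot, false for all others.
    bool try_claim(std::size_t slot) {
        assert(slot < flags_.size());
        return !flags_[slot].exchange(true, std::memory_order_acq_rel);
    }

    std::size_t size() const { return flags_.size(); }

private:
    std::vector<std::atomic<bool>> flags_;
};

// Demo: every thread tries to claim every slot; returns the total number
// of successful claims across all threads.
std::size_t claim_all_in_parallel(std::size_t n_slots, unsigned n_threads) {
    KmerClaims claims(n_slots);
    std::vector<std::size_t> won(n_threads, 0);  // per-thread counters
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&claims, &won, t] {
            for (std::size_t s = 0; s < claims.size(); ++s)
                if (claims.try_claim(s)) ++won[t];
        });
    }
    for (auto &w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t c : won) total += c;
    return total;
}
```

Whatever the number of threads, the total number of successful claims equals the number of slots. The extra flag per k-mer is also one small example of where the additional memory would come from.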

karel-brinda commented 8 months ago

This is actually an interesting question.

I think we don't need to parallelize the actual computation of simplitigs (it's quite fast on its own, I believe), but rather other levels:

  1. simultaneous reading from multiple files
  2. parallelized reading of individual files (e.g., individual sequences)

Some relevant experiments regarding parallelization and optimized reading were done here: https://github.com/lh3/kmer-cnt
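For option 1 in the list above, a rough C++ sketch of what simultaneous reading could look like, assuming plain uncompressed FASTA input and a pool of worker threads sized by the proposed -t value. `read_fasta` and `read_files_parallel` are hypothetical helpers written for this illustration, not existing ProPhasm functions.

```cpp
#include <atomic>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Reads one FASTA file and returns its sequences: header lines are
// dropped and multi-line records are joined into a single string.
std::vector<std::string> read_fasta(const std::string &path) {
    std::vector<std::string> seqs;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') seqs.emplace_back();         // start a new record
        else if (!seqs.empty()) seqs.back() += line;     // extend current one
    }
    return seqs;
}

// Reads all files with up to n_threads workers. Each worker repeatedly
// takes the next unread file index from a shared atomic counter, so
// result[i] always holds the sequences of paths[i].
std::vector<std::vector<std::string>> read_files_parallel(
        const std::vector<std::string> &paths, unsigned n_threads) {
    std::vector<std::vector<std::string>> result(paths.size());
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);
            if (i >= paths.size()) break;
            result[i] = read_fasta(paths[i]);  // each index is written once
        }
    };
    if (n_threads == 0) n_threads = 1;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto &th : pool) th.join();
    return result;
}
```

The shared counter keeps the workers balanced when file sizes differ a lot. Option 2 (splitting a single file across threads) would need more care, since record boundaries have to be found first.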