phac-nml / rebar

REcombination BARcode detector.
https://phac-nml.github.io/rebar/
Apache License 2.0

Efficiency improvements #1

Open ktmeaton opened 1 year ago

ktmeaton commented 1 year ago

Currently, recombination detection is slow, at 5 seconds per sequence. Multiprocessing (--threads) helps, but the code itself certainly needs efficiency improvements.

ktmeaton commented 1 year ago

I think most of the slowdown is purely in passing large objects as parameters to genome_mp.

Because of this, I don't think my implementation of the analysis is the biggest problem.
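To illustrate the kind of overhead described above: the internals of genome_mp are not shown in this thread, so the following is only a minimal sketch with hypothetical names (`init_worker`, `count_matches`, `per_task_payload_bytes`, and the `shared` dictionary are all assumptions). It compares the bytes pickled per task when a large object travels with every call against handing that object to each worker once through a `multiprocessing.Pool` initializer.

```python
import pickle
from multiprocessing import Pool

# Hypothetical stand-in for a large object that every task needs
# (for example, a reference or barcode table). Set once per worker.
_SHARED = None


def init_worker(shared):
    """Run once in each worker process; keep the large object in a module global."""
    global _SHARED
    _SHARED = shared


def count_matches(sequence):
    """Toy per-sequence task: count positions that match the shared reference."""
    return sum(1 for a, b in zip(sequence, _SHARED["reference"]) if a == b)


def per_task_payload_bytes(sequence, shared=None):
    """Size of the pickled arguments that would be sent for a single task."""
    args = (sequence,) if shared is None else (sequence, shared)
    return len(pickle.dumps(args))


if __name__ == "__main__":
    shared = {"reference": "ACGT" * 25_000}  # roughly 100 kB of toy data
    sequences = ["ACGA" * 10, "TTTT" * 10]

    # Passing the large object with every task pickles it again each time.
    print("bytes per task, object as parameter:", per_task_payload_bytes(sequences[0], shared))
    # With an initializer, each task only carries the sequence itself.
    print("bytes per task, object via initializer:", per_task_payload_bytes(sequences[0]))

    with Pool(processes=2, initializer=init_worker, initargs=(shared,)) as pool:
        print(pool.map(count_matches, sequences))
```

With the initializer, the large object is sent once per worker rather than once per sequence, so the per-task cost depends only on the sequence. Whether this matches the actual bottleneck in genome_mp would need profiling to confirm.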

ktmeaton commented 1 year ago

This might be a problem that Rust could solve.

wtchoga commented 1 year ago

[image]

ktmeaton commented 7 months ago

The codebase rewrite in Rust (PR #5) was extremely helpful for this issue. On a single core, I've seen processing speeds ranging from 1 to 100 sequences/second. And that's not even taking --threads into account 😀

But I am leaving this unresolved until I benchmark with a larger dataset (e.g. VirusSeq).