ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Explore using Block Aligner for extension and global DP alignment #420

Open Daniel-Liu-c0deb0t opened 2 months ago

Daniel-Liu-c0deb0t commented 2 months ago

Hey! I briefly spoke with @ksahlin at RECOMB-seq about trying out Block Aligner. Block Aligner supports both global and extension (X-drop) affine gap alignment, so it should be able to replace both SSW and KSW2. It's also carefully optimized and supports the AVX2, SSE, and NEON SIMD ISAs, so it is much faster (at least in my benchmarks). Certain parameters, like the "block size" (similar to "band width" in other aligners), should be more forgiving to tune due to its adaptive algorithm. I expect the speedup to be greater with longer reads.
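
For readers unfamiliar with the terminology, here is a minimal scalar sketch of what an extension (X-drop) alignment with affine gaps computes. It is a plain reference DP, not Block Aligner's adaptive block algorithm and not its API; the function name `xdrop_extend`, the default scores, and the gap convention (a gap of length k costs `gap_open + (k - 1) * gap_extend`) are all assumptions made for illustration.

```python
NEG_INF = float("-inf")


def xdrop_extend(query, ref, match=2, mismatch=-4,
                 gap_open=-5, gap_extend=-1, x_drop=20):
    """Extend an alignment from (0, 0) and return (best_score, query_end, ref_end).

    Illustrative scalar reference only (hypothetical helper, not a library call).
    Cells scoring more than x_drop below the best score seen so far are pruned.
    """
    m = len(ref)
    best_score, best_i, best_j = 0, 0, 0

    # Row 0: only a gap that consumes the reference is possible.
    H = [0] + [gap_open + (j - 1) * gap_extend for j in range(1, m + 1)]
    H = [h if h >= best_score - x_drop else NEG_INF for h in H]
    F = [NEG_INF] * (m + 1)  # best score ending in a gap that consumes the query

    for i, q in enumerate(query, start=1):
        new_H = [NEG_INF] * (m + 1)
        new_F = [NEG_INF] * (m + 1)
        e = NEG_INF  # best score ending in a gap that consumes the reference
        for j in range(m + 1):
            f = max(H[j] + gap_open, F[j] + gap_extend)
            if j == 0:
                h = f
            else:
                e = max(new_H[j - 1] + gap_open, e + gap_extend)
                sub = match if q == ref[j - 1] else mismatch
                h = max(H[j - 1] + sub, e, f)
            if h < best_score - x_drop:
                # X-drop: this cell is too far below the best score, drop it.
                h = e = f = NEG_INF
            elif h > best_score:
                best_score, best_i, best_j = h, i, j
            new_H[j] = h
            new_F[j] = f
        if all(v == NEG_INF for v in new_H):
            break  # every cell in this row was pruned, so stop extending
        H, F = new_H, new_F

    return best_score, best_i, best_j
```

With the default scores, `xdrop_extend("ACGT", "ACGT")` returns `(8, 4, 4)`. SIMD aligners compute the same kind of recurrence, only restricted to a band or block and vectorized.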

(Un)fortunately, Block Aligner is written in Rust, although it has C bindings. I put together a simple example of how to use it here. This should be enough for a first pass at evaluating Block Aligner for strobealign. I can also help with reviewing the code, figuring out the best parameters, etc. If there are any small features missing compared to SSW/KSW2, I can look into those if the evaluations look promising.

Daniel-Liu-c0deb0t commented 2 months ago

A different, simpler evaluation would be to output the pairs of sequences to align from strobealign and plug them into Block Aligner's benchmarks to see if there's a speedup. In any case, here are the benchmarks (with traceback) from the paper (top plot is ~100bp Illumina reads). WFA2 is fast for Illumina pairs that are really similar in my benchmarks, but maybe that's not true in your tests (the rest of the aligners are not as sensitive to error rate for short reads). For longer reads, WFA2 adaptive is fast, but much less accurate.
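
A rough sketch of the "dump the pairs, then benchmark" idea could look like the following. The tab-separated file layout and the helper names (`write_pairs`, `read_pairs`, `time_aligner`) are hypothetical; the input format that Block Aligner's own benchmarks expect is not stated in this thread, so treat this only as an outline of the workflow.

```python
import time


def write_pairs(path, pairs):
    """Write (query, reference) pairs as one tab-separated line per pair."""
    with open(path, "w") as f:
        for query, ref in pairs:
            f.write(f"{query}\t{ref}\n")


def read_pairs(path):
    """Read the pairs back, skipping empty lines."""
    pairs = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            query, ref = line.split("\t")
            pairs.append((query, ref))
    return pairs


def time_aligner(pairs, align_fn):
    """Run align_fn(query, ref) on every pair; return (seconds, results)."""
    start = time.perf_counter()
    results = [align_fn(query, ref) for query, ref in pairs]
    elapsed = time.perf_counter() - start
    return elapsed, results
```

Running `time_aligner` once per aligner on the same dumped pairs gives a like-for-like comparison that is independent of the rest of the mapping pipeline.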

[Image: Screen Shot 2024-04-28 at 3 04 26 PM]

marcelm commented 1 month ago

This looks great! I’m looking forward to testing this. I’m out of office next week, so this may need to wait a little bit, though.