ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Try extension alignment if seed appears to be ungapped #377

Open marcelm opened 6 months ago

marcelm commented 6 months ago

Btw, what does strobealign currently do when the region with the NAM has the same length on the query and the reference, but the Hamming distance of the NAM region is high?

Do we fully realign such cases with SSW? If so, an optimization would be to run ksw on the ends only. I remember we discussed similar scenarios when we tried partitioning the alignments and using WFA2, but I don't remember the conclusions.

Clarification: I meant the case where the Hamming distance is high, possibly because regions outside the NAM region do not fit (e.g. indels). Then it might be inefficient to realign the whole read. One approach would be to compute the Hamming distance of the NAM hit only, then run extension alignment on the ends.

Originally posted by @ksahlin in https://github.com/ksahlin/strobealign/issues/357#issuecomment-1834197665
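
To make the proposal concrete, here is a rough Python sketch (not strobealign code) of the "Hamming on the NAM hit, then extend the ends" idea. The `Nam` fields, the `plan_alignment` function, and the 5% default are assumptions for illustration. The extension step itself (e.g. with ksw) is left out; the sketch only decides whether the NAM core looks ungapped and which flanks would still need extension.

```python
from dataclasses import dataclass


@dataclass
class Nam:
    # Hypothetical, simplified NAM: half-open intervals on query and reference.
    query_start: int
    query_end: int
    ref_start: int
    ref_end: int


def plan_alignment(query: str, ref: str, nam: Nam, max_diff_fraction: float = 0.05) -> dict:
    """Decide between full realignment and 'ungapped core + extend the ends'."""
    q_core = query[nam.query_start:nam.query_end]
    r_core = ref[nam.ref_start:nam.ref_end]

    # Different spans on query and reference imply an indel inside the NAM.
    if not q_core or len(q_core) != len(r_core):
        return {"mode": "full"}

    # Hamming distance of the NAM region only, not of the whole read.
    mismatches = sum(a != b for a, b in zip(q_core, r_core))
    if mismatches / len(q_core) >= max_diff_fraction:
        return {"mode": "full"}

    # Core looks ungapped: only the flanks would need extension alignment.
    return {
        "mode": "extend_ends",
        "left_flank": query[:nam.query_start],
        "right_flank": query[nam.query_end:],
        "core_mismatches": mismatches,
    }
```

With this split, the cost of gapped alignment would scale with the flank lengths rather than with the full read length, which is the point of the optimization discussed above.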

marcelm commented 6 months ago

Yes, we fully align. The only case in which we do not fully align is when the NAM on the query and the reference have the same length and the Hamming distance is low (<5% differences).
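
As a minimal Python sketch of the rule described above (not the actual strobealign implementation), the decision could look roughly like this. The function names and return values are hypothetical, and which stretch of sequence the Hamming distance is measured over is an assumption: here it is simply the two segments passed in.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def choose_alignment_mode(query_segment: str, ref_segment: str,
                          max_diff_fraction: float = 0.05) -> str:
    """Return 'ungapped' if the cheap Hamming path applies, else 'full'."""
    # Different lengths on query and reference: always align fully.
    if not query_segment or len(query_segment) != len(ref_segment):
        return "full"

    dist = hamming_distance(query_segment, ref_segment)
    # Low Hamming distance (strictly under the threshold): skip full alignment.
    if dist / len(query_segment) < max_diff_fraction:
        return "ungapped"
    return "full"
```

Everything that does not take the `"ungapped"` branch falls through to full alignment, which matches the behavior stated in the comment: same length with a high Hamming distance still triggers a full realignment.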