ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Map paired-end reads that overlap each other’s 5' ends #320

Status: Open. marcelm opened this issue 1 year ago.

marcelm commented 1 year ago

As came up in #317 (reported by @y9c), when mapping paired-end reads we should allow for the case where the reads overlap like this:

R1         -------->
R2   <--------

Strobealign currently allows these two situations:

  1. R1 is forward, R2 is reverse, and the leftmost mapped base of R1 is less than the leftmost mapped base of R2 (R1---> <---R2)
  2. R2 is forward, R1 is reverse, and the leftmost mapped base of R2 is less than the leftmost mapped base of R1 (R2---> <---R1)

(#317 changes the above to "... is less than or equal to ...")

It appears to me that, for the first situation, we just need to change this to "... leftmost mapped base of R1 is less than the rightmost mapped base of R2", and similarly for the second situation.
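The two ordering rules can be compared in a small Python sketch. It assumes a hypothetical `Mapping` record with 0-based, half-open reference coordinates; these names are illustrative and are not strobealign's actual types. The sketch checks only strand and ordering (no insert-size limit) and follows the wording above literally: "less than or equal" for the current rule as changed by #317, and "less than" for the proposed rule.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical mapped read: 0-based, half-open reference interval."""
    ref_start: int    # leftmost mapped base
    ref_end: int      # one past the rightmost mapped base
    is_reverse: bool  # True if mapped to the reverse strand


def ordered_current(fwd: Mapping, rev: Mapping) -> bool:
    # Current rule (after #317): forward read's leftmost base <= reverse read's leftmost base.
    return fwd.ref_start <= rev.ref_start


def ordered_proposed(fwd: Mapping, rev: Mapping) -> bool:
    # Proposed rule: forward read's leftmost base < reverse read's rightmost base.
    return fwd.ref_start < rev.ref_end - 1


def is_proper_orientation(r1: Mapping, r2: Mapping, proposed: bool = True) -> bool:
    """Accept R1---> <---R2 and R2---> <---R1; reject same-strand pairs."""
    if r1.is_reverse == r2.is_reverse:
        return False
    # Whichever read is on the forward strand plays the "left" role.
    fwd, rev = (r2, r1) if r1.is_reverse else (r1, r2)
    rule = ordered_proposed if proposed else ordered_current
    return rule(fwd, rev)


# R1 forward at [200, 300), R2 reverse at [150, 250): the reads overlap each other's 5' ends.
r1 = Mapping(200, 300, is_reverse=False)
r2 = Mapping(150, 250, is_reverse=True)
print(is_proper_orientation(r1, r2, proposed=False))  # False
print(is_proper_orientation(r1, r2, proposed=True))   # True
```

Both rules still accept the ordinary R1---> <---R2 layout; only the proposed rule also accepts the overlapping layout from the diagram.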

marcelm commented 1 year ago

When judging whether seeds (not reads) are proper pairs, the above doesn’t quite work: we don’t yet know what the leftmost or rightmost mapped bases will be, mainly because we don’t know how many bases will be soft-clipped on either side.

In a situation like this (==== marks the seed locations), the seeds would not overlap at all, but the reads could:

R1                      -------------------->
                                   ====
R2     <------------------------
            ====

To ensure we don’t mistakenly rule out a pair, we can assume that the alignment extends ungapped to both ends of the read. (I believe this is already done for the 5' end.)
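A minimal Python sketch of this seed-level check, assuming a hypothetical `SeedHit` record that stores the seed's reference position, its offset within the read (as oriented on the reference), and the read length. Each seed is projected ungapped to both ends of its read, ignoring soft clipping, and the proposed ordering rule is applied to the projected intervals. This illustrates the idea only and is not strobealign's implementation; indels are ignored, so the projected span is an approximation of where the final alignment can reach.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SeedHit:
    """Hypothetical seed match for one read.

    query_pos is the 0-based offset of the seed within the read as oriented
    on the reference (for reverse-strand hits, within the reverse complement).
    """
    ref_pos: int      # 0-based reference position of the seed's first base
    query_pos: int    # 0-based offset of the seed in the (oriented) read
    read_len: int
    is_reverse: bool


def projected_interval(hit: SeedHit) -> tuple[int, int]:
    """Extend the seed ungapped to both read ends, as if nothing were soft-clipped.

    Returns a half-open reference interval [start, end). Indels are ignored.
    """
    start = hit.ref_pos - hit.query_pos
    return start, start + hit.read_len


def seeds_may_pair(h1: SeedHit, h2: SeedHit) -> bool:
    """Apply the proposed ordering rule to the ungapped projections of two seeds."""
    if h1.is_reverse == h2.is_reverse:
        return False
    fwd, rev = (h2, h1) if h1.is_reverse else (h1, h2)
    fwd_start, _ = projected_interval(fwd)
    _, rev_end = projected_interval(rev)
    # Leftmost base of the forward read < rightmost base of the reverse read.
    return fwd_start < rev_end - 1


# Layout like the diagram: 100 bp reads, R1 seed near its 3' end, R2 seed near its 3' end.
r1_hit = SeedHit(ref_pos=260, query_pos=60, read_len=100, is_reverse=False)
r2_hit = SeedHit(ref_pos=160, query_pos=10, read_len=100, is_reverse=True)
print(projected_interval(r1_hit), projected_interval(r2_hit))  # (200, 300) (150, 250)
print(seeds_may_pair(r1_hit, r2_hit))  # True
```

With a seed length of, say, 20, comparing the seed positions alone (260 versus 160 + 19) would reject this pair, while the projected intervals [200, 300) and [150, 250) overlap and the pair is kept.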

ksahlin commented 1 year ago

I agree with your proposed solutions.