Open marcelm opened 1 year ago
When judging whether seeds (not reads) are proper pairs, the above doesn’t quite work because we don’t know what the leftmost or rightmost mapped bases are going to be, mainly because we don’t know how many bases are going to be soft clipped on either side.
In a situation like this (==== marks the seed locations), the seeds would not overlap at all, but the reads could:
R1 -------------------->
====
R2 <------------------------
====
To ensure we don’t mistakenly rule out a pair, we can assume that the alignment extends ungapped to either end of the read. (I believe this is already done for the 5' end at the moment.)
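A minimal sketch of that idea, in Python for illustration only (this is not strobealign's code). The `Seed` type, its fields, and the function names are hypothetical. Coordinates are assumed to be 0-based, and the seed's query position is assumed to be given in the same orientation as the reference. Each seed is projected to where the whole read would lie if the alignment ran ungapped to both ends, and the pair is judged on those projected intervals instead of on the seeds themselves.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    """Hypothetical seed hit; not a strobealign type."""
    ref_start: int    # leftmost reference position covered by the seed
    query_start: int  # offset of the seed within the read (reference orientation)
    length: int       # seed length in bases


def seed_interval(seed):
    """Closed reference interval covered by the seed itself."""
    return seed.ref_start, seed.ref_start + seed.length - 1


def project_to_read_ends(seed, read_length):
    """Closed reference interval the read would cover if the alignment
    extended ungapped from the seed to both ends of the read."""
    start = seed.ref_start - seed.query_start
    end = start + read_length - 1
    return start, end


def intervals_overlap(a, b):
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# Seeds that do not overlap, on two 100-base reads that would.
seed_r1 = Seed(ref_start=1080, query_start=80, length=10)
seed_r2 = Seed(ref_start=1130, query_start=80, length=10)

seeds_overlap = intervals_overlap(seed_interval(seed_r1), seed_interval(seed_r2))
reads_overlap = intervals_overlap(
    project_to_read_ends(seed_r1, 100), project_to_read_ends(seed_r2, 100)
)
```

The projection is deliberately generous: soft clipping can only shrink the final alignment, so a pair accepted here may still be rejected once the reads are actually aligned, but a valid pair should not be ruled out early.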
I agree with your proposed solutions.
As has come up in #317 reported by @y9c, when mapping paired-end reads, we should allow for the case that reads overlap in this way:
Strobealign currently allows these two situations:
R1---> <---R2
R2---> <---R1
(#317 changes the above to "... is less than or equal to ...")
It appears to me that, for the first situation, we just need to change this to "... leftmost mapped base of R1 is less than the rightmost mapped base of R2", and similarly for the second situation.
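The relaxed condition could be sketched as follows, in Python for illustration only. The `Mapping` type and the function name are hypothetical, not strobealign's API. The sketch assumes both reads map to the same reference sequence, uses 0-based inclusive coordinates, and leaves out any insert-size check.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical mapped read; not a strobealign type."""
    start: int        # leftmost mapped base (inclusive)
    end: int          # rightmost mapped base (inclusive)
    is_reverse: bool  # True if the read maps to the reverse strand


def is_proper_orientation(r1, r2):
    """Relaxed check: the reads must be on opposite strands, and the
    leftmost base of the forward read must be less than the rightmost
    base of the reverse read. Overlapping reads pass this check."""
    if r1.is_reverse == r2.is_reverse:
        return False
    # Handles both situations: R1 forward / R2 reverse, and the swap.
    fwd, rev = (r2, r1) if r1.is_reverse else (r1, r2)
    return fwd.start < rev.end


# R1---> <---R2 with a gap between the reads
apart = is_proper_orientation(Mapping(100, 199, False), Mapping(300, 399, True))
# R1 and R2 overlapping, the case raised in #317
overlapping = is_proper_orientation(Mapping(100, 199, False), Mapping(150, 249, True))
# Reverse read lies entirely to the left of the forward read
everted = is_proper_orientation(Mapping(100, 199, False), Mapping(0, 50, True))
```

Comparing the forward read's leftmost base against the reverse read's rightmost base accepts partial and full overlaps, while still rejecting pairs where the reverse read lies completely upstream of the forward read.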