Segment mapping start positions are computed as the midpoint of candidate mapping regions. The problem is that for small reference contigs (<2x segment size), the minmers/seed window boundaries are not uniformly distributed; they bunch up near the boundaries of the index. As a result, the start position for mappings which map to the end of these contigs can be offset.
An example of this issue is shown in #218. The contigs are all slightly less than 1kbp. If you set the segment length to 900bp and turn merging off (`--no-merge`) on the main branch, you'll see that the first split `[0, 900)` maps correctly, but the `[100, 1000)` split maps to ~500 instead of ~100. The problem is mitigated by shifting the candidate window one minmer back.
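For reference, the two splits in the example are what you would get from a scheme that tiles full-length segments from the start of the query and anchors one final segment to the query end. The sketch below only illustrates that assumption. `split_segments` is a hypothetical helper, not a function from the codebase, and the query length of 1000 is a round stand-in for the contigs in #218.

```python
def split_segments(query_len, segment_len):
    """Return half-open [start, end) segments for a query.

    Assumption: full-length segments are tiled from position 0, and if a
    remainder is left over, one extra segment is anchored to the query end.
    """
    if query_len < segment_len:
        return []  # query too short to yield a full segment
    segments = [
        (start, start + segment_len)
        for start in range(0, query_len - segment_len + 1, segment_len)
    ]
    if query_len % segment_len != 0:
        # Final segment overlaps the previous one so the query tail is covered.
        segments.append((query_len - segment_len, query_len))
    return segments


if __name__ == "__main__":
    print(split_segments(1000, 900))
```

Under this assumption, the second split should map about 100bp downstream of the first, which is why a reported start of ~500 stands out.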
Also, in cases where segments map towards the end of a reference contig, we should truncate the coordinates before doing the length mismatch filter.
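A minimal sketch of why the ordering matters, assuming the length mismatch filter compares the reference span of a mapping against the query segment length. The function names, the 5% threshold, and the 990bp contig length are hypothetical and chosen only for illustration; they are not taken from the codebase.

```python
def truncate_to_contig(ref_start, ref_end, contig_len):
    """Clamp a half-open reference interval to [0, contig_len)."""
    return max(0, ref_start), min(ref_end, contig_len)


def passes_length_filter(query_len, ref_start, ref_end, max_mismatch_frac=0.05):
    """Assumed filter: reference span must be within a fraction of query length."""
    ref_len = ref_end - ref_start
    return abs(ref_len - query_len) <= max_mismatch_frac * query_len


def keep_mapping(query_len, ref_start, ref_end, contig_len):
    """Truncate first, then filter on the coordinates that lie on the contig."""
    ref_start, ref_end = truncate_to_contig(ref_start, ref_end, contig_len)
    return passes_length_filter(query_len, ref_start, ref_end)


if __name__ == "__main__":
    contig_len, query_len = 990, 900
    # Mapping near the true position: truncated to [100, 990), span 890.
    print(keep_mapping(query_len, 100, 1000, contig_len))
    # Offset mapping running off the contig: truncated to [500, 990), span 490.
    print(keep_mapping(query_len, 500, 1400, contig_len))
```

In this sketch, filtering on the untruncated `[500, 1400)` interval would see a span of 900 and accept the mapping, while filtering after truncation sees the span that actually lies on the contig.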