isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

alignment out of reference bounds #14

Closed: andreas-wilm closed this issue 7 years ago

andreas-wilm commented 8 years ago

Hi Ivan,

I have a weird case where a read gets aligned to reference end + 1. This is a long, stitched-together Illumina read mapped against a database of many similar sequences. I used the following command line (see the PS for links to the files):

graphmap -x illumina -t 8 -r 99_otus.fasta -d offending_seq.fa

The alignment in question is: 2|EU668175.1.1895 0 159627 317 0 122M1I539M1I3M111I2M2I1M4I1M1I1M2I2M1I1M2I1M1I3M1I1M1I4M4I10M2I3M2I5M1I10M1I13M2I3M1I6M2I3M2I7M4I3M1I3M4I1M1I2M4I4M4I13M1I3M2I7M2I4M1I6M2I8M1I2M1I16M3I1M2I10M1I7M1I5M3I2M1I4M1I1M1I1M3I3M1I7M1I2M1I2M1I1M2I6M1I1M6I1M1I1M1I6M5I1M1I4M1I2M2I1M2I1M1I2M2I11M6I8M1I2M1I8M2I8M2I5M2I1M2I10M5I1M3I8M8I2M2I5M7I2M1I4M1I10M2I1M1I1M2I6M2I3M2I3M2I7M1I1M1I6M2I2M1I5M1I15M2I14M2I8M2I4M2I2M2I1M1I7M

The CIGAR string translates into a length of 1056. Start position 317 + 1056 gives 1373, but the reference is only 1372 long. You get the same result with e.g. pysam's aligned_pairs. I get this for 0.2.2 604a386 (dev) and 0.22 db1362c (master). Weirdly enough, this doesn't happen if I simply extract the reference of interest from the bigger database and align against that alone.
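For anyone who wants to repeat this kind of bounds check on their own SAM records, here is a minimal sketch in plain Python. It is not GraphMap or pysam code. The function names (`reference_span`, `last_aligned_position`, `exceeds_reference`) and the toy CIGAR are made up for illustration. It assumes the SAM convention that only the M, D, N, = and X operations consume reference bases, and that POS is 1-based, so the last covered base is POS + span - 1. Tools that report 0-based, half-open coordinates (pysam, for example) need the matching adjustment.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING_OPS = set("MDN=X")
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Return the number of reference bases covered by a CIGAR string."""
    tokens = CIGAR_TOKEN.findall(cigar)
    # Reject strings with anything the token pattern did not account for.
    if "".join(count + op for count, op in tokens) != cigar:
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return sum(int(count) for count, op in tokens if op in REF_CONSUMING_OPS)


def last_aligned_position(pos, cigar):
    """Return the 1-based, inclusive reference coordinate of the last aligned base.

    `pos` is the SAM POS field (1-based leftmost mapping position).
    """
    return pos + reference_span(cigar) - 1


def exceeds_reference(pos, cigar, ref_length):
    """Return True if the alignment runs past the end of the reference."""
    return last_aligned_position(pos, cigar) > ref_length


if __name__ == "__main__":
    # Toy record (hypothetical): insertions do not consume reference bases,
    # deletions do.
    toy_cigar = "5M1I4M2D3M"
    print(reference_span(toy_cigar))
    print(exceeds_reference(10, toy_cigar, 22))
```

Running the same check against each record's reference length (taken from the @SQ header lines) is one way to flag alignments like the one above, though the exact threshold depends on which coordinate convention the reported positions use.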

Andreas

PS:
https://dl.dropboxusercontent.com/u/4119940/graphmap-out-of-bounds-aln/offending_seq.fa
https://dl.dropboxusercontent.com/u/4119940/graphmap-out-of-bounds-aln/99_otus.fasta.gz

isovic commented 8 years ago

Hmm, a Heisenbug. Thanks for the data and the report, I'll check it out!

isovic commented 7 years ago

Hi Andreas, this should now be fixed. Thanks! Best regards, Ivan.