isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

mapping problems when mapping near a gap (dev version) #12

Closed JordyCoolen closed 8 years ago

JordyCoolen commented 8 years ago

Dear,

I'm currently using the code and found mapping issues in several of the mapping modes (anchor, myers, gotoh, and anchorgotoh).

anchor

myers

gotoh

anchorgotoh

I will send an email with some additional files. Attached is an IGV snapshot of the results.

igv_snapshot

Thank you, Jordy

isovic commented 8 years ago

Hi Jordy!

Thank you for reporting this. I have inspected the alignments and indeed found two bugs in anchored alignment. These wouldn't have occurred under common conditions, so I never spotted them, but your tests were great. They have now been resolved and merged into the master branch. Would you mind giving it a shot now?

As for the Myers and Gotoh alignment modes (that is, the semiglobal modes): by design, they are not well suited for alignment when large structural variations are present. For this, anchored mode is the mode of choice. The reason is that in semiglobal alignment, a chunk of the reference slightly larger than the read is extracted, and if a structural variant occurred within that chunk, the aligner might find a better-scoring alignment by shifting the starting position or introducing indels. Anchored mode, on the other hand, fixes the detected anchors in place. If a structural variant occurred between two anchors, the alignment should go right through it, reporting a large indel event. If the event happened at an edge of the read, GraphMap would first attempt to align the entire read up to its end. With the new fix, GraphMap now checks whether the edit distance of the front/back part is too high (higher than half of the leading/trailing length), which it previously didn't do, and that's why you observed a lot of mismatches over the N region in anchorgotoh mode.
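
The flank check described above can be illustrated with a minimal Python sketch. This is not GraphMap's actual code: the function names `edit_distance` and `should_clip_flank` are hypothetical, the distance is a plain Levenshtein DP rather than whatever aligner GraphMap uses internally, and the threshold is simply the "higher than half of the leading/trailing length" rule quoted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character of a
                            curr[j - 1] + 1,      # insert a character of b
                            prev[j - 1] + cost))  # match or mismatch
        prev = curr
    return prev[-1]


def should_clip_flank(read_flank: str, ref_flank: str) -> bool:
    """Hypothetical check: reject (clip) a leading/trailing read segment
    when its edit distance to the reference exceeds half of its length."""
    if not read_flank:
        return False
    return edit_distance(read_flank, ref_flank) > len(read_flank) / 2


# A flank that matches the reference is kept.
print(should_clip_flank("ACGTACGT", "ACGTACGT"))  # False
# A flank falling on an N region mismatches everywhere and is clipped.
print(should_clip_flank("ACGTACGT", "NNNNNNNN"))  # True
```

Under these assumptions, a read end that runs into a gap of N bases would be clipped instead of being reported as a long stretch of mismatches.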

Best regards, Ivan.

JordyCoolen commented 8 years ago

Hi Ivan,

Thank you for your quick response! I will give it a shot right away.

As for the semiglobal modes, I already suspected that they would not be suitable for this structural variation application, but it is good to have that confirmed.

Thanks again!

Best regards, Jordy

JordyCoolen commented 8 years ago

Hi Ivan,

I tested it and the problems are indeed solved. Thank you very much!

Best regards, Jordy

isovic commented 8 years ago

Great to hear that, thank you for the test!

Best regards, Ivan.