isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads
http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html
Note: This was the original repository, which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

How to change mapping parameters when dealing with a highly repetitive region #82

Open ghost opened 6 years ago

ghost commented 6 years ago

We assembled a bacterial genome from Oxford Nanopore data. The assembly looks very good overall, but there is one region that I want to confirm by mapping the longest 20% of the Oxford Nanopore reads back to the assembly.
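For anyone who wants to reproduce the read selection, here is a minimal sketch of picking the longest 20% of reads. It assumes the reads are already loaded as (name, sequence) tuples; the function name `longest_fraction` and the toy reads are made up for illustration and are not part of GraphMap.

```python
def longest_fraction(reads, fraction=0.2):
    """Return the longest `fraction` of reads, longest first.

    `reads` is a list of (name, sequence) tuples (an assumed input format).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not reads:
        return []
    # Keep at least one read so that small inputs do not come back empty.
    keep = max(1, int(len(reads) * fraction))
    ranked = sorted(reads, key=lambda r: len(r[1]), reverse=True)
    return ranked[:keep]


# Toy reads, for illustration only.
reads = [
    ("r1", "A" * 100),
    ("r2", "A" * 900),
    ("r3", "A" * 300),
    ("r4", "A" * 1200),
    ("r5", "A" * 50),
]
selected = longest_fraction(reads)
```

With five toy reads and a fraction of 0.2, exactly one read (the longest) is kept.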

The problem is that this region is very repetitive. I guess you could call it long tandem repeats of several hundred base pairs.

I think the assembly is correct, but graphmap is probably having trouble there. It looks to me as if it finds a seed on one particular repeat of the reference, while the correct position would actually be shifted one or more repeats to the left or right. The extension then works but is "out of phase" with respect to the repeats. When I look at the region in IGV, I see insertions and deletions that are all about the same size (probably corresponding to the length of the tandem repeat).
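One way to check the "out of phase" idea would be to pull the long indels out of the CIGAR strings and see whether their sizes sit close to a multiple of the repeat unit length. The sketch below is only an illustration: the CIGAR string, the 300 bp unit length, the 50 bp minimum, and the 10% tolerance are all assumed values, and it does not merge a long indel that the aligner has split into several adjacent operations.

```python
import re

# Matches one CIGAR operation, e.g. "310D" -> ("310", "D").
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def long_indels(cigar, min_len=50):
    """Return (op, length) for each insertion/deletion of at least min_len bases."""
    return [
        (op, int(n))
        for n, op in CIGAR_OP.findall(cigar)
        if op in "ID" and int(n) >= min_len
    ]


def is_repeat_multiple(length, unit_len, tolerance=0.1):
    """True if length is within tolerance * unit_len of a nonzero multiple of unit_len."""
    copies = round(length / unit_len)
    return copies >= 1 and abs(length - copies * unit_len) <= tolerance * unit_len


# Hypothetical alignment and repeat unit length, for illustration only.
cigar = "1200M310D800M5I40M295I2000M"
unit_len = 300

events = long_indels(cigar)
flags = [is_repeat_multiple(n, unit_len) for _, n in events]
```

If most long indels in the region are flagged as near-multiples of the repeat unit, that would be consistent with reads being placed one or more repeat copies off.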

I think the assembly is correct because about 30 to 40% of the long reads span the region correctly, without any long indel.
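The 30 to 40% figure could be computed roughly as follows: classify each alignment as not spanning the region, spanning it cleanly, or spanning it with a long indel inside. This is a sketch under assumptions: alignments are given as (reference start, CIGAR) pairs with 0-based, half-open coordinates, and the region bounds, the 50 bp indel threshold, and the example alignments are invented for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position


def classify(ref_start, cigar, region_start, region_end, min_indel=50):
    """Return 'not_spanning', 'clean', or 'long_indel' for one alignment."""
    pos = ref_start
    has_long_indel = False
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        if op in "ID" and n >= min_indel:
            # A deletion covers [pos, pos + n); an insertion sits at pos.
            event_end = pos + n if op == "D" else pos
            # Count the event if it overlaps or touches the region.
            if pos <= region_end and event_end >= region_start:
                has_long_indel = True
        if op in REF_CONSUMING:
            pos += n
    if ref_start > region_start or pos < region_end:
        return "not_spanning"
    return "long_indel" if has_long_indel else "clean"


# Hypothetical region and alignments, for illustration only.
region = (1000, 2000)
alignments = [
    (500, "3000M"),
    (800, "700M300D1500M"),
    (1200, "2000M"),
    (100, "1400M310I1500M"),
    (0, "2500M"),
]

labels = [classify(start, cigar, *region) for start, cigar in alignments]
spanning = [label for label in labels if label != "not_spanning"]
clean_fraction = spanning.count("clean") / len(spanning)
```

Here the fraction is taken over spanning reads only; dividing by all mapped reads instead would give a different (lower) number, so it is worth stating which denominator is meant.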

How can I tweak the parameters of graphmap to avoid this problem?

Or do you think my hypothesis is wrong?