philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

Indel alignment #84

Open shimbalama opened 3 years ago

shimbalama commented 3 years ago

Hi Devs,

The IGV screenshot below shows the same deletion in short reads (top, BWA-MEM) and ONT reads (bottom, NGMLR). In the ONT alignments, the 22 bp deletion is strewn across ~50 bp of the reference and varies in length. I'm in the business of calling somatic variants, so this 'wobble' or 'fuzziness', or whatever you want to call it, makes this a difficult problem. I can't find any discussion of it in your docs, so I just thought I'd touch base and see if you could offer any way to mitigate it. This example is over a TAAAA repeat, which always exacerbates the problem; however, I've looked at thousands of indels now and most have the same issue to some extent. Some of my larger indels wobble across more than 1 kbp. I see a similar issue with SV breakpoints (which I call using soft clipping).

Thanks,
Liam

[Image: igv_snapshot_chr22_26951324_indel_problem]

fritzsedlazeck commented 3 years ago

Hey, sadly this is quite common with noisy data plus these low-complexity repeats. I don't understand how the upper track is BWA-MEM, as the reads look quite different and don't contain many sequencing errors.

Unfortunately, there is currently no approach to, e.g., left-align these deletions. Variant callers such as ours or others will be able to cope with this.

Thanks
Fritz
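For anyone reading along, here is a minimal sketch of what left-aligning a deletion means and why a deletion inside a TAAAA repeat has no single correct position. It is not code from NGMLR or Sniffles. The function names and the toy reference are made up for illustration, coordinates are 0-based, and it assumes an error-free deletion of whole reference bases, which noisy ONT alignments often will not give you.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the sequence left after deleting ref[start:start + length]."""
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Return the leftmost start of a deletion equivalent to the given one.

    Shifting the deletion one base to the left leaves the resulting
    sequence unchanged exactly when the base entering the deleted window
    equals the base leaving it.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def right_align_deletion(ref: str, start: int, length: int) -> int:
    """Return the rightmost start of a deletion equivalent to the given one."""
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


# Hypothetical toy reference: unique flanks around six TAAAA units.
ref = "ACGTC" + "TAAAA" * 6 + "GGCAT"

# A 5 bp deletion reported at position 15, somewhere inside the repeat.
left = left_align_deletion(ref, 15, 5)
right = right_align_deletion(ref, 15, 5)
print(left, right)  # 5 30

# Every placement in [left, right] yields the same sequence.
outcomes = {apply_deletion(ref, s, 5) for s in range(left, right + 1)}
print(len(outcomes))  # 1
```

In this toy case all 26 start positions between 5 and 30 describe the same event, so an aligner can report any of them. Normalizing each call to the leftmost position before comparing reads removes that part of the wobble, but not the part caused by sequencing errors changing the apparent deletion length.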

shimbalama commented 3 years ago

Thanks for your fast response, Fritz. Just FYI the reads at the top with BWA-MEM are Illumina 150bp reads.

fritzsedlazeck commented 3 years ago

Ah, I got confused... sorry, too much going on here. Yes, please give e.g. Sniffles a try. I have implemented some procedures to make an educated guess where the breakpoint is most likely. Given the nature of this particular region, we could of course argue all day about which of the repeat units it occurs in, and I agree that this remains a challenge.

Thanks Fritz

shimbalama commented 3 years ago

Thanks, Fritz