philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
MIT License

Soft clipping and read splitting #78

Open shimbalama opened 4 years ago

shimbalama commented 4 years ago

Hi, I have read through your docs/paper and previous issues but have not found a detailed description of how soft clipping and read splitting work. Could you please point me to the right place or answer here?

From my understanding, you split long reads into 256-mers, map these, and then join them together in a way that results in one of the following: (1) the whole long read maps contiguously, (2) roughly half of the long read maps contiguously in one genomic region and the other half maps elsewhere, or (3) a section of the long read maps and the rest is soft clipped. Is that right, or can NGMLR create multiple secondary alignments? I work with cancer genomes and am concerned that I am losing a lot of read depth to soft clipping due to high levels of genomic rearrangement.
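To make the model described in the question concrete, here is a toy Python sketch of it: the read is cut into non-overlapping 256-bp pieces, each piece is assumed to already have a hit `(chrom, ref_start)` or no hit at all, and consecutive colinear hits are merged into segments. The hit format, the `tolerance` parameter and both function names are hypothetical; this only illustrates the mental model in the question and is not NGMLR's actual implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical toy model: a subsegment hit is (chrom, ref_start), or None if unmapped.
Hit = Optional[Tuple[str, int]]


def split_read(seq: str, k: int = 256) -> List[str]:
    """Split a read into non-overlapping k-bp subsegments (the last may be shorter)."""
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def group_colinear(hits: List[Hit], k: int = 256, tolerance: int = 50) -> List[Tuple[int, int, str]]:
    """Group consecutive subsegment hits that stay colinear on one chromosome.

    Returns (first_index, last_index, chrom) per candidate alignment segment.
    Unmapped subsegments (None) end the current group and belong to no group,
    so in this toy model they would end up soft clipped. Forward strand only.
    """
    groups: List[Tuple[int, int, str]] = []
    current: Optional[List] = None  # [first_index, last_index, chrom, ref_start_of_first]
    for i, hit in enumerate(hits):
        if hit is None:
            if current is not None:
                groups.append((current[0], current[1], current[2]))
            current = None
            continue
        chrom, pos = hit
        if current is not None:
            first, _, cur_chrom, first_pos = current
            expected = first_pos + (i - first) * k  # where a colinear hit should land
            if chrom == cur_chrom and abs(pos - expected) <= tolerance:
                current[1] = i  # extend the current segment
                continue
            groups.append((current[0], current[1], current[2]))  # split point
        current = [i, i, chrom, pos]
    if current is not None:
        groups.append((current[0], current[1], current[2]))
    return groups
```

With hits `[("chr1", 1000), ("chr1", 1256), ("chr1", 1512), ("chr8", 50000), ("chr8", 50256), None]`, `group_colinear` returns `[(0, 2, "chr1"), (3, 4, "chr8")]`: one split between chr1 and chr8, and the final unmapped piece ends up in neither segment. Nothing in this toy model limits the number of segments per read.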

Thanks in advance. Liam

fritzsedlazeck commented 4 years ago

NGMLR can handle multiple splits within one read. We designed it during our work on the SKBR3 breast cancer genome.

Each split is reported as a separate entry for the read, in which the part that was mapped elsewhere is soft clipped. Please make sure that this is not what you are seeing. Other soft clipping can indicate novel insertions, but I would not expect this to happen too often here.
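One way to check this is to collect every alignment entry of a read and count the read bases covered by none of them: bases clipped in one entry but aligned in another come from a split, while bases covered by no entry are the ones worth investigating (e.g. insertions). A minimal Python sketch, assuming each entry is given as a hypothetical `(cigar, is_reverse)` pair taken from the SAM CIGAR string and the reverse-strand FLAG bit (0x10); the function names are illustrative only.

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar: str, is_reverse: bool) -> Tuple[int, int, int]:
    """Return (start, end, read_length) of the aligned part of the read,
    0-based half-open, in the original (sequenced) read orientation."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Operations that consume read bases; S and H are read bases left unaligned here.
    read_len = sum(n for n, op in ops if op in "MIS=XH")
    leading = 0
    for n, op in ops:
        if op not in "SH":
            break
        leading += n
    trailing = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trailing += n
    start, end = leading, read_len - trailing
    if is_reverse:  # CIGAR is given relative to the reverse complement
        start, end = read_len - end, read_len - start
    return start, end, read_len


def unexplained_clipping(entries: List[Tuple[str, bool]]) -> int:
    """Number of read bases not aligned in any of the read's entries."""
    intervals = []
    read_len = 0
    for cigar, is_rev in entries:
        s, e, n = query_interval(cigar, is_rev)
        intervals.append((s, e))
        read_len = max(read_len, n)
    intervals.sort()
    covered = 0
    cur_s, cur_e = None, None
    for s, e in intervals:  # merge overlapping intervals and sum their lengths
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return read_len - covered
```

For a 1,000-bp read, `unexplained_clipping([("600M400S", False)])` gives 400, but adding a second entry `("400M600H", True)` for the other part brings it to 0, i.e. the clipped bases are explained by the split. With pysam, `read.cigarstring` and `read.is_reverse` supply these two fields; group the records by `read.query_name` first and skip unmapped records.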

I hope that helps. Fritz