lh3 / minimap2

A versatile pairwise aligner for genomic and spliced nucleotide sequences
https://lh3.github.io/minimap2

1-4 nt shift at exon boundaries in aligned BAM from Iso-Seq subread data #1261

Closed. wzhang42 closed this issue 1 hour ago.

wzhang42 commented 2 hours ago

Hi Heng, I have some Iso-Seq subread.bam data. I followed the Iso-Seq workflow: I ran CCS calling, lima, and isoseq refine, and generated flnc.bam. I used "minimap2 -ax splice:hq -uf --secondary=no -C5" to map both subread.bam and flnc.bam. The alignments from flnc.bam look OK, but we found 1-4 nt shifts at exon boundaries in the alignments from subread.bam. I'm not sure whether this is expected or a bug. Using the alignments from flnc.bam would be fine, but their coverage is only 10-20% of the subread coverage.
[Screenshot (3878)]

lh3 commented 1 hour ago

Subreads are low-quality, so you should use -x splice instead of -x splice:hq. They will be harder to align correctly anyway because they contain more errors.
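As a minimal sketch of that advice, the preset can be chosen from the read type while the other options stay as in the original command. The helper name `build_minimap2_cmd` is hypothetical, and the file names are placeholders assumed to be FASTA/FASTQ exports of the BAMs, since minimap2 reads FASTA/FASTQ input.

```python
import shlex


def build_minimap2_cmd(ref, reads, high_accuracy):
    """Hypothetical helper: choose the spliced-alignment preset by read accuracy.

    high_accuracy=True  -> "splice:hq" (e.g. CCS/FLNC reads)
    high_accuracy=False -> "splice"    (e.g. raw, error-prone subreads)
    The remaining options are copied from the command used in the issue.
    """
    preset = "splice:hq" if high_accuracy else "splice"
    return ["minimap2", "-ax", preset, "-uf", "--secondary=no", "-C5", ref, reads]


# Placeholder file names; print the command as a shell-ready string.
print(shlex.join(build_minimap2_cmd("ref.fa", "subreads.fastq", high_accuracy=False)))
print(shlex.join(build_minimap2_cmd("ref.fa", "flnc.fastq", high_accuracy=True)))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run` without shell quoting issues.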

In addition, if I remember correctly (I could be wrong on this), subreads are duplicated. You probably don't want to use them in the first place.
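One way to check the duplication point is to count subreads per ZMW, since each ZMW holds a single template molecule read in several passes. This sketch assumes the usual PacBio subread naming `movie/hole/start_end`; the read names below are invented for illustration. If many subreads share a ZMW, raw subread coverage overstates the number of distinct molecules, which may be one contributor to the coverage gap described in the question.

```python
from collections import Counter


def zmw_of(read_name):
    """Return 'movie/hole' for a subread named 'movie/hole/start_end' (assumed format)."""
    return read_name.rsplit("/", 1)[0]


def passes_per_zmw(read_names):
    """Count how many subreads each ZMW (one template molecule) contributes."""
    return Counter(zmw_of(name) for name in read_names)


# Invented example names: three passes from ZMW 1, one from ZMW 7.
names = [
    "movie1/1/0_3120",
    "movie1/1/3165_6290",
    "movie1/1/6333_9401",
    "movie1/7/0_2850",
]
counts = passes_per_zmw(names)
print(len(names), "subreads from", len(counts), "molecules")
```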