philres / ngmlr

NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations
MIT License
293 stars 40 forks source link

Fix for "Signed Integer Overflow in MAPQ" #96

Closed monsanto-pinheiro closed 3 years ago

monsanto-pinheiro commented 3 years ago

When some read doesn't map at all the MAPQ is a "Signed Integer". SAM example of a aligned read that does not map: "597843e1-50df-4887-ab2c-c78a4d9ac28d 2048 chrXV 638441 -2147483648 4870S25M2I7M1I3M1D14M1D7M1I19M101S ..." The MAPQ in this case is -2147483648. This creates some problems for samtools. "[W::sam_read1_sam] Parse error at line YYYY" This happens because "mqCount" is zero on return of method "AlignmentBuffer::computeMappingQuality"

smoe commented 3 years ago

I just ran into this. A new release tag would be nice! Have many thanks for the fix - much appreciated.

jmarshall commented 3 years ago

:+1: from bioconda as well. it would be ideal if there would be a new ngmlr release with this fixed, so that we could provide a ngmlr package that interacts well with current samtools.

Note that mqCount == 0 in computeMappingQuality() does indeed appear to be the cause of this problem (I have inspected the code, but not run any tests), but the appropriate value to return for MAPQ in this case deserves a bit of thought.

What situation does mqCount == 0 represent? Does this call for a low value for MAPQ (e.g. 0), a high value, or perhaps 255 — which the SAM spec describes as “mapping quality is not available”:

0 <= MAPQ <= 255: MAPping Quality. It equals −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.

smoe commented 3 years ago

With local Nanopore genomic sequencing data I ran into this bug every ~5000 reads and all reads have passed the guppy QA. @jmarshall, you do not happen to know someone with a personal contact to upstream to help escalating this bug a bit?