jeffdaily / parasail

Pairwise Sequence Alignment Library
Other
243 stars 34 forks source link

MAPQ should be 0 for unmapped reads #100

Closed palakpsheth closed 1 year ago

palakpsheth commented 1 year ago

https://github.com/jeffdaily/parasail/blob/8bcb4f802b18937798149490cab347c17dd5dce3/apps/parasail_aligner.cpp#L2451

I believe at this line, the MAPQ should be 0 (not 255) for unmapped reads

jeffdaily commented 1 year ago

Does this print statement need to be conditionalized? For some of these output formats, I was either mimicking or guessing based on output formats I'd seen elsewhere. Meaning, I didn't always fully understand what it needed to be. Any guidance here would be appreciated.

palakpsheth commented 1 year ago

Good point and my apologies for lack of context. I did not consider output formats other than SAM. For the SAM outputs, the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf doesn't explicitly state that MAPQ should be 0 for unmapped. Picard tools in downstream processing (like CollectAlignmentSummaryMetrics) however expects MAPQ to be 0 for unmapped reads.

A potential workaround for others who encounter this would be to set VALIDATION_STRINGENCY=LENIENT in Picard.

jeffdaily commented 1 year ago

1.4 The alignment section: mandatory fields

In the SAM format, each alignment line typically represents the linear alignment of a segment. Each line consists of 11 or more TAB-separated fields. The first eleven fields are always present and in the order shown below; *if the information represented by any of these fields is unavailable, that field’s value will be a placeholder, either ‘0’ or ‘’ as determined by the field’s type**.

Then:

  1. MAPQ: MAPping Quality. It equals −10 log10 Pr{mapping position is wrong}, rounded to the nearest integer. A value 255 indicates that the mapping quality is not available.
palakpsheth commented 1 year ago

Picard tools: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile

INVALID_MAPPING_QUALITY | Mapping quality set for unmapped read or is >= 256

I'm not advocating for any change here to parasail, just hope that this is helpful for others