Closed palakpsheth closed 1 year ago
Does this print statement need to be conditionalized? For some of these output formats, I was either mimicking or guessing based on output formats I'd seen elsewhere. Meaning, I didn't always fully understand what it needed to be. Any guidance here would be appreciated.
Good point and my apologies for lack of context. I did not consider output formats other than SAM. For the SAM outputs, the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf doesn't explicitly state that MAPQ should be 0 for unmapped. Picard tools in downstream processing (like CollectAlignmentSummaryMetrics) however expects MAPQ to be 0 for unmapped reads.
A potential workaround for others who encounter this would be to set VALIDATION_STRINGENCY=LENIENT in Picard.
1.4 The alignment section: mandatory fields
In the SAM format, each alignment line typically represents the linear alignment of a segment. Each line consists of 11 or more TAB-separated fields. The first eleven fields are always present and in the order shown below; *if the information represented by any of these fields is unavailable, that field’s value will be a placeholder, either ‘0’ or ‘’ as determined by the field’s type**.
Then:
INVALID_MAPPING_QUALITY | Mapping quality set for unmapped read or is >= 256
I'm not advocating for any change here to parasail
, just hope that this is helpful for others
https://github.com/jeffdaily/parasail/blob/8bcb4f802b18937798149490cab347c17dd5dce3/apps/parasail_aligner.cpp#L2451
I believe at this line, the MAPQ should be 0 (not 255) for unmapped reads