dvklopfenstein closed this issue 1 year ago
Thanks for the detailed bug report. I've fixed the problem but possibly not in a way that you're going to like…
When you run your script, you may notice this message go by:
[E::sam_parse1] CIGAR and query sequence are of different length
The “odd SAM line” with a short CIGAR length but long SEQ is invalid enough that samtools refuses to read it:
$ samtools view bork.sam
…
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1_sam] Parse error at line 6
samtools view: error reading file "bork.sam"
(I assume you have reported this invalid output as a minimap2 issue.)
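The length mismatch behind that error message can be illustrated with a short standard-library sketch. It is not the samtools or htslib code: the function names are hypothetical, the two SAM lines are made up for illustration (they are not the reporter's data), and the rule applied is the SAM specification's list of CIGAR operations that consume query bases (M, I, S, =, X).

```python
import re

# CIGAR operations that consume bases of the query sequence (per the SAM spec).
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of query bases implied by a CIGAR string."""
    if cigar == "*":
        return None  # CIGAR unavailable, nothing to compare against
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar)
               if op in QUERY_CONSUMING_OPS)


def cigar_matches_seq(sam_line):
    """True if the CIGAR and SEQ lengths agree, or if either is '*'."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar, seq = fields[5], fields[9]
    expected = cigar_query_length(cigar)
    if expected is None or seq == "*":
        return True
    return expected == len(seq)


# Made-up records: 28N consumes no query bases, but SEQ has 40 bases.
bad = "\t".join(["read1", "0", "chr1", "100", "60", "28N",
                 "*", "0", "0", "ACGT" * 10, "*"])
good = "\t".join(["read2", "0", "chr1", "100", "60", "40M",
                  "*", "0", "0", "ACGT" * 10, "*"])
print(cigar_matches_seq(bad))   # False
print(cigar_matches_seq(good))  # True
```

A check like this could be run over minimap2 output before handing it to pysam, to spot records that will be rejected.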
Pysam uses the same parsing code, and there was a small glitch in that it ignored the error code and carried on with a partially initialised record instead of propagating the error by raising an exception.
So we just went ahead and fixed the glitch.
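The difference between ignoring a parser's error code and propagating it can be sketched in plain Python. Everything here is hypothetical and only mimics the pattern described above: `Record`, `parse_into`, `read_unchecked` and `read_checked` are illustrative names, not pysam or htslib functions, and `ValueError` is an arbitrary choice because the exception type pysam raises is not stated here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    qname: Optional[str] = None
    cigar: Optional[str] = None
    seq: Optional[str] = None
    qual: Optional[str] = None


def parse_into(record, line):
    """Hypothetical C-style parser: fills `record` field by field and
    returns 0 on success or -1 on error, leaving earlier fields set."""
    fields = line.split("\t")
    if len(fields) < 11:
        return -1
    record.qname, record.cigar = fields[0], fields[5]
    seq = fields[9]
    qlen = sum(int(n) for n, op in
               re.findall(r"(\d+)([MIDNSHP=X])", record.cigar)
               if op in "MIS=X")
    if record.cigar != "*" and seq != "*" and qlen != len(seq):
        return -1  # bail out early: seq and qual are never assigned
    record.seq, record.qual = seq, fields[10]
    return 0


def read_unchecked(line):
    """The glitch: the return code is ignored, so the caller
    receives a half-filled record."""
    record = Record()
    parse_into(record, line)
    return record


def read_checked(line):
    """The fix: a failing return code becomes an exception."""
    record = Record()
    if parse_into(record, line) < 0:
        raise ValueError("failed to parse SAM line")
    return record


# Made-up record with CIGAR 28N and a 40-base SEQ.
line = "\t".join(["read1", "0", "chr1", "100", "60", "28N",
                  "*", "0", "0", "ACGT" * 10, "*"])
half = read_unchecked(line)
print(half.qname, half.seq)  # read1 None
try:
    read_checked(line)
except ValueError as err:
    print("rejected:", err)
```

In this sketch the half-filled record simply carries `None` fields. In compiled code the unassigned fields can hold arbitrary leftover data, which would be consistent with the seemingly random `query_qualities` values in the report.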
Thank you for the fantastic pysam tools. We use this package all the time with great success.
ISSUE
We are currently seeing incorrect query_qualities values in a pysam AlignedSegment in some cases.
EXPECTED BEHAVIOR:
When a SAM line contains query_qualities='*', indicating that the query_qualities are not available, the pysam AlignedSegment generated from the SAM line correctly sets its query_qualities to None.
ACTUAL (ERRONEOUS) BEHAVIOR:
But if minimap2 returns an odd SAM line (CIGAR=28N, when SEQ is ~700 basepairs), pysam's AlignedSegment invents an incorrect query_qualities array.
When the SAM line query_qualities is "*", the AlignedSegment's query_qualities value should be set to None, even if minimap2 is generating a questionable aligned segment line.
ATTENTION: The query_qualities array values in the AlignedSegment seem randomly generated and change if I comment out one or more of the passing SAM lines (lines 1-3) in the attached self-contained test.
Test
Run this small self-contained test to see the error:
Conclusion
When the SAM line query_qualities is "*", the AlignedSegment's query_qualities value should be set to None, even if minimap2 generates a questionable aligned segment line.
Thank you for taking your time to investigate this issue. Thank you for such a well-done tool suite.
cc: @judowill