Open ymcki opened 1 year ago
In SAM, a QUAL field of * generally indicates that base qualities are not available. Historically no one has particularly cared about single-base reads, and it is unspecified whether in that case * indicates unavailable qualities or a base quality of 9. That ambiguity is samtools/hts-specs#715, on which you may wish to express an opinion.
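For context on where the 9 comes from: SAM stores each base quality as the character whose ASCII code is the Phred score plus 33, and * is ASCII 42. A minimal sketch of that arithmetic (the helper name phred_from_char is made up for illustration):

```python
def phred_from_char(ch, offset=33):
    """Decode one SAM QUAL character into its Phred score (Phred+33 encoding)."""
    return ord(ch) - offset


# '*' is ASCII 42, so read as a quality character it decodes to 42 - 33 = 9.
print(phred_from_char("*"))  # 9
```

So for a one-base read, a lone * in QUAL reads equally well as "no qualities" or as a single Phred 9.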
However, there is no such ambiguity in BAM. If the “ubam file” you are actually feeding to this script is a BAM file, then getting None here indicates that the record really does have QUAL absent. (However, the qs:i:9 tag, if it is indeed an average base quality score, suggests that this data originated from a SAM file that intended QUAL = * to mean base quality 9…)
Pysam is not crashing here. Your script is crashing when read.query_qualities returns None, which your script is not dealing with. This property is None when the QUAL field is absent, and your script probably needs to deal with this possibility.
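One way a script could deal with it, as a rough sketch: skip records whose query_qualities is None before doing anything with the scores. The helper name and the stand-in records below are hypothetical; in a real script the records would come from iterating a pysam.AlignmentFile rather than from SimpleNamespace objects.

```python
from types import SimpleNamespace


def reads_with_qualities(reads):
    """Yield only the records that actually carry base qualities."""
    for read in reads:
        # pysam reports an absent QUAL field as None, not as an empty array.
        if read.query_qualities is None:
            continue
        yield read


# Stand-ins for pysam records (hypothetical data, for illustration only).
with_qual = SimpleNamespace(query_name="r1", query_qualities=[30, 31, 32])
no_qual = SimpleNamespace(query_name="r2", query_qualities=None)

kept = [r.query_name for r in reads_with_qualities([with_qual, no_qual])]
print(kept)  # ['r1']
```

Whether skipping is appropriate depends on the answers to the questions below; if the single-base reads matter, the None case needs its own handling instead.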
Are these single-base reads important in your analysis, or are they a handful of degenerate reads that could be filtered out without adverse effect? Are there other single-base reads present in your data with other characters in their QUAL fields?
Thanks for your reply. I didn't know that the spec leaves this situation undefined. Intuitively, I would think it is a way to specify a single-base read with quality 9.
I encountered this situation while using ONT's dorado basecaller. Anyway, I will relay this ambiguity to the dorado basecaller team and see what they want to do with it.
I have an unmapped BAM with many basecall entries that have only one base and a quality of "*".
pysam 0.21.0 then crashes whenever it reaches a line like that.
The simple program that allows you to replicate the problem is:
This is the content of the ubam file I used to reproduce the error, presented in SAM format:
My guess is that pysam treats a lone "*" as meaning the field is empty rather than as the quality of a single base.