Status: Open. lowandrew opened this issue 6 years ago.
I see this as well. I think it can lead to ambiguity when filtering reads by base quality during variant calling, if the qualities are being summed.
I'm also seeing this behaviour. Has anyone resolved this without ditching the pileup class?
Running into the same issue. It seems to be a bug, because `samfile.pileup()` has an argument `ignore_overlaps` with the following description:
> ignore_overlaps: bool
> If set to True, detect if read pairs overlap and only take the higher quality base. This is the default.
So it should be taking the higher-quality base, but instead it appears to sum the two qualities.
Dug into the issue a bit more, and it turns out that this behaviour comes from mpileup itself:
This post was also helpful (https://github.com/samtools/samtools/issues/1146#issuecomment-559756496).
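To make the "summing" concrete, here is a simplified pure-Python model of the overlap adjustment that, as far as I can tell, htslib applies (in `tweak_overlap_quality` in `sam.c`) when overlap detection is on. The helper name `tweak_overlap`, the cap of 200 and the 0.8 mismatch factor reflect my reading of the htslib source and are assumptions that may not hold for every htslib version; this is a sketch, not the library's code.

```python
from typing import Tuple

# Assumed values, based on my reading of htslib's overlap handling;
# they are not guaranteed to match every htslib release.
QUAL_CAP = 200
MISMATCH_FACTOR = 0.8


def tweak_overlap(base_a: str, qual_a: int, base_b: str, qual_b: int) -> Tuple[int, int]:
    """Model the quality adjustment for one position covered by both mates.

    Returns the adjusted (qual_a, qual_b). Hypothetical helper, not an
    htslib or pysam API.
    """
    if base_a.upper() == base_b.upper():
        # Mates agree: mate A carries the summed (capped) quality, mate B is zeroed.
        return min(qual_a + qual_b, QUAL_CAP), 0
    if qual_a >= qual_b:
        # Mates disagree: keep the higher-quality mate, but down-weight it.
        return int(MISMATCH_FACTOR * qual_a), 0
    return 0, int(MISMATCH_FACTOR * qual_b)


if __name__ == "__main__":
    print(tweak_overlap("A", 35, "A", 37))  # agreeing mates -> (72, 0)
    print(tweak_overlap("A", 30, "C", 20))  # disagreeing mates -> (24, 0)
```

Under this model, two mates that both report the same base at Q35 and Q37 produce a single entry at Q72, which would explain values above 70. The zeroed mate is then usually dropped by the pileup's minimum base quality filter, so only one (summed) entry per fragment shows up.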
My understanding is that summing the base qualities is the expected behaviour when the overlapping reads have the same base at that position. You can turn off the handling of overlapping reads by setting `ignore_overlaps=False`, which is equivalent to `--ignore-overlaps` in mpileup, and then handle the reads independently.
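If you take the `ignore_overlaps=False` route, one possible way to recover the "only take the higher quality base" behaviour described in the pysam docstring is to collapse each pileup column by read name yourself. The helper below is hypothetical (not part of pysam) and works on plain `(query_name, base, quality)` tuples, so it can be tried without a BAM file.

```python
from typing import Dict, Iterable, Tuple


def best_base_per_fragment(
    entries: Iterable[Tuple[str, str, int]],
) -> Dict[str, Tuple[str, int]]:
    """Keep one (base, quality) per read name for a single pileup column.

    `entries` holds (query_name, base, quality) tuples. Mates of a pair share
    a query_name, so when both cover the position only the higher-quality
    entry is kept; on a tie the first one seen wins. Hypothetical helper,
    not a pysam API.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for name, base, qual in entries:
        if name not in best or qual > best[name][1]:
            best[name] = (base, qual)
    return best


if __name__ == "__main__":
    column = [("frag1", "A", 35), ("frag1", "A", 37), ("frag2", "C", 20)]
    print(best_base_per_fragment(column))  # {'frag1': ('A', 37), 'frag2': ('C', 20)}
```

With pysam, the tuples for each column could be built by zipping `column.get_query_names()`, `column.get_query_sequences()` and `column.get_query_qualities()` from a pileup run with `ignore_overlaps=False`. Unlike the htslib handling, this never sums qualities, and it does not treat disagreement between mates specially, which you may want to handle separately.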
When I get query base qualities out of a pileup, I often see values far higher than I would expect base qualities to be (>70).
As far as I can tell, this happens when paired-end reads overlap: in the overlapping regions, the quality appears to be reported as the sum of the forward-read and reverse-read qualities, whereas I would have expected to get the two qualities individually.
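One way to see why summing can be deliberate rather than a bug: Phred scores are log-scaled error probabilities, so if the two mates are treated as independent evidence for the same base, adding the scores corresponds to multiplying the error probabilities. Independence is an assumption here; errors introduced before sequencing, for example during PCR, are shared by both mates, so the summed score can overstate confidence.

$$
Q = -10\log_{10} p, \qquad
Q_a + Q_b = -10\log_{10} p_a - 10\log_{10} p_b = -10\log_{10}(p_a\,p_b).
$$

For example, Q35 + Q37 = Q72 corresponds to an error probability of about $10^{-7.2} \approx 6.3 \times 10^{-8}$.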
Not entirely sure if this is intended behaviour or a bug.
The code I use to get the base qualities is below; this occurs with samtools 1.6 and pysam 0.14.1.
Thanks!