pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
782 stars 274 forks source link

get_query_sequences function from pileupcolumn class fails to display in samtools mpileup format #1163

Open lstbl opened 1 year ago

lstbl commented 1 year ago

If you are using the "get_query_sequences" function from the "pileupcolumn" class, you cannot display the output in the Samtools mpileup format if you select all boolean options to be true.

I think this is an issue with the following part of the function call (line 3091) if mark_matches and self.reference_sequence != NULL:

From what I can tell, the pileupcolumn class does not have the self.reference_sequence as a defined property.

a small snippet of code that shows this error is the following:

samfile = ps.AlignmentFile("path/to/bamfile.bam", "rb")
for pileupcolumn in samfile.pileup(var_test.ref_name, 10, 11, truncate = True):
    print(pileupcolumn.get_query_sequences(add_indels = True, mark_matches=True))
    print(pileupcolumn.get_query_sequences(add_indels = True, mark_matches=False))

The final two lines will produce the exact same output.

yifanhu1 commented 1 year ago

"get_query_sequences" only works if you use stepper="samtools".