pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License
773 stars 274 forks source link

Why is the site coverage of pileup different from that observed by igv? #1254

Open haogecat opened 8 months ago

haogecat commented 8 months ago

I used pysam to output the SNPS at locus 22218574 of the reference genome, but the output reads were fewer than what I visualized with IGV. The results of pysam were many fewer reads. Why? Here is the code:

import pysam

bf = pysam.AlignmentFile("sled.bam", "rb" )
for pileupcolumn in bf.pileup():
    if pileupcolumn.reference_pos+1==22218574:
        for pileup in pileupcolumn.pileups:
            print ("site:%d\tbase in read %s = %s" % (pileupcolumn.pos+1,pileup.alignment.query_name, pileup.alignment.query_sequence[pileup.query_position]))
bf.close()

Below is a visualization of igv at site 22218574。 image

Sincerely request you, thanks!

jmarshall commented 8 months ago

You don't say how many reads were indicated by either pysam or IGV.

Pysam's pileup() uses the same machinery as samtools. (So you may also wish to experiment with samtools and its similar parameters.) Usually lower depth than expected is caused by things like max_depth and min_base_quality, so you should start by adjusting those arguments.

haogecat commented 8 months ago

You don't say how many reads were indicated by either pysam or IGV.

Pysam's pileup() uses the same machinery as samtools. (So you may also wish to experiment with samtools and its similar parameters.) Usually lower depth than expected is caused by things like max_depth and min_base_quality, so you should start by adjusting those arguments.

Thank you very much for your answer, I will try