pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

pileup() in 0.22 is slower than 0.16.0.1 #1256

Open Crispy13 opened 8 months ago

Crispy13 commented 8 months ago

Test code:

import sys
import timeit

from tqdm.auto import tqdm
import pysam

def main():
    samfile = pysam.AlignmentFile(
        "my.bam",
        "rb"
    )

    qns = []
    qa = qns.append
    # pbar = tqdm()
    for pileupcolumn in samfile.pileup("chr1", 11108234-1, 11108234, truncate=True):
        for p in pileupcolumn.pileups:
            qn = p.alignment.query_name
            qa(qn)

            # pbar.update(1)

if __name__ == '__main__':
    print(f"{sys.version=}")
    print(f"{pysam.__version__=}")
    print(timeit.repeat(main, number=100, repeat=5))

Results (each value is the total time in seconds for 100 calls of main(), repeated 5 times):

sys.version='3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) \n[GCC 9.4.0]'
pysam.__version__='0.16.0.1'
[3.182755395071581, 3.193303174106404, 3.189940463984385, 3.197926699882373, 3.1971522108651698]

sys.version='3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) \n[GCC 9.4.0]'
pysam.__version__='0.22.0'
[3.9120245100930333, 3.912719252984971, 3.9148598960600793, 3.9105764548294246, 3.9074977058917284]

sys.version='3.11.7 | packaged by conda-forge | (main, Dec 15 2023, 08:38:37) [GCC 12.3.0]'
pysam.__version__='0.22.0'
[3.736623673932627, 3.7335626899730414, 3.734525352017954, 3.734872733009979, 3.7376032220199704]
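To put the three runs on a common scale, here is a rough sketch that converts the timings above into a per-call time and a ratio against the 0.16.0.1 run. The numbers are copied from the output above. The names `timings`, `NUMBER`, and `summary` are only illustrative, and taking the minimum of the repeats is one common way to summarize `timeit.repeat` output, not the only one.

```python
# Timings copied from the results above: seconds per 100 calls of main().
# Keys are (Python version, pysam version).
timings = {
    ("3.9.9", "0.16.0.1"): [3.182755395071581, 3.193303174106404,
                            3.189940463984385, 3.197926699882373,
                            3.1971522108651698],
    ("3.9.9", "0.22.0"): [3.9120245100930333, 3.912719252984971,
                          3.9148598960600793, 3.9105764548294246,
                          3.9074977058917284],
    ("3.11.7", "0.22.0"): [3.736623673932627, 3.7335626899730414,
                           3.734525352017954, 3.734872733009979,
                           3.7376032220199704],
}

NUMBER = 100  # matches number=100 in the timeit.repeat call

# Use the fastest repeat of the 0.16.0.1 run as the baseline.
baseline = min(timings[("3.9.9", "0.16.0.1")])

summary = {}
for (py_version, pysam_version), runs in timings.items():
    best = min(runs)
    per_call_ms = best / NUMBER * 1000  # milliseconds per main() call
    ratio = best / baseline             # 1.0 means same speed as 0.16.0.1
    summary[(py_version, pysam_version)] = (per_call_ms, ratio)
    print(f"Python {py_version}, pysam {pysam_version}: "
          f"{per_call_ms:.2f} ms per call, {ratio:.3f}x baseline")
```

By this measure, 0.22.0 takes roughly 1.2 times as long as 0.16.0.1 on the same Python 3.9.9, and somewhat less than that on Python 3.11.7. Note that main() also opens the BAM file on every call, so these figures cover both the open and the pileup loop.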

Is this slowdown expected, or did I miss something?