Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
I have many sorted BAM files, and for my project I need to randomly select a subset of reads (1e5) from them. I tried converting a pysam object to a list, but that used a lot of memory and was slow. The downsampling options of samtools and picard have similar problems. Is there a more efficient way to do this?
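One possible direction, offered as a minimal sketch rather than a definitive answer, is reservoir sampling: it streams through the reads once and keeps only the current sample in memory, so the whole file never has to be turned into a list. The helper names `reservoir_sample` and `sample_reads_from_bam` are hypothetical, and the sketch assumes pysam is installed and that a uniform random sample over all reads in a file is what is wanted. It still reads every record once, so it limits memory use but does not avoid a full pass over each BAM.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return k items drawn uniformly at random from an iterable of unknown length.

    Uses the classic reservoir sampling scheme (Algorithm R): one pass,
    with at most k items held in memory at any time.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1) by picking
            # a slot in [0, i] and replacing only if it falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


def sample_reads_from_bam(bam_path, k=100_000, seed=None):
    """Hypothetical wrapper: sample k reads from one BAM file with pysam."""
    import pysam  # assumption: pysam is available in the environment

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True iterates over all records, including unmapped reads,
        # and does not require an index.
        return reservoir_sample(bam.fetch(until_eof=True), k, seed)
```

If the file has fewer than `k` reads, all of them are returned. The sampled reads come back in reservoir order, not coordinate order, so they would need re-sorting before being written to a sorted BAM.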