pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.
https://pysam.readthedocs.io/en/latest/
MIT License

Truncated File Error using Fetch #897

Closed: jaredjurss closed this issue 4 years ago

jaredjurss commented 4 years ago

Hello, I am just starting with pysam. I get an error when running this code:

    import pysam

    samfile = pysam.AlignmentFile("scenedesmus_sorted.bam", "rb", index_filename="scenedesmus_sorted.sam.gz.tbi")
    reads = []
    for read in samfile.fetch("unitig_855"):
        reads.append(read)
    print(len(reads))

The error is:

    [E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
    Traceback (most recent call last):
      File "/Users/jjurss/PycharmProjects/Bam_filter/test.py", line 7, in
        for read in samfile.fetch("unitig_855"):
      File "pysam/libcalignmentfile.pyx", line 2092, in pysam.libcalignmentfile.IteratorRowRegion.next
    OSError: truncated file

I made my SAM file with BWA and sorted it with samtools. I then used tabix to make the index file, and I used samtools to make the BAM file.
I have checked the SAM and BAM files with samtools quickcheck, GATK ValidateSamFile, and pysam's check_truncation() function, all of which report that the files have no issues.

Thanks, Jared

kevinjacobs-progenity commented 4 years ago

The problem is that you are passing the tabix index of your SAM file as the index for your BAM file, i.e. you're using the wrong index. You need to index the BAM file itself; if you do it conventionally, there will be no need to pass in the index file manually.
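One way to catch this kind of mix-up before calling fetch() is to look at the index file's magic bytes. The sketch below relies on the layouts described in the SAM/BAM and tabix specifications: a BAI index (what samtools index writes for a BAM by default) is stored uncompressed and begins with `BAI\1`, while TBI and CSI indexes are BGZF-compressed and begin with `TBI\1` or `CSI\1` once decompressed. `sniff_index_type` is a hypothetical helper written for illustration; it is not part of pysam.

```python
import gzip

# Magic numbers from the SAM/BAM and tabix specifications.
BAI_MAGIC = b"BAI\x01"  # BAI: uncompressed binary index for BAM files
TBI_MAGIC = b"TBI\x01"  # TBI: tabix index, BGZF-compressed
CSI_MAGIC = b"CSI\x01"  # CSI: coordinate-sorted index, BGZF-compressed
GZIP_MAGIC = b"\x1f\x8b"  # every BGZF block is also a valid gzip member


def sniff_index_type(path):
    """Return 'bai', 'tbi', 'csi', or 'unknown' for the index file at path."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == BAI_MAGIC:
        return "bai"
    if head[:2] == GZIP_MAGIC:
        # BGZF is gzip-compatible, so the standard gzip module can
        # decompress the start of a TBI or CSI file to read its magic.
        with gzip.open(path, "rb") as gz:
            inner = gz.read(4)
        if inner == TBI_MAGIC:
            return "tbi"
        if inner == CSI_MAGIC:
            return "csi"
    return "unknown"
```

A BAM can be opened with a BAI or a CSI index, but a TBI index belongs to a bgzip-compressed text file (here, the SAM.gz), so for this issue one would expect `sniff_index_type("scenedesmus_sorted.sam.gz.tbi")` to return "tbi". Running `samtools index scenedesmus_sorted.bam` (or `pysam.index("scenedesmus_sorted.bam")` from Python) writes `scenedesmus_sorted.bam.bai`, and `pysam.AlignmentFile("scenedesmus_sorted.bam", "rb")` then finds it without `index_filename`.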

jaredjurss commented 4 years ago

Well, that was an easy fix. I feel a little dumb now, haha. Thanks for the help!