mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

OSError, raised in libcalignmentfile.pyx:1876 #181

Closed zhaoj81 closed 4 months ago

zhaoj81 commented 4 months ago

Hi Oliver,

I am running TEtranscripts on some bam files generated using STAR. Below is the full error message I got. I understand it is the error from pysam reading bam files. My bam files look fine from samtools quickcheck and bgzip -t. Do you have any idea what I should do?

docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" --rm mhammelllab/tetranscripts TEtranscripts -t BAMs/7037-CH-0052Aligned.sortedByCoord.out.bam -c BAMs/7037-CH-0049Aligned.sortedByCoord.out.bam --GTF Homo_sapiens.GRCh38.109.gtf --TE GRCh38_Ensembl_rmsk_TE.gtf --outdir SKMM2_TAZ_vs_DMSO/ --sortByPos

INFO @ Wed, 07 Feb 2024 16:53:53:

INFO @ Wed, 07 Feb 2024 16:53:53: Processing GTF files ...

INFO @ Wed, 07 Feb 2024 16:53:53: Building gene index .......

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
900000 GTF lines processed.
1000000 GTF lines processed.
1100000 GTF lines processed.
1200000 GTF lines processed.
1300000 GTF lines processed.
1400000 GTF lines processed.
1500000 GTF lines processed.
1600000 GTF lines processed.

INFO @ Wed, 07 Feb 2024 17:06:02: Done building gene index ......

INFO @ Wed, 07 Feb 2024 17:06:11: Building TE index .......

INFO @ Wed, 07 Feb 2024 17:09:56: Done building TE index ......

INFO @ Wed, 07 Feb 2024 17:09:56: Reading sample files ...

[E::idx_find_and_load] Could not retrieve index file for '.1707325796.663928.bam'
[E::bgzf_read_block] Failed to read BGZF block data at offset 9444910871 expected 19687 bytes; hread returned -1
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
Error: truncated file
[Exception type: OSError, raised in libcalignmentfile.pyx:1876]
OSError: [Errno 61] No data available
Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.__dealloc__'
OSError: [Errno 61] No data available

olivertam commented 4 months ago

Hi,

Thank you for your interest in the software. We have heard reports of this error from others, and it appears to affect only one file out of many that they are processing. There are a couple of things that you could try:

1) You could run the following command, which dumps the BAM file starting at the offset given in the error message and should show whether anything is obviously wrong, such as the file being empty at that point:

$ hd -s 9444910871 7037-CH-0052Aligned.sortedByCoord.out.bam | less

2) You could try converting the BAM file to a SAM file, and check whether it contains an empty entry or lines with missing values:

$ samtools view -h 7037-CH-0052Aligned.sortedByCoord.out.bam > 7037-CH-0052Aligned.sortedByCoord.out.sam
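
As a complement to the two checks above, the BAM file can also be walked block by block to locate the first damaged or truncated BGZF block. The following is only a minimal sketch using the Python standard library, not part of TEtranscripts or pysam; the function name `find_first_bad_bgzf_block` is hypothetical, and the block layout it assumes (gzip header with a `BC` extra subfield holding the block size minus one, followed by a raw deflate stream, CRC32 and uncompressed length) is the one described in the SAM/BAM specification.

```python
import struct
import sys
import zlib

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip magic, deflate method, FEXTRA flag set


def find_first_bad_bgzf_block(path):
    """Return (offset, reason) for the first damaged block, or None if none is found."""
    offset = 0
    with open(path, "rb") as handle:
        while True:
            header = handle.read(12)
            if not header:
                return None  # reached the end of the file on a block boundary
            if len(header) < 12 or header[:4] != BGZF_MAGIC:
                return offset, "bad or incomplete block header"
            (xlen,) = struct.unpack_from("<H", header, 10)
            extra = handle.read(xlen)
            if len(extra) < xlen:
                return offset, "incomplete extra field"
            # Look for the 'BC' subfield, which stores (total block size - 1).
            block_size = None
            pos = 0
            while pos + 4 <= len(extra):
                si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
                if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
                    block_size = struct.unpack_from("<H", extra, pos + 4)[0] + 1
                    break
                pos += 4 + slen
            if block_size is None:
                return offset, "no BC subfield in block header"
            remaining = block_size - 12 - xlen  # compressed data + CRC32 + ISIZE
            if remaining < 8:
                return offset, "implausible block size"
            body = handle.read(remaining)
            if len(body) < remaining:
                return offset, f"truncated block (expected {block_size} bytes)"
            crc, isize = struct.unpack_from("<II", body, remaining - 8)
            try:
                data = zlib.decompress(body[:-8], -15)  # raw deflate stream
            except zlib.error as err:
                return offset, f"decompression failed: {err}"
            if zlib.crc32(data) != crc or len(data) != isize:
                return offset, "CRC32 or length mismatch"
            offset += block_size


if __name__ == "__main__" and len(sys.argv) > 1:
    result = find_first_bad_bgzf_block(sys.argv[1])
    if result is None:
        print("all blocks look intact")
    else:
        print("problem at offset %d: %s" % result)
```

Saved under a name of your choice (for example `check_bgzf.py`, a hypothetical name), it could be run as `python check_bgzf.py 7037-CH-0052Aligned.sortedByCoord.out.bam`. If it reports a problem near the offset from the error message, the file itself is likely damaged; if instead the read raises an `OSError`, that might point towards the storage or the mounted volume rather than the file contents.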

If those do not show an obvious source of the error, you could try the following:

1) Run TEtranscripts using SAM files instead of the BAM files:

$ docker run -v "$(pwd)":"$(pwd)" -w "$(pwd)" --rm mhammelllab/tetranscripts TEtranscripts --format SAM -t 7037-CH-0052Aligned.sortedByCoord.out.sam -c 7037-CH-0049Aligned.sortedByCoord.out.sam --GTF Homo_sapiens.GRCh38.109.gtf --TE GRCh38_Ensembl_rmsk_TE.gtf --outdir SKMM2_TAZ_vs_DMSO/ --sortByPos

2) Re-run the STAR alignment on those files, as it has been reported elsewhere that sometimes STAR might throw an error despite finishing the run.
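
Before running TEtranscripts on SAM files, it may be worth scanning them for the empty entries or lines with missing values mentioned earlier. The sketch below is one possible way to do that with the Python standard library; the function name `find_malformed_sam_lines` and the `limit` parameter are hypothetical, and the only format knowledge it assumes is that SAM alignment lines carry 11 mandatory tab-separated fields, of which FLAG, POS, MAPQ, PNEXT and TLEN are integers. It does not validate CIGAR strings or optional tags.

```python
import sys

# 0-based columns of the integer fields: FLAG, POS, MAPQ, PNEXT, TLEN
INT_FIELDS = (1, 3, 4, 7, 8)


def _is_int(text):
    """Return True if text parses as a (possibly negative) integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True


def find_malformed_sam_lines(lines, limit=20):
    """Return up to `limit` (line_number, reason) pairs for suspicious alignment lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith("@"):
            continue  # header lines are not checked here
        fields = line.split("\t")
        if not line.strip():
            reason = "empty line"
        elif len(fields) < 11:
            reason = f"only {len(fields)} of 11 mandatory fields"
        elif any(field == "" for field in fields[:11]):
            reason = "empty mandatory field"
        elif not all(_is_int(fields[i]) for i in INT_FIELDS):
            reason = "non-integer FLAG, POS, MAPQ, PNEXT or TLEN"
        else:
            continue
        problems.append((number, reason))
        if len(problems) >= limit:
            break
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Stream the file line by line so large SAM files are not loaded into memory.
    with open(sys.argv[1], "r", errors="replace") as handle:
        for number, reason in find_malformed_sam_lines(handle):
            print(f"line {number}: {reason}")
```

For example, `python check_sam.py 7037-CH-0052Aligned.sortedByCoord.out.sam` (the script name is again hypothetical) would print the first few suspicious lines, or nothing if every alignment line passes these basic checks.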

Please let me know if you are still encountering issues, and we can troubleshoot further.

Thanks.

zhaoj81 commented 4 months ago

Thanks for the reply! I re-ran the STAR alignment to generate SAM files instead of BAM files. Now it all works.