mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

TEtranscripts "Building TE index ......" Bus error #97

Closed: ZeyanZhang closed this issue 3 years ago

ZeyanZhang commented 3 years ago

Hi,

Thanks for developing this nice tool! I tried to use the GTF file hg38_rmsk_TE.gtf from your website to analyze TE expression in my RNA-seq data. I hit the "Bus error" shown below; the same error also comes up if I use TEcount. Can you please advise what's wrong? Thanks in advance!

INFO @ Tue, 24 Aug 2021 09:55:28:
# ARGUMENTS LIST:
# name = G20_L_ERV
# treatment files = ['BAM-STAR/G20_shL1_rep1.bam', 'BAM-STAR/G20_shL1_rep2.bam', 'BAM-STAR/G20_shL2_rep1.bam', 'BAM-STAR/G20_shL2_rep2.bam', 'BAM-STAR/G20_shL3_rep1.bam', 'BAM-STAR/G20_shL3_rep2.bam']
# control files = ['BAM-STAR/G20_CTL_rep1.bam', 'BAM-STAR/G20_CTL_rep2.bam']
# GTF file = /ref/hg38/genes.gencode.v34.gtf
# TE file = hg38_rmsk_TE.gtf
# multi-mapper mode = multi
# stranded = reverse
# differential analysis using DESeq2
# normalization = DESeq2_default
# FDR cutoff = 5.00e-02
# fold-change cutoff = 1.00
# read count cutoff = 10
# number of iteration = 100
# Alignments grouped by read ID = False

INFO @ Tue, 24 Aug 2021 09:55:28: Processing GTF files ...

INFO @ Tue, 24 Aug 2021 09:55:28: Building gene index .......

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
500000 GTF lines processed.
600000 GTF lines processed.
700000 GTF lines processed.
800000 GTF lines processed.
900000 GTF lines processed.
1000000 GTF lines processed.
1100000 GTF lines processed.
1200000 GTF lines processed.
1300000 GTF lines processed.

INFO @ Tue, 24 Aug 2021 10:05:33: Done building gene index ......

INFO @ Tue, 24 Aug 2021 10:07:19: Building TE index .......

Bus error

Best, Zeyan

olivertam commented 3 years ago

Hi Zeyan,

Thank you for your interest in the software. What versions of TEtranscripts and Python are you using? How much RAM are you providing to TEtranscripts? For hg38 samples, we recommend at least 10 GB, but usage can go as high as 30 GB when processing many libraries. If insufficient memory is provided, it might be trying to use swap files, which could lead to this error. It might also be a stupid thing to check, but do you have sufficient disk space? While the output files are typically <100 MB, some intermediate files are generated that are the same size as your BAM files. We would recommend having free disk space of at least 1.5x the size of your largest BAM file when running.

Please let me know if there are additional questions or concerns. Thanks.
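The RAM and disk guidance above can be turned into a quick pre-flight check before launching a run. The following is only a minimal sketch and not part of TEtranscripts: the function names (`required_disk_bytes`, `total_ram_bytes`, `preflight`) and the `workdir` parameter are hypothetical, the 10 GB and 1.5x thresholds are taken from the recommendation in this thread, and the RAM check reads the machine's total physical memory, which on a cluster may differ from what the scheduler actually allocates to the job.

```python
import os
import shutil

GIB = 1024 ** 3


def required_disk_bytes(bam_sizes, factor=1.5):
    """Return factor x the largest BAM size in bytes (0 if no sizes given)."""
    return int(factor * max(bam_sizes, default=0))


def total_ram_bytes():
    """Total physical RAM in bytes, or None where os.sysconf is unavailable."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None


def preflight(bam_paths, workdir=".", min_ram_gib=10):
    """Return a list of warning strings; an empty list means no problem found."""
    warnings = []

    # Disk: intermediate files can be as large as the BAM files themselves,
    # so compare free space in workdir against 1.5x the largest BAM.
    sizes = [os.path.getsize(path) for path in bam_paths]
    needed = required_disk_bytes(sizes)
    free = shutil.disk_usage(workdir).free
    if free < needed:
        warnings.append(
            f"free disk {free / GIB:.1f} GiB < needed {needed / GIB:.1f} GiB"
        )

    # RAM: total machine memory only; a scheduler may grant the job less.
    ram = total_ram_bytes()
    if ram is not None and ram < min_ram_gib * GIB:
        warnings.append(
            f"total RAM {ram / GIB:.1f} GiB < recommended {min_ram_gib} GiB"
        )
    return warnings
```

Calling `preflight(list_of_bam_paths, workdir="output_dir")` and printing any returned warnings is one way to use it. When many libraries are processed together, passing a higher `min_ram_gib` (up to the 30 GB figure mentioned above) is the more conservative choice.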

ZeyanZhang commented 3 years ago

@olivertam

Hi Oliver,

Thank you so much for your quick reply! I am using TEtoolkit v2.2.1 and Python v3.7.2, and 16 GB of RAM was provided. I resubmitted the job with more RAM requested, and the error was fixed. Thanks again!

Best, Zeyan