mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

How long to build TE index from hg19_rmsk_TE.gtf? #15

Closed dtsgx closed 6 years ago

dtsgx commented 6 years ago

INFO @ Fri, 17 Nov 2017 13:52:22: Processing GTF files ...

INFO @ Fri, 17 Nov 2017 13:52:22: Building gene index .......

100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
400000 GTF lines processed.
INFO @ Fri, 17 Nov 2017 13:53:16: Done building gene index ......

INFO @ Fri, 17 Nov 2017 13:54:15: Building TE index .......

It seems to be stuck at this step. Is this normal?

@olivertam @yingjin07

yingjin07 commented 6 years ago

It usually takes about 10 minutes.

dtsgx commented 6 years ago

Thank you. But it has been building the TE index for over 12 hours and is still not done. What's the problem?

olivertam commented 6 years ago

Hi. You are right, it should not be taking this long. Was the TE GTF file obtained from our FTP site? How much memory do you have on the machine where you are running TEtranscripts? Any other information (such as the command-line arguments used) would be most helpful. Thanks.

dtsgx commented 6 years ago

The TE GTF files were downloaded from your FTP site. TEtranscripts was run on a virtual machine with 4 GB of RAM. Is that too small to run TEtranscripts? Thanks.

olivertam commented 6 years ago

Hi. Given how long the TE index is taking, I suspect that 4 GB of RAM on the virtual machine (which is probably running something else as well) is insufficient. We have typically used 15 GB of RAM, and prefer at least 9 GB dedicated to the task (my recollection is that the tool can use up to 8.4 GB for human data, largely due to the TE index). Hope this is helpful. Thanks.
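For anyone who wants to check this before launching a run, here is a minimal sketch (not part of TEtranscripts) that compares the machine's total physical RAM against the roughly 9 GB floor mentioned above. The names `RECOMMENDED_GB`, `total_ram_gb`, and `has_enough_memory` are made up for illustration, and the `os.sysconf` keys used here are assumed to be available, which is the case on Linux.

```python
import os

# Assumption: threshold taken from the comment above (at least 9 GB dedicated).
RECOMMENDED_GB = 9.0


def total_ram_gb():
    """Return total physical memory in GiB using sysconf (Linux/Unix)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    n_pages = os.sysconf("SC_PHYS_PAGES")    # number of physical pages
    return page_size * n_pages / 1024 ** 3


def has_enough_memory(total_gb, required_gb=RECOMMENDED_GB):
    """True if the given amount of RAM meets the recommended floor."""
    return total_gb >= required_gb


if __name__ == "__main__":
    ram = total_ram_gb()
    if has_enough_memory(ram):
        print(f"{ram:.1f} GB of RAM detected; this should be enough.")
    else:
        print(f"Only {ram:.1f} GB of RAM detected; "
              f"at least {RECOMMENDED_GB:.0f} GB is suggested for human data.")
```

Note that this reports total RAM, not free RAM, so on a shared virtual machine the memory actually available to the run may be lower.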

yingjin07 commented 6 years ago

Hi,

Yes. 4 GB of memory is not sufficient to hold both the gene and TE indices plus the intermediate data structures that store multi-reads. Please refer to Figure 6 of the paper for running time and memory usage at different sample sizes.

Best, Ying

Ying Jin PhD

Computational Science Manager Bioinformatics Shared Resources Cold Spring Harbor Lab 516-367-5190 yjin at cshl dot edu

