mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

System requirements for TEtranscripts (locus level) #105

Closed olivertam closed 2 years ago

olivertam commented 2 years ago

Repost from #33:

Dear Oliver Tam, Greetings! What are the minimum system requirements to run TEtranscripts? I use WSL2 (Windows Subsystem for Linux), and my computer has 128 GB RAM. When I treat the repeat name as the gene, it took 3 hours to run 2 BAM files of 10 GB each. However, when treating each copy of the TE as a distinct "gene", TEtranscripts is taking a long time: more than 24 hours have passed and it is still running, and I do not know how long it will take. Could you give me some suggestions?

With regards

Ramky

olivertam commented 2 years ago

Hi Ramky,

TEtranscripts requires at least 32 GB of RAM for the human genome (more for larger genomes). Your computer should have sufficient capacity for most runs.

The big slowdown comes from treating each TE copy as a distinct "gene". This greatly expands the number of annotations, from ~900 in humans to >460k. As a result, the algorithm takes much longer to build the TE index (which is what you are observing right now), but it should run fine.
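To illustrate why the annotation count grows so much, here is a minimal Python sketch that counts distinct features in a TE GTF under both groupings. The sample lines are made up for illustration, and the sketch assumes the GTF carries the TE name in `gene_id` and a per-copy identifier in `transcript_id`; please check this against your own annotation file.

```python
import re

# Toy GTF-style TE annotation lines, invented for illustration only.
# Assumption: gene_id holds the TE name, transcript_id identifies each copy.
SAMPLE_GTF = """\
chr1\trmsk\texon\t100\t400\t.\t+\t.\tgene_id "L1HS"; transcript_id "L1HS_dup1"; family_id "L1"; class_id "LINE";
chr1\trmsk\texon\t900\t1500\t.\t-\t.\tgene_id "L1HS"; transcript_id "L1HS_dup2"; family_id "L1"; class_id "LINE";
chr2\trmsk\texon\t50\t350\t.\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup1"; family_id "Alu"; class_id "SINE";
chr2\trmsk\texon\t700\t1000\t.\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup2"; family_id "Alu"; class_id "SINE";
chr3\trmsk\texon\t10\t310\t.\t-\t.\tgene_id "AluY"; transcript_id "AluY_dup3"; family_id "Alu"; class_id "SINE";
"""


def count_features(gtf_text, attribute):
    """Count distinct values of one attribute in the 9th GTF column."""
    pattern = re.compile(re.escape(attribute) + r' "([^"]+)"')
    ids = set()
    for line in gtf_text.splitlines():
        # Skip blank lines and GTF comment/header lines.
        if not line or line.startswith("#"):
            continue
        attributes = line.split("\t")[8]
        match = pattern.search(attributes)
        if match:
            ids.add(match.group(1))
    return len(ids)


by_name = count_features(SAMPLE_GTF, "gene_id")        # one feature per TE name
by_copy = count_features(SAMPLE_GTF, "transcript_id")  # one feature per TE copy
print(f"features by TE name: {by_name}, by TE copy: {by_copy}")
```

On a real human annotation the same comparison is what takes you from roughly 900 features to more than 460k, so the index has far more entries to build.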

If you are really interested in doing more locus-specific analyses, we would recommend TElocal. You will need a pre-built index for this software, which we can help you make, or which you can make yourself using the TElocal_indexer script (requires TElocal to be installed).
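For reference, a TElocal run could be assembled along the lines of the Python sketch below. This is only a sketch: the file names are hypothetical placeholders, and the flags (`-b`, `--GTF`, `--TE`, `--project`) are written as I understand them from the TElocal README, so please confirm them with `TElocal --help` before relying on this.

```python
import shlex


def build_telocal_command(bam, gene_gtf, te_index, project):
    """Assemble a TElocal argument list (flags assumed from the TElocal README)."""
    return [
        "TElocal",
        "-b", bam,             # alignment file for one sample
        "--GTF", gene_gtf,     # gene annotation in GTF format
        "--TE", te_index,      # pre-built TE index
        "--project", project,  # prefix for the output files
    ]


# Hypothetical file names, for illustration only.
cmd = build_telocal_command(
    "sample.bam", "genes.gtf", "te_annotation.locInd", "sample_locus"
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell-quoting problems if your paths contain spaces.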

Hope this is helpful.

Thanks.

Ramkyeri commented 2 years ago

Dear Oliver Tam,

Thank you very much for your guidance. I am also interested in using TElocal. I will try this.

I agree with you, we need more RAM if we treat each TE copy as a distinct "gene".

The program has also completed. It took more than 36 hours.

With regards

Ramky