mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Memory requirements + threading #54

Closed aleighbrown closed 4 years ago

aleighbrown commented 4 years ago

Hi, this is more of a suggestion than an issue per se.

I'm now moving my TEtranscripts runs onto my cluster to run for real, and I had to search through the issues to find out that a) there is no multithreading yet, and b) the memory requirements are: "We have typically used 15Gb of RAM, and typically prefer at least 9Gb dedicated to the task (my recollection is that the tool could use up to 8.4 Gb for human data, largely due to the TE index)."

Anyway, it might be good to put those two facts in your manual, or at least the memory requirements, for those of us working on clusters.

Cheers and thanks! AL

olivertam commented 4 years ago

Hi,

Thank you for the suggestions.

Regarding multi-threading, we typically use TEcount to quantify each sample separately (and thus send a single library to each node), and then merge the outputs into a single count table before running DESeq2. This might be a useful approach if you have lots of replicates to quantify for a single differential analysis.
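
For anyone wanting to try the merge step described above, here is a minimal sketch in Python. It is not part of TEtranscripts. It assumes each per-sample output is a two-column, tab-delimited table with one header line, a feature ID in the first column and a count in the second. The function names, the `gene/TE` header label, the use of the file name as the column name, and the choice to fill missing features with `0` are all assumptions made for illustration, so check them against your own files.

```python
import csv
from pathlib import Path


def merge_count_tables(paths):
    """Merge per-sample two-column count tables into one table.

    Assumes (hypothetically) that each file is tab-delimited with one
    header line, a feature ID in column 1 and a count in column 2.
    Returns (header, rows), with features in first-seen order.
    """
    sample_names = []
    counts = {}  # feature ID -> {sample name: count as text}
    for path in paths:
        # Assumption: the file name (without extension) identifies the sample.
        sample = Path(path).stem
        sample_names.append(sample)
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter="\t")
            next(reader)  # skip the per-file header line
            for row in reader:
                if not row:
                    continue  # tolerate blank lines
                counts.setdefault(row[0], {})[sample] = row[1]

    header = ["gene/TE"] + sample_names
    # Assumption: a feature absent from a sample is reported as 0.
    rows = [
        [feature] + [per_sample.get(name, "0") for name in sample_names]
        for feature, per_sample in counts.items()
    ]
    return header, rows


def write_table(header, rows, out_path):
    """Write the merged table as a tab-delimited file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `merge_count_tables` on the list of per-sample files and passing the result to `write_table` should give a single table that can be read into R for DESeq2. Counts are kept as text so that nothing is rounded or reinterpreted along the way.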

Although we mentioned the memory requirements in the associated publication, we agree that they might need updating (as libraries are getting bigger), and that they should also be in the README for ease of access.

Thanks again, and all the best with your analysis.