mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

how to speed up #32

Closed AlisaGU closed 8 months ago

AlisaGU commented 8 months ago

Hi, as the title says, is there a way to speed up TElocal?

I am currently building the TE annotation index, so I don't yet have a clear idea of the running time. However, my genome is large and the RNA-seq BAM file is about 40G, so I expect the run to take a long time.

Could you give me some tips in preparation for the counting step?

olivertam commented 8 months ago

Hi,

Thank you for your interest in the software. Once the TE index is built, the running time for TElocal should be within 3-4 hrs for a BAM file of that size, and it probably requires at most 40G of RAM (though sometimes it could spike higher). It is the index building that can take days, depending on the number of TE annotations. We are trying to speed up this process in the next release (and potentially remove the need to prebuild indices), but for now, that is definitely the bottleneck.

Thanks.

AlisaGU commented 8 months ago

Wow, what a quick reply!

There are 60108975 TEs in the genome. Can I split them into several files, then build the index and count reads for each file separately?

olivertam commented 8 months ago

Hi,

Unfortunately, we don't recommend splitting the TE index, as it does cause issues in the EM. The counting, though, would be relatively quick (on the order of hours), and you can certainly count in parallel with TElocal. Pre-building the TE index will take quite a long time (maybe more than 7 days), but once that is done, you won't have to deal with it again and can just count.

Thanks.
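
One reading of "count in parallel" above is to launch one TElocal process per BAM file against the same prebuilt index. The following is a minimal sketch of that idea, not the maintainers' method. The flags `-b`, `--GTF`, `--TE` and `--project` are taken from my recollection of the TElocal README and should be confirmed with `TElocal --help`; the helper names `build_telocal_cmd` and `count_all` and all file names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_telocal_cmd(bam, gene_gtf, te_index, project):
    """Return the argument list for one TElocal counting run.

    Assumption: these flags match the TElocal README; verify them
    with `TElocal --help` before relying on this.
    """
    return [
        "TElocal",
        "-b", str(bam),
        "--GTF", str(gene_gtf),
        "--TE", str(te_index),
        "--project", str(project),
    ]


def count_all(bams, gene_gtf, te_index, max_workers=2):
    """Run one TElocal process per BAM, at most `max_workers` at a time.

    Returns the exit code of each run, in the same order as `bams`.
    """
    # Use the BAM file name without its extension as the project name.
    cmds = [
        build_telocal_cmd(bam, gene_gtf, te_index, Path(bam).stem)
        for bam in bams
    ]
    # Threads are enough here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=False), cmds))
    return [result.returncode for result in results]
```

Since a single run may need up to about 40G of RAM according to the reply above, `max_workers` should probably stay small unless the machine has memory to spare.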

AlisaGU commented 8 months ago

OK~, thanks for your quick and detailed answer.

AlisaGU commented 8 months ago

Hi, my indexing run has been going for 7 days and 18 hours without producing any output. Is that normal?

This is my code:

$TElocal_indexer --afile DN.denovo.RepeatMasker.Telocal.gtf --itype TE

olivertam commented 8 months ago

Hi,

Unfortunately, depending on how big your TE GTF file is, the indexing step takes quite a while. I'm afraid it could take more days still. We are currently developing a version of TElocal that could bypass/speed up this step.

Thanks.
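
Because indexing time depends on how many TE annotations the GTF contains, it can help to count the records before starting. This is a minimal sketch that assumes one annotation per non-comment line, which is the usual GTF layout; the function names are hypothetical and it says nothing about how TElocal itself parses the file.

```python
def count_gtf_records(lines):
    """Count annotation records, skipping blank lines and '#' comment lines."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def count_gtf_file(path):
    """Stream a GTF file from disk and count its records without loading it whole."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_gtf_records(handle)
```

For example, `count_gtf_file("DN.denovo.RepeatMasker.Telocal.gtf")` would report the number of annotation lines in the file used in the command above.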

AlisaGU commented 8 months ago

Is it expected that no output is produced during the process?

olivertam commented 8 months ago

Unfortunately, yes, because everything is being processed in memory. We had considered improving the logging, but this was technically a script that we were using in-house, and with our efforts going towards removing the index requirement, we decided not to add more to the indexing script.

Thanks.
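
Since the indexer works in memory and prints nothing, one indirect check is to watch the resident memory of the running process. This is a minimal sketch that assumes a Linux system exposing `/proc/<pid>/status`; the function names are hypothetical. A growing value suggests the process is still loading data, but it does not prove progress.

```python
import re


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status.

    Returns None when the field is missing.
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status", "r", encoding="utf-8") as handle:
        return parse_vmrss_kb(handle.read())
```

Calling `read_rss_kb(pid)` every few minutes with the PID of the indexing process gives a rough picture of whether memory use is still changing.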

AlisaGU commented 8 months ago

Thanks for your continuous efforts! Looking forward to using the improved version!