Long run time - GitHub Issues

edinatale commented 8 months ago

Hi,

I am experiencing a very long run time during the t3e.py step. It has been a week and I am still at the 47th sampling step of chromosome 4 (out of 27 chromosomes). Is this expected?

Cheers, Erica

michelleapaz commented 8 months ago

Hi Erica,

T3E was designed and tested for human and mouse ChIP-seq data. The TE content of these species does not exceed 50%. In principle, T3E can be used for any other species, but some modifications to the code are necessary (e.g., the number of chromosomes). Note that computing the enrichment of TE families/subfamilies at the nucleotide level is computationally expensive. If you wish, you can try reducing the number of iterations (some users have obtained robust results using just 20 iterations).

To help you use the tool for your purpose, please share the following information:

Which species are you working with?
What is the percentage of TE content in the genome?
How are the TEs distributed?

Thank you for choosing T3E.

Kind regards,

Michelle

edinatale commented 8 months ago

Hi Michelle,

Thank you so much for your answer!

I am working with Ectocarpus siliculosus, a brown alga. The TE content is not high (30%), and the TEs are homogeneously distributed in the genome. It has been a week now and I am at iteration 57 of chromosome 6. I have modified the scripts so that they also run on my species.

Cheers, Erica
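
One rough way to judge whether a run like this will finish in a reasonable time is to extrapolate linearly from the progress reported so far. The sketch below is only illustrative: it assumes chromosomes are processed one after another, each with the same number of equally expensive iterations. Neither that loop structure nor the `iters_per_chrom` parameter is taken from T3E itself; both are hypothetical and should be checked against the actual script and settings.

```python
def estimate_remaining_days(elapsed_days, chrom_index, iteration,
                            n_chroms, iters_per_chrom):
    """Linearly extrapolate the remaining run time from progress so far.

    Assumption (not taken from T3E): chromosomes are the outer loop and each
    one runs `iters_per_chrom` iterations of roughly equal cost.
    `chrom_index` and `iteration` are the 1-based positions of the unit that
    is currently running, so that unit is counted as not yet finished.
    """
    if not 1 <= chrom_index <= n_chroms:
        raise ValueError("chrom_index must be between 1 and n_chroms")
    if not 1 <= iteration <= iters_per_chrom:
        raise ValueError("iteration must be between 1 and iters_per_chrom")

    done = (chrom_index - 1) * iters_per_chrom + (iteration - 1)
    total = n_chroms * iters_per_chrom
    if done == 0:
        raise ValueError("no completed iterations yet, cannot extrapolate")

    # Time per completed unit, multiplied by the number of units left.
    return elapsed_days * (total - done) / done


if __name__ == "__main__":
    # Numbers from the thread, plus a hypothetical 100 iterations per chromosome.
    days = estimate_remaining_days(7, chrom_index=6, iteration=57,
                                   n_chroms=27, iters_per_chrom=100)
    print(f"Estimated remaining run time: {days:.1f} days")
```

If the real loop order or iteration count differs, only the computation of `done` and `total` needs to change. The estimate also ignores per-chromosome differences in size and TE content.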


michelleapaz commented 6 months ago

Hi Erica,

I apologise for the delay in replying to you.

Thank you for the additional information you provided.

Just a clarification: the run-time (and memory) complexity of T3E depends not only on the TE content of the genome and the initial library size, but also on the number of ambiguously mapping reads and the number of genomic loci to which they map. You should also consider the evolutionary "age" of these TEs.

If you are still facing long run times, I can offer to take a closer look at your data.

Please contact me directly at michelle.dapaz@embl.de.

All the best,

Michelle

michelleapaz / T3E

Long run time #6