Closed iriirica closed 3 weeks ago
Hi,
Thank you for your interest in the software.
It is unclear from your TE GTF whether you have a unique gene_id for each entry. If so, that significantly slows down the TE index building (up to days). Could you confirm that you do not have unique gene_id values for each entry?
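One quick way to check this is to count the distinct gene_id values in the GTF and compare that to the number of entries. The following is only a rough sketch and not part of TEtranscripts; it assumes a standard tab-separated GTF whose ninth column contains an attribute of the form gene_id "...", and the function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumes the attribute column holds: gene_id "some_name";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_id_counts(lines):
    """Count how many GTF entries carry each gene_id value."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(counts):
    """Return (entries, distinct gene_ids, True if every gene_id is unique)."""
    entries = sum(counts.values())
    distinct = len(counts)
    return entries, distinct, entries > 0 and entries == distinct


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gene_ids.py path/to/TE.gtf
    with open(sys.argv[1]) as handle:
        entries, distinct, all_unique = summarize(gene_id_counts(handle))
    print(f"{entries} entries, {distinct} distinct gene_id values")
    print("every gene_id is unique" if all_unique else "gene_id values are shared")
```

If the two numbers are equal, every entry has its own gene_id, which is the case that slows index building.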
If you prefer to use this GTF as-is, we would recommend TElocal, and pre-building an index using this script might be better. Please note that it would still take many days to build that index.
Thanks.
To distinguish TEs at different genomic loci, I renamed all the TE entries before generating the TE GTF file, so gene_id is unique. I'll try to run TElocal_indexer separately, thanks for your suggestion!
However, I am still a bit confused. Isn't gene_id supposed to be unique for each TE entry? If several TE entries share the same gene_id, how can I tell which one is expressed in the results?
Hi,
The concept behind TEtranscripts is that we're measuring TEs at the sub-family level, i.e. copies that share the same consensus sequence in repeat libraries such as Repbase and Dfam. Thus, the gene_id corresponds to that sub-family/consensus. We find that this enables quantification of TEs at a level that is still biologically meaningful (as most studies assess TE expression based on consensus mapping or qPCR with degenerate primers), and allows differential analysis with sufficient counts.
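To illustrate the idea only (this is not how TEtranscripts is implemented), sub-family level quantification can be thought of as summing the counts of all copies that map to the same sub-family. The function name, the locus names, and the numbers below are made up for the example.

```python
from collections import defaultdict


def collapse_to_subfamily(locus_counts, locus_to_subfamily):
    """Sum per-locus counts into one total per sub-family."""
    totals = defaultdict(int)
    for locus, count in locus_counts.items():
        totals[locus_to_subfamily[locus]] += count
    return dict(totals)


# Hypothetical per-locus read counts and their sub-family assignment.
example_counts = {"AluSp_copy1": 3, "AluSp_copy2": 5, "L1HS_copy1": 2}
example_map = {"AluSp_copy1": "AluSp", "AluSp_copy2": "AluSp", "L1HS_copy1": "L1HS"}

print(collapse_to_subfamily(example_counts, example_map))
# prints {'AluSp': 8, 'L1HS': 2}
```

Pooling copies this way is what gives each sub-family enough counts for differential analysis, at the cost of not knowing which individual copy the reads came from.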
Thanks.
I think I know how to analyze my TEs now. Thanks again for your help!
Hello, thank you for the great tool! I am trying to run TEcounts, but I have encountered an issue.
Here is my log file, and the Building TE index step has run for 6 days. Here is the command I used:
The $TEgtf file was generated using the makeTEgtf.pl script you provided, and its size is 3.5 GB. The genome size I am working with is 6.2 GB.
Could you please help me understand what might be causing this extended runtime? Thank you for your assistance!