mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

How to consider the relationship between genes and TEs when counting reads #122

Closed: ZunpengLiu closed this issue 1 year ago

ZunpengLiu commented 1 year ago

Hello Sir,

I have really enjoyed using these robust TEtools to quantify TE expression! All of these tools are very helpful and user-friendly.

I have a question about how genes and TEs are counted. TEtranscripts and TElocal can count both genes and TEs, which is great, and I would like to understand the counting method.

  1. (a) Do you consider genes and TEs together, counting reads for both at the same time? (b) Or is the relationship between genes and repetitive elements ignored, with genes counted first and repetitive elements counted afterwards? In other words, are gene and TE counts computed independently or simultaneously? This matters because some TEs are located in the intronic regions of genes.
  2. Did you use any other software in this counting process?

For example, which approach does TEtranscripts use here? If I understood your paper correctly, genes and TEs are considered simultaneously.


Best,

Zunpeng

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. In brief, we determine whether a read comes from a TE or a gene based on whether its alignment overlaps a gene exon or a TE. If it overlaps both a gene exon and a TE, we then check whether the read aligned uniquely to the genome or to multiple locations:

  1. If it is uniquely aligned, we assign the read to the gene (if a gene annotation exists) or to the TE (if no gene annotation exists for that alignment).
  2. If the read is "ambiguously mapped" (when running with --mode multi) and there are TE annotations at those sites, the read is weighted equally across those annotations, and these weights are then processed via EM (expectation-maximization).
  3. If the read is ambiguously mapped but has only gene exonic annotations, the read is weighted equally across the gene annotations.

Since we are working with bulk RNA-seq, we do not consider intronic regions of genes. Please let us know if this does not address your question.
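The decision rules described in this reply can be sketched in Python as below. This is a minimal, hypothetical illustration, not TEtranscripts' actual code: the `Alignment` type, the functions `split_equally`, `assign_read` and `em_te_counts`, the equal split when a unique read overlaps several annotations, and the simplified EM (no TE-length normalization, fixed iteration count, no convergence check) are all assumptions made for clarity.

```python
# Hypothetical sketch of the gene/TE read-assignment rules; not TEtranscripts' implementation.
from dataclasses import dataclass, field


@dataclass
class Alignment:
    """One genomic alignment of a read and the features it overlaps (illustrative type)."""
    gene_exons: list[str] = field(default_factory=list)
    tes: list[str] = field(default_factory=list)


def split_equally(names: list[str]) -> dict[str, float]:
    """Give each distinct annotation an equal share of one read (assumption)."""
    unique = sorted(set(names))
    return {name: 1.0 / len(unique) for name in unique}


def assign_read(alignments: list[Alignment]) -> tuple[str, dict[str, float]]:
    """Return ("gene" | "te" | "none", weights) for one read's alignments."""
    if len(alignments) == 1:
        aln = alignments[0]
        # Uniquely aligned: a gene exon annotation takes priority over a TE.
        if aln.gene_exons:
            return "gene", split_equally(aln.gene_exons)
        if aln.tes:
            return "te", split_equally(aln.tes)
        return "none", {}
    # Multi-mapped (as with --mode multi): use TE annotations at any site if present.
    te_hits = [t for aln in alignments for t in aln.tes]
    if te_hits:
        return "te", split_equally(te_hits)
    # Otherwise fall back to an equal split across gene exon annotations.
    gene_hits = [g for aln in alignments for g in aln.gene_exons]
    if gene_hits:
        return "gene", split_equally(gene_hits)
    return "none", {}


def em_te_counts(unique: dict[str, float], multi: list[dict[str, float]],
                 iterations: int = 100) -> dict[str, float]:
    """Simplified EM: re-split each multi-mapped read in proportion to current TE abundance."""
    names = set(unique) | {t for read in multi for t in read}
    # Initial estimate: unique counts plus the equal-split weights of multi-mapped reads.
    counts = {t: unique.get(t, 0.0) + sum(r.get(t, 0.0) for r in multi) for t in names}
    for _ in range(iterations):
        new = {t: unique.get(t, 0.0) for t in names}
        for read in multi:
            total = sum(counts[t] for t in read)
            for t in read:
                new[t] += counts[t] / total if total else 1.0 / len(read)
        counts = new
    return counts


if __name__ == "__main__":
    # A unique read overlapping both an exon and a TE goes to the gene.
    print(assign_read([Alignment(gene_exons=["GAPDH"], tes=["AluY"])]))  # ('gene', {'GAPDH': 1.0})
```

In this sketch, a multi-mapped read contributes fractional weights that `em_te_counts` repeatedly re-splits toward TEs with more supporting reads, so the total count is preserved while ambiguous reads migrate toward better-supported elements.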

Thanks.

ZunpengLiu commented 1 year ago

Hi,

Thank you so much for the prompt reply.

That is a very detailed explanation, thanks a lot!

Best,

Zunpeng

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.