mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Question on output table #164

Closed Drosofriends closed 5 months ago

Drosofriends commented 7 months ago

Hi, I'm using TEtranscripts for my analysis. I provided the GTF of the genome (downloaded from Ensembl, Dmel), the TE GTF downloaded from your site, and the BAM files, using --mode multi. The pipeline ran well and I obtained the following table (head of table):

gene_id | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj
FBti0063061 | 46.9274925275789 | -3.43399773245029 | 0.818956460808408 | -4.19313833736719 | 2.75121549811408e-05 | 0.00113025344889736
FBti0064012 | 53.1908745364404 | 2.01977262565284 | 0.590014919434608 | 3.42325686880651 | 0.000618755765331622 | 0.0154545044643294
FBti0018964 | 175.208355098628 | 2.02280369770908 | 0.496859656153632 | 4.07117718787701 | 4.67761513283243e-05 | 0.00178963412563024
FBti0060325 | 224.902126028936 | 2.02874715933505 | 0.564950581254248 | 3.59101703166855 | 0.000329390107792762 | 0.00934473520900372
FBti0018963 | 1594.86324798786 | 2.06678161223286 | 0.173301704726221 | 11.9259162251054 | 8.67245974955562e-33 | 4.65711088551137e-30
FBti0063499 | 51.5607134539747 | 2.14660522255756 | 0.628081327430511 | 3.41771858007522 | 0.0006314836473823 | 0.0156941952430418

I expected to obtain the quantification of whole transposon families such as COPIA, BLOOD, etc. Why did I obtain the quantification of every single sequence belonging to a family? Do I have to sum the log2FC of each transcript (FBti)? Are there any options to obtain the quantification and DE of the whole family?

olivertam commented 7 months ago

Hi,

Those annotations are coming from your Ensembl "gene" annotation file, which unfortunately includes TE insertion annotations (as indicated by FBti). Our TE annotations typically come in the form of subfamily:family:class, such as Copia_I-int:Copia:LTR. We recommend removing all the annotations with FBti from your Ensembl annotation file, as they are probably reducing the accuracy of the TE annotation by diverting reads away from the TE GTF annotations.
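A minimal sketch of the suggested clean-up, assuming the TE insertion records in the Ensembl GTF carry a `gene_id` attribute beginning with `FBti` (as the output table above suggests). The function names and file paths are hypothetical, and this is not part of TEtranscripts itself:

```python
import re

# Assumption: TE insertion records are identifiable by a gene_id such as
# gene_id "FBti0063061" in the GTF attribute column (column 9).
FBTI_GENE_ID = re.compile(r'gene_id "FBti\d+"')


def drop_fbti_records(gtf_lines):
    """Yield GTF lines, skipping records whose gene_id is an FBti insertion."""
    for line in gtf_lines:
        # Keep header/comment lines untouched.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        attributes = fields[8] if len(fields) >= 9 else ""
        if FBTI_GENE_ID.search(attributes):
            continue  # drop the TE insertion record
        yield line


def filter_gtf(in_path, out_path):
    """Write a copy of the GTF at in_path without FBti records (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_fbti_records(src))
```

The filtered file would then be passed as the gene annotation in place of the original Ensembl GTF. Matching on the attribute column rather than the whole line avoids dropping a gene record that merely mentions an FBti identifier elsewhere.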

Thanks.

Drosofriends commented 6 months ago

Thanks for your help, I solved the problem with the FBti insertions and now it works. But now I need some other help. I started an RT-qPCR validation on some of the upregulated TEs, and I noticed that, for Copia for example, I have separate quantifications for multiple Copia sequences. If my primers align with the consensus sequence of all Copia sequences, do I have to sum the log2FC values or take their mean? For example:

ens_gene | baseMean | log2FoldChange | lfcSE | stat | pvalue | padj
Copia_I-int:Copia:LTR | 703981,4062 | 3,949987798 | 0,24201238 | 16,32142868 | 6,94901E-60 | 1,7959E-56
Copia_LTR:Copia:LTR | 18197,0739 | 3,520660155 | 0,403820586 | 8,718377116 | 2,82218E-18 | 1,013E-15
Copia1-I_DM:Copia:LTR | 7206,41332 | 0,986315678 | 0,298280324 | 3,30667362 | 0,000944108 | 0,023827669
Copia1-LTR_DM:Copia:LTR | 263,2205707 | 2,413198575 | 0,763612151 | 3,160241192 | 0,001576386 | 0,035426188
Copia2_I-int:Copia:LTR | 5685,225281 | 1,063833742 | 0,169365201 | 6,28130063 | 3,35752E-10 | 4,87482E-08

So in this case, does Copia have a log2FC of 10 or of 2,3? Moreover, does each Copia entry represent a distinct subfamily? Thanks for your support.

olivertam commented 6 months ago

Hi,

Thank you for your question. It might be non-trivial to calculate a log2FC that matches your qPCR results, as you would need to determine whether your primers match only the Copia subfamily (Copia I internal and/or Copia LTR), or whether they also match the Copia1 and Copia2 subfamilies (i.e. they match the Copia family of TEs). If it's the former, then you would want to use the log2FC of the specific subfamily. If it's the latter, then you would want to aggregate the raw counts of all the relevant subfamilies in the count table, and then re-run differential expression to get your log2FC for that aggregated group of TEs. Please let me know if that does not address your question.
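The aggregation step could be sketched roughly as follows. This assumes the count table has already been loaded into a dictionary mapping each row name (e.g. `Copia_I-int:Copia:LTR`) to a list of per-sample raw counts. The function name and the merged row label are hypothetical, and which subfamilies count as "relevant" depends on what the primers actually amplify:

```python
def merge_count_rows(counts, names_to_merge, merged_name):
    """Return a new count table in which the listed rows are summed into one row.

    counts         : dict of row name -> list of raw counts, one per sample
    names_to_merge : row names whose raw counts should be aggregated
    merged_name    : label for the aggregated row (hypothetical, user-chosen)
    """
    missing = [name for name in names_to_merge if name not in counts]
    if missing:
        raise KeyError(f"rows not found in count table: {missing}")

    targets = set(names_to_merge)
    # Copy every row that is not being merged.
    merged = {name: list(vals) for name, vals in counts.items() if name not in targets}
    # Sum the raw counts sample by sample (column-wise) across the merged rows.
    merged[merged_name] = [sum(col) for col in zip(*(counts[name] for name in targets))]
    return merged
```

With made-up counts for two samples, merging `{"Copia_I-int:Copia:LTR": [10, 20], "Copia_LTR:Copia:LTR": [1, 2]}` into a single row gives `[11, 22]`. The merged table is then what gets passed to the differential expression step again. Summing or averaging the per-subfamily log2FC values is not equivalent, because each log2FC is a ratio computed on very different baseline abundances.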

Thanks.

Drosofriends commented 6 months ago

Thank you so much, I will perform the analysis following your advice. Given that I excluded the FBti insertions from the input GTF, do I also have to remove any 'RR_transposable_element' entries (e.g. RR48398_transposable_element)? The Ensembl GTF also includes this annotation, and I get quantification for these sequences. Sorry for all these questions, but it's my first time using TEtranscripts.

olivertam commented 6 months ago

Hi,

No worries. Unfortunately, we have no control over the "gene" annotation provided by other groups, so there will be cases like these. All the best, and let us know if you have more questions.

Thanks.
