mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

TE subfamily TPM normalization #176

Closed by songlyzz 4 months ago

songlyzz commented 5 months ago

Hi olivertam, I am using TEcount to count TE subfamilies in mouse data, and I am unsure how to proceed if I want to look at the counts for a single TE subfamily. How can I normalize the counts matrix to a TPM matrix at the subfamily level? Can I normalize by the length of the TE subfamily, or must I use the lengths of all the individual (solo) copies? Also, how can I extract the lengths for more than 1000 subfamilies? Thanks very much!

olivertam commented 5 months ago

Hi,

Thank you for your interest in the software.

Unfortunately, it's non-trivial to calculate TPM for TEs. You will need to extract the length of each annotation matching the TE subfamily (indicated by the gene_id in the GTF file) and sum them to get the subfamily "length".
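
A minimal Python sketch of that length calculation, followed by a TPM conversion. It assumes a standard tab-separated, 9-column GTF whose attribute column contains `gene_id "..."`; the function names `subfamily_lengths` and `tpm` are hypothetical helpers and are not part of TEtranscripts. The feature names in `counts` must match the keys in `lengths`, so you may need to adapt them to how your counts table labels TEs.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute, which holds the TE subfamily name.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def subfamily_lengths(gtf_lines):
    """Sum the lengths of all annotations that share a gene_id."""
    lengths = defaultdict(int)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        start, end = int(fields[3]), int(fields[4])
        match = GENE_ID_RE.search(fields[8])
        if match:
            # GTF coordinates are 1-based and inclusive.
            lengths[match.group(1)] += end - start + 1
    return dict(lengths)


def tpm(counts, lengths):
    """Convert raw counts to TPM using lengths in base pairs.

    Features without a positive length are dropped. The result sums to
    one million over whatever features are passed in.
    """
    rate = {
        name: counts[name] / (lengths[name] / 1000.0)
        for name in counts
        if lengths.get(name, 0) > 0
    }
    scale = sum(rate.values()) / 1e6
    if scale == 0:
        return {name: 0.0 for name in rate}
    return {name: value / scale for name, value in rate.items()}
```

Usage would look like `lengths = subfamily_lengths(open("TE_annotation.gtf"))`, where the file name is a placeholder. Because TPM is relative to every feature in the denominator, the values change depending on whether genes are included alongside the TE subfamilies, so decide on that up front and supply gene lengths separately if they are.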

We typically don't do this, since we usually perform differential analysis with software (e.g. DESeq2) that expects a raw counts matrix. We then normalize using that software's own approach (median of ratios in DESeq2's case) and apply a variance-stabilizing transformation to the output for plotting (e.g. in a heatmap).
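
For intuition only, the median-of-ratios idea can be sketched in plain Python as below. This is an illustration under simplifying assumptions, not DESeq2's implementation; the function name `size_factors` is hypothetical, and in practice you would let DESeq2 estimate size factors itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios idea.

    counts: list of rows (one per feature), each a list of raw counts
    with one entry per sample.
    """
    n_samples = len(counts[0])
    # Only features with non-zero counts in every sample are usable,
    # because the geometric mean is taken on the log scale.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no feature has non-zero counts in every sample")

    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        # Log of the feature's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)

    # A sample's size factor is the median ratio over usable features.
    return [math.exp(median(r)) for r in log_ratios]
```

Dividing each sample's raw counts by its size factor gives normalized counts that are comparable between samples for the same feature. Unlike TPM, this does not correct for feature length, which is why summed subfamily lengths are not needed for this workflow.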

Thanks.

songlyzz commented 5 months ago

Hi olivertam, thank you very much!
