mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

TPM from the count matrix? #43

Open ptranvan opened 6 months ago

ptranvan commented 6 months ago

Hi,

I just got the count matrix from TElocal.

I would like to compute TPM values for within-sample comparisons, but the matrix mixes genes (identified by Ensembl IDs, from the annotation provided with --GTF) and TE transcripts from the TE GTF.

Any advice on how to do this?

Thanks

olivertam commented 6 months ago

Hi,

I would suggest that for each gene, you use the total length of all non-overlapping exonic regions as the gene length, and for each TE, the length of that TE copy/instance. Let me know if that is unclear.

Thanks.
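As a rough illustration of the suggestion above, here is a minimal Python sketch. It assumes you have already built two dictionaries keyed by the same feature IDs as the count matrix: raw counts and feature lengths in bases. The function names `merged_length` and `tpm`, and all IDs and numbers in the example, are hypothetical and not part of TElocal.

```python
def merged_length(intervals):
    """Bases covered by 1-based inclusive (start, end) intervals,
    counting overlapping regions only once."""
    total = 0
    cur_start, cur_end = None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def tpm(counts, lengths):
    """Convert raw counts to TPM within one sample.

    counts, lengths: dicts keyed by feature ID; lengths are in bases.
    """
    # Reads per kilobase for every feature (genes and TEs together).
    rates = {f: counts[f] / (lengths[f] / 1000.0) for f in counts}
    scale = sum(rates.values()) / 1e6
    if scale == 0:
        return {f: 0.0 for f in rates}
    return {f: r / scale for f, r in rates.items()}


# Hypothetical example: one gene with two overlapping exons, one TE copy.
lengths = {
    "GENE_A": merged_length([(100, 199), (150, 249)]),  # union of exons
    "TE_A_dup1": 300,                                   # length of the TE copy
}
counts = {"GENE_A": 30, "TE_A_dup1": 30}
result = tpm(counts, lengths)
```

Genes and TEs are normalised together here so that the TPM values sum to one million across the whole sample, which is what makes them comparable within that sample.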

ptranvan commented 6 months ago

Thanks, I will look into this.

It's somewhat surprising that the TE GTF annotation only contains transcripts with a single exon (so I can compute the length easily).

Are there no transposable elements/transcripts with multiple exons?
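For single-exon entries, the length is simply `end - start + 1`, since GTF coordinates are 1-based and inclusive. A minimal sketch of reading those lengths from GTF lines follows. The helper name `te_lengths`, the choice of `transcript_id` as the key attribute, and the example line are assumptions for illustration only; the IDs in the count matrix may be formatted differently, so they may need to be mapped accordingly.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def te_lengths(gtf_lines, id_attr="transcript_id"):
    """Return {feature ID: length in bases} for single-exon GTF entries.

    id_attr is an assumption: pick whichever attribute identifies
    the individual TE copy in your GTF.
    """
    lengths = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        feature_id = attrs.get(id_attr)
        if feature_id is None:
            continue
        start, end = int(fields[3]), int(fields[4])
        # GTF coordinates are 1-based and inclusive.
        lengths[feature_id] = end - start + 1
    return lengths


# Hypothetical GTF line, not taken from a real annotation.
example = [
    'chr1\texample\texon\t1001\t1300\t.\t+\t.\t'
    'gene_id "TE_A"; transcript_id "TE_A_dup1";\n'
]
te_len = te_lengths(example)
```

In practice you would pass an open file handle instead of the `example` list.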

olivertam commented 6 months ago

Hi,

Since we use RepeatMasker as the source of our annotation, we typically don't have splicing information for TEs. While studies have shown spliced TE transcripts (e.g. from HERVK in humans), we don't feel that there is a comprehensive source for this information, and thus have not included it in our TE GTF.

Thanks.