mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Majority of TE reads coming from Chromosome 1 with human data #79

rowancallahan closed 3 years ago

rowancallahan commented 3 years ago

Hello!

We have been using the special GTF files that are supplied on the Hammell website for Transposable Elements, and have been looking at reads that were aligned with STAR. However, all of our reads seem to come from Chromosome 1. Originally we thought that this might have been an experimental effect. However, after running your program on data simulated using Flux Simulator, we found the same issue: almost all of the TEs still seem to originate from chromosome 1.

Is it possible there is something we are missing, or is this already a known fact about TEs in the human genome? I have attached an example of our simulated data results at the bottom of this post.

Any help is much appreciated!

N_TEs_detected_per_Type_byChr.pdf

olivertam commented 3 years ago

Hi,

How are you determining which chromosome the TE is originating from? Are the reads aligning only to chromosome 1 in your BAM file? The name that is given in the TEtranscripts count table (e.g. L1HS:L1:LINE) is not for a particular instance, but an aggregation of all genomic instances that are annotated as L1HS. I would treat it as matching the gene_id, and not the transcript_id in the GTF file. If you are interested in quantifying TE expression at a locus/instance level, we would recommend trying TElocal. Please let me know if anything is unclear, or if I didn't address the issue. You can send me the output from TEtranscripts, or a workflow on how you determined the chromosomal origins of the TE, and I will try to troubleshoot.
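To illustrate the aggregation point, here is a minimal sketch (not part of TEtranscripts itself) that groups TE GTF entries by the name used in the count table and tallies how many annotated instances fall on each chromosome. It assumes the TE GTF attribute column carries `gene_id`, `transcript_id`, `family_id` and `class_id`; the sample lines, their coordinates, and the helper names `parse_attributes` and `instances_per_chromosome` are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Illustrative TE GTF lines. Coordinates and instance names are made up;
# only the attribute layout (gene_id/transcript_id/family_id/class_id)
# is assumed to match the TE GTF being used.
SAMPLE_GTF = """\
chr1\thg38_rmsk\texon\t1000\t7000\t0\t+\t.\tgene_id "L1HS"; transcript_id "L1HS_dup1"; family_id "L1"; class_id "LINE";
chr5\thg38_rmsk\texon\t2000\t8000\t0\t-\t.\tgene_id "L1HS"; transcript_id "L1HS_dup2"; family_id "L1"; class_id "LINE";
chrX\thg38_rmsk\texon\t3000\t9000\t0\t+\t.\tgene_id "L1HS"; transcript_id "L1HS_dup3"; family_id "L1"; class_id "LINE";
chr1\thg38_rmsk\texon\t500\t800\t0\t+\t.\tgene_id "AluY"; transcript_id "AluY_dup1"; family_id "Alu"; class_id "SINE";
chr2\thg38_rmsk\texon\t600\t900\t0\t-\t.\tgene_id "AluY"; transcript_id "AluY_dup2"; family_id "Alu"; class_id "SINE";
"""

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def instances_per_chromosome(gtf_text):
    """Count annotated instances per chromosome for each aggregated TE name.

    The key is built as gene_id:family_id:class_id, mirroring names such as
    L1HS:L1:LINE in the count table.
    """
    counts = defaultdict(Counter)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        name = ":".join((attrs["gene_id"], attrs["family_id"], attrs["class_id"]))
        counts[name][cols[0]] += 1
    return counts


if __name__ == "__main__":
    for name, per_chrom in instances_per_chromosome(SAMPLE_GTF).items():
        print(name, dict(per_chrom))
```

Each aggregated name maps to instances on several chromosomes, so a row in the count table cannot be assigned to a single chromosome. Taking the chromosome of only the first matching GTF entry for each name would place this sample's TEs on chr1.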

Thanks.

rowancallahan commented 3 years ago

Hi Oliver,

Thank you so much for your help with this! I was unaware of TElocal and will make sure to use it when localizing transcripts in the future. This also helped clear up my understanding of how TEtranscripts works. After going through our code, it looks like the issue was on our end and not with TEtranscripts. Again, thanks so much for your help!

Best,

Rowan