mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Intergenic TEs #106

Closed tiplud closed 2 years ago

tiplud commented 2 years ago

Hi! Thank you for this excellent package. I am using TEcount to quantify gene and TE expression in mouse samples. I obtained the TE GTF file (GRCm38_GENCODE_rmsk_TE.gtf) from http://hammelllab.labsites.cshl.edu/software/#TEtranscripts. My understanding is that the TEs are intergenic or intronic? If so, is there a way to consider only intergenic TEs?

Thank you very much, Debayan

olivertam commented 2 years ago

Hi Debayan,

Thank you for your interest in the software. Technically, TEs can be anywhere in the genome (so exonic, intronic and intergenic). If you wish to consider only intergenic TEs, you can filter the TE GTF file using intersectBed (from the bedtools suite) with something similar to the following:

$ intersectBed -a GRCm38_GENCODE_rmsk_TE.gtf -b GENCODE_transcripts.gtf -v > intergenic_TE.gtf

The command above takes the TE GTF, intersects it with the transcript coordinates of the GENCODE annotation, and keeps (-v) only the TEs that have zero overlap, not even 1 base pair. Note that you will want to provide the transcript features, and not just the exonic features, if you also want to remove intronic TEs. If you want to be less stringent, you can add -f [fraction], so that a TE is removed only when at least that fraction of it overlaps a transcript, where [fraction] could be between 0.01 and 1 (i.e. 1% to 100% of the annotation in file a). Please feel free to look at the intersectBed page for more information.
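As a rough sketch of the two steps described above (getting transcript-level features, then filtering the TE GTF), something like the following could work. The helper names `extract_transcripts` and `intergenic_tes` and the input file name `gencode.annotation.gtf` are made up for illustration and are not part of TEtranscripts or bedtools. `bedtools intersect` is the current name for `intersectBed`. Note that with `-f`, the fraction is checked against each transcript separately, so this is only an approximation of "mostly intergenic".

```shell
# Sketch only: file names are placeholders and both helpers are hypothetical.

# Print only the "transcript" records (GTF column 3) of an annotation file,
# skipping "#" header lines. Transcript spans cover both exons and introns.
extract_transcripts() {
    awk -F'\t' '!/^#/ && $3 == "transcript"' "$1"
}

# Print TE records from $1 that do not overlap any record in $2.
# Optional $3: minimum fraction of the TE that a single transcript must cover
# for the overlap to count (passed to bedtools as -f).
intergenic_tes() {
    if [ -n "${3:-}" ]; then
        bedtools intersect -a "$1" -b "$2" -v -f "$3"
    else
        bedtools intersect -a "$1" -b "$2" -v
    fi
}

# Example usage (commented out; requires bedtools and the real files):
# extract_transcripts gencode.annotation.gtf > GENCODE_transcripts.gtf
# intergenic_tes GRCm38_GENCODE_rmsk_TE.gtf GENCODE_transcripts.gtf > intergenic_TE.gtf
# intergenic_tes GRCm38_GENCODE_rmsk_TE.gtf GENCODE_transcripts.gtf 0.5 > mostly_intergenic_TE.gtf
```

The strict call (no third argument) reproduces the command above; the 0.5 variant keeps TEs that are less than half covered by any one transcript.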

Hope this is helpful.

Thanks

tiplud commented 2 years ago

Hi Oliver, Thank you so much for the prompt reply! So, just to verify, I should remove the RepeatMasker TE coordinates that fall in exonic/intronic locations, and then run the quantification again with the reduced TE GTF, right?

Thanks again, Debayan

olivertam commented 2 years ago

Hi Debayan,

If you are only interested in intergenic TEs (and want to ignore reads unambiguously from exonic and intronic TEs), then your approach is correct (replacing the original TE GTF with the reduced TE GTF). Please note that this approach does not eliminate the possibility of an exonic/intronic TE read being assigned to an intergenic TE if the read can also align to that intergenic TE.

Thanks.

tiplud commented 2 years ago

Hi Oliver, Thanks a lot for your quick explanations! Best, Debayan