mhammell-laboratory / TElocal

A package for quantifying transposable elements at a locus level for RNAseq datasets.
GNU General Public License v3.0

Generate the locInd annotation from GTF #24

Closed ZunpengLiu closed 1 year ago

ZunpengLiu commented 1 year ago

Hello,

Thank you very much for developing such a cool tool for estimating repetitive element expression at the locus-specific level.

I am interested in some genomic regions that are not annotated by RepeatMasker, and I am trying to calculate expression levels for these loci. However, I only have GTF files, and TElocal appears to require a locInd file [--TE TE-annot-file locInd file for transposable element annotations]. Could you please help us generate these files, or describe a strategy for generating them?

The GTF files: GTF.zip

Many thanks for your help!!!

Best,

Zunpeng

olivertam commented 1 year ago

Hi Zunpeng,

Thank you for your interest in the software. Do you want to create an index that has both the RepeatMasker and your own annotations? This is probably preferable, as you would want to quantify your custom annotations alongside the other TEs. If so, I would recommend the following steps:

## Extract the exon entries from your custom GTF, so that overlapping annotations can be excluded from the RepeatMasker GTF
$ awk '$3=="exon"' hg19_HERV.gtf > hg19_HERV_exons.gtf
## Removing overlapping exons
$ bedtools intersect -s -v -a hg19_rmsk_TE.gtf -b hg19_HERV_exons.gtf > hg19_rmsk_filtered.gtf
## Edit your custom GTF to include tags for family_id and class_id
$ sed -i 's/repClass/class_id/;s/repFamily/family_id/' hg19_HERV_exons.gtf
## Make your "final" GTF
$ cat hg19_rmsk_filtered.gtf hg19_HERV_exons.gtf | sort -k1,1 -k4,4n -k5,5n > hg19_custom_repeat.gtf
## Run our TElocal indexer (assuming that you have already installed TElocal somewhere on the system). 
## Please note that this can take days to run.
$ ./TElocal_indexer --afile hg19_custom_repeat.gtf --itype TE

You can get bedtools here, and our indexer here. Our RepeatMasker GTF files are here: (hg19 and hg38).
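
Since the indexer run can take days, it may be worth checking the merged GTF before starting it. The following is a minimal sketch, not part of TElocal: `check_gtf_attrs` is a hypothetical helper name, and it only checks that each record has nine tab-separated fields and carries the attribute tags you list. By default it looks for the `family_id` and `class_id` tags mentioned in the steps above. Any further tags (for example `gene_id` and `transcript_id`) are an assumption about what the indexer expects and can be passed explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TElocal): report GTF lines that lack
# the requested attribute tags. Exits non-zero if any problem is found.
# Usage: check_gtf_attrs FILE [TAG ...]
check_gtf_attrs() {
    local gtf="$1"
    shift
    # Default to the two tags named in the steps above.
    local tags="${*:-family_id class_id}"
    awk -F'\t' -v tags="$tags" '
        BEGIN { n = split(tags, want, " ") }
        /^#/ || NF == 0 { next }    # skip comment and blank lines
        NF < 9 {
            printf "line %d: expected 9 tab-separated fields, found %d\n", NR, NF
            bad++
            next
        }
        {
            for (i = 1; i <= n; i++) {
                # Attribute tags appear in column 9 as: tag "value";
                if ($9 !~ ("(^|[; ])" want[i] " \"")) {
                    printf "line %d: missing %s\n", NR, want[i]
                    bad++
                }
            }
        }
        END { exit (bad > 0) }
    ' "$gtf"
}
```

For example, `check_gtf_attrs hg19_custom_repeat.gtf && echo "attributes look OK"` would flag any custom records where the `sed` renaming step did not apply. To also check the assumed tags, run `check_gtf_attrs hg19_custom_repeat.gtf gene_id transcript_id family_id class_id`.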

Let me know if you encounter any issues, and we can help you out further.

Thanks.

ZunpengLiu commented 1 year ago

Thank you very much for your kind and prompt reply! Your suggestions are very instructive and helpful! I will give it a try based on your suggestion.

Best, Zunpeng
