mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

GTF annotation to use for TE #173

Closed: sanhe374 closed this issue 5 months ago

sanhe374 commented 5 months ago

I am using the hg38 reference from UCSC (GCA_000001405.15_GRCh38_no_alt_analysis_set.fa). Which TE GTF annotation should I use with this genome version?

olivertam commented 5 months ago

Hi,

Thank you for your interest in the software. It's unclear where you obtained that reference, as UCSC typically calls their file hg38 (rather than GRCh38). The best way to check is to look at the chromosome names in the FASTA file:

$ grep ">" GCA_000001405.15_GRCh38_no_alt_analysis_set.fa

- If all chromosome names start with chr, use the hg38 TE GTF.
- If none of the chromosome names start with chr, use the GRCh38 Ensembl TE GTF.
- If the canonical chromosomes (chr1 to chr22 and chrX/Y/M) start with chr but the others do not (e.g. GL000256), use the GRCh38 GENCODE TE GTF.
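For a FASTA file with many sequences, reading the grep output by eye can be tedious. The three-way check above could be sketched roughly as the shell function below. The function name `classify_fasta_naming` and the output labels are hypothetical and not part of TEtranscripts. As a simplification, the sketch treats any mix of chr and non-chr names as the GENCODE case without confirming that it is specifically the canonical chromosomes that carry the prefix, so a mixed result is worth checking manually.

```shell
#!/bin/sh
# Hypothetical helper (not part of TEtranscripts): reads FASTA text on stdin
# and prints a label for the chromosome naming convention of its headers.
classify_fasta_naming() {
    awk '
        /^>/ {
            name = substr($1, 2)           # sequence name without the leading ">"
            total++
            if (name ~ /^chr/) with_chr++  # count names carrying the chr prefix
        }
        END {
            if (total == 0)             print "no_headers"
            else if (with_chr == total) print "hg38"            # every name starts with chr
            else if (with_chr == 0)     print "GRCh38_Ensembl"  # no name starts with chr
            else                        print "GRCh38_GENCODE"  # mixed (assumed GENCODE-style)
        }
    '
}

# Example usage (file name taken from the question above):
#   classify_fasta_naming < GCA_000001405.15_GRCh38_no_alt_analysis_set.fa
```

The function only inspects header lines, so piping in the output of the grep command above should work equally well and avoids passing the full sequence through awk twice.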

If none of these matches, please let me know where I can download the file so that I can take a closer look.

Thanks.

sanhe374 commented 5 months ago

Thank you for your quick reply. It seems that all chromosome names start with chr, so I will use the hg38 one.