mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

TE GTF format error! There is no annotation at line 1. Error in building TE index #141

Closed DangPengsanqi closed 1 year ago

DangPengsanqi commented 1 year ago

Hi, I'm having an issue with your software that I can't resolve. I created the GTF file according to the format, but I still get the error in the title (screenshot attached). I have provided the original GFF file and the GTF file I created. I would appreciate it if you could help me find the problem.

Attachments: 10000.gtf.txt, 10000.gff.txt

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software. The TE GTF file requires the family_id and class_id attributes. You can use your Classification attribute for both if you like.
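As a quick way to spot lines that lack these attributes before building the TE index, here is a minimal sketch of a checker. It is not part of TEtranscripts; the function name `missing_attributes` and the regular expression are assumptions made for illustration, and the name comparison in this sketch is case-sensitive.

```python
import re

# Matches GTF-style attributes of the form: name "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def missing_attributes(gtf_line, required=("family_id", "class_id")):
    """Return the required attribute names that are absent from one GTF line.

    Hypothetical helper: only the last tab-separated field (the attribute
    column) is inspected, and names are compared case-sensitively.
    """
    attribute_column = gtf_line.rstrip("\n").split("\t")[-1]
    present = {name for name, _value in ATTR_RE.findall(attribute_column)}
    return [name for name in required if name not in present]


def report_missing(path):
    """Print the line number and missing attributes for each incomplete line."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            missing = missing_attributes(line)
            if missing:
                print(f"line {number}: missing {', '.join(missing)}")
```

Calling `report_missing("10000.gtf.txt")` on a file like the one attached would list every line where either attribute is absent.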

Your current GTF

Gm01    EDTA    exon    10000085    10001527    8702    +   .   gene_id "TE_homo_23577"; transcript_id "TE_00002591_INT"; Classification "LTR/Gypsy"; Sequence_ontology "SO:0002265";
Gm01    EDTA    exon    10001463    10001833    515 -   .   gene_id "TE_homo_23578"; transcript_id "TE_00004490_INT"; Classification "LTR/unknown"; Sequence_ontology "SO:0000186";

Slight modification

Gm01    EDTA    exon    10000085    10001527    8702    +   .   gene_id "TE_homo_23577"; transcript_id "TE_00002591_INT"; family_id "Gypsy"; class_id "LTR"; Sequence_ontology "SO:0002265";
Gm01    EDTA    exon    10001463    10001833    515 -   .   gene_id "TE_homo_23578"; transcript_id "TE_00004490_INT"; family_id "unknown"; class_id "LTR"; Sequence_ontology "SO:0000186";

You can also make the class_id and family_id the same. TEtranscripts just needs a value for both.
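If the file is large, the substitution can be scripted rather than done by hand. The following is a rough sketch, assuming every TE line carries an EDTA-style `Classification "class/family";` attribute as in the lines above. The function names `add_te_ids` and `convert_gtf` are hypothetical and not part of TEtranscripts.

```python
import re

# Matches the EDTA-style attribute, e.g. Classification "LTR/Gypsy";
CLASSIFICATION_RE = re.compile(r'Classification "([^"]*)";')


def add_te_ids(line):
    """Replace Classification "class/family"; with family_id and class_id."""

    def _split(match):
        # "LTR/Gypsy" -> class_id "LTR", family_id "Gypsy"
        class_id, _sep, family_id = match.group(1).partition("/")
        # Assumption: with no "/", reuse the single value for both attributes.
        family_id = family_id or class_id
        return f'family_id "{family_id}"; class_id "{class_id}";'

    return CLASSIFICATION_RE.sub(_split, line)


def convert_gtf(in_path, out_path):
    """Rewrite every line of in_path into out_path; other text is unchanged."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(add_te_ids(line))
```

Lines without a `Classification` attribute pass through untouched, so it is worth checking the output for any that still lack `family_id` or `class_id`.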

Thanks.

DangPengsanqi commented 1 year ago

I modified the file as you suggested, and the program now runs. Thank you very much.