mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

Error when using custom GTF file #109

Closed MariaRig closed 2 years ago

MariaRig commented 2 years ago

Hello, I'm trying to run TEtranscripts with a custom GTF file. After several steps complete (processing alignments, optimizing, and calculating the total annotated/non-unique and unannotated reads), I get the following error, which I don't know how to fix:

TE inconsistency! 11505-11675:L1MC5a:L1:LINE 
Error: 1 
[Exception type: SystemExit, raised in TEindex.py:380] 

Here is what my GTF looks like. The first line shown is where it crashes. It is line 1785695 of the original GTF, and it becomes the first line when the GTF is sorted by position.

chr1    hg38_curated_OneCode    exon    11505   11675   .   -   .   gene_id "L1MC5a"; transcript_id "chr1:11505-11675"; family_id "L1"; class_id "LINE";
chr1    hg38_curated_OneCode    exon    76693   78235   .   +   .   gene_id "L1MC5a"; transcript_id "chr1:76693-78235"; family_id "L1"; class_id "LINE";
chr1    hg38_curated_OneCode    exon    86030   86619   .   +   .   gene_id "L1MC5a"; transcript_id "chr1:86030-86619"; family_id "L1"; class_id "LINE";

Thank you for your help.

Best,

Maria

olivertam commented 2 years ago

Hi Maria,

I think the issue is the colon (:) in the transcript_id field. The annotation output is delimited by colons, so a colon inside the transcript_id causes an error. I recommend replacing the colon with an underscore (_), which should not cause any additional issues.
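
For anyone hitting the same error, the replacement could be done with a small script along these lines. This is only a sketch, not part of TEtranscripts: the function names `sanitize_gtf_line` and `sanitize_gtf` are made up for illustration, and it assumes the transcript_id value is double-quoted as in the lines posted above. It changes colons only inside transcript_id and leaves every other field alone.

```python
import re

# Matches the transcript_id attribute and captures its quoted value.
# Assumption: the value is wrapped in double quotes, as in the posted GTF lines.
TRANSCRIPT_ID = re.compile(r'(transcript_id ")([^"]*)(")')


def sanitize_gtf_line(line: str) -> str:
    """Return the line with colons in the transcript_id value replaced by underscores."""
    return TRANSCRIPT_ID.sub(
        lambda m: m.group(1) + m.group(2).replace(":", "_") + m.group(3),
        line,
    )


def sanitize_gtf(in_path: str, out_path: str) -> None:
    """Write a copy of the GTF at in_path to out_path with sanitized transcript_id values."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Comment/header lines (starting with '#') are copied unchanged.
            dst.write(line if line.startswith("#") else sanitize_gtf_line(line))
```

Calling `sanitize_gtf("custom_TE.gtf", "custom_TE.fixed.gtf")` (both file names are placeholders) would write a new file that can then be passed to TEtranscripts in place of the original. Writing to a separate file rather than editing in place keeps the original annotation available for comparison.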

Thanks.

MariaRig commented 2 years ago

Thank you very much, replacing the colons fixed the issue. Thank you for your time! Best wishes,

Maria