Closed pillailab closed 3 years ago
Hi,
Thank you for your interest in the software. Based on the error, it appears that there might be an issue with the TE GTF file, where it cannot obtain the length of one (or more) of the elements. Would you be willing to share the custom TE GTF? We can see if we could find the potential error.
Thanks.
Thanks Oliver, how would you like to share it ( the gzip file is 105 Mb) ? Here are the first few lines: chr1 DFAM_hg38 exon 10955 10464 . - . gene_id "TAR1"; transcript_id "TAR1"; family_id "TAR1"; class_id "TAR1"; chr1 DFAM_hg38 exon 11464 10826 . - . gene_id "TAR1"; transcript_id "TAR1_dup1"; family_id "TAR1"; class_id "TAR1"; chr1 DFAM_hg38 exon 11677 11502 . - . gene_id "L1MC4a_3end"; transcript_id "L1MC4a_3end"; family_id "L1MC4a_3end"; class_id "L1MC4a_3end"; chr1 DFAM_hg38 exon 11781 11677 . - . gene_id "MER5B"; transcript_id "MER5B"; family_id "MER5B"; class_id "MER5B"; chr1 DFAM_hg38 exon 15354 15265 . - . gene_id "MIR1_Amn"; transcript_id "MIR1_Amn"; family_id "MIR1_Amn"; class_id "MIR1_Amn"; chr1 DFAM_hg38 exon 16460 16362 . - . gene_id "Charlie15a"; transcript_id "Charlie15a"; family_id "Charlie15a"; class_id "Charlie15a"; chr1 DFAM_hg38 exon 18419 18649 . + . gene_id "L1M2b_5end"; transcript_id "L1M2b_5end"; family_id "L1M2b_5end"; class_id "L1M2b_5end"; chr1 DFAM_hg38 exon 18909 19049 . + . gene_id "L2b_3end"; transcript_id "L2b_3end"; family_id "L2b_3end"; class_id "L2b_3end"; chr1 DFAM_hg38 exon 18958 19048 . + . gene_id "L2a_3end"; transcript_id "L2a_3end"; family_id "L2a_3end"; class_id "L2a_3end"; chr1 DFAM_hg38 exon 19973 20727 . + . gene_id "L3"; transcript_id "L3"; family_id "L3"; class_id "L3"; chr1 DFAM_hg38 exon 21951 22344 . + . gene_id "MLT1K"; transcript_id "MLT1K"; family_id "MLT1K"; class_id "MLT1K"; chr1 DFAM_hg38 exon 23362 23121 . - . gene_id "MIR"; transcript_id "MIR"; family_id "MIR"; class_id "MIR"; chr1 DFAM_hg38 exon 23838 24010 . + . gene_id "L2b_3end"; transcript_id "L2b_3end_dup1"; family_id "L2b_3end"; class_id "L2b_3end"; chr1 DFAM_hg38 exon 24074 24249 . + . gene_id "MIR"; transcript_id "MIR_dup1"; family_id "MIR"; class_id "MIR"; chr1 DFAM_hg38 exon 24262 24447 . + . gene_id "L2a_3end"; transcript_id "L2a_3end_dup1"; family_id "L2a_3end"; class_id "L2a_3end"; chr1 DFAM_hg38 exon 25285 25182 . - . gene_id "MIRb"; transcript_id "MIRb"; family_id "MIRb"; class_id "MIRb"; chr1 DFAM_hg38 exon 25893 25959 . + . gene_id "MIR"; transcript_id "MIR_dup2"; family_id "MIR"; class_id "MIR"; chr1 DFAM_hg38 exon 26358 26428 . + . gene_id "MIR1_Amn"; transcript_id "MIR1_Amn_dup1"; family_id "MIR1_Amn"; class_id "MIR1_Amn"; chr1 DFAM_hg38 exon 26775 26574 . - . gene_id "L2c_3end"; transcript_id "L2c_3end"; family_id "L2c_3end"; class_id "L2c_3end"; chr1 DFAM_hg38 exon 26792 27062 . + . gene_id "AluSp"; transcript_id "AluSp"; family_id "AluSp"; class_id "AluSp"; chr1 DFAM_hg38 exon 27221 27058 . - . gene_id "L2c_3end"; transcript_id "L2c_3end_dup1"; family_id "L2c_3end"; class_id "L2c_3end"; chr1 DFAM_hg38 exon 27283 27541 . + . gene_id "MER33"; transcript_id "MER33"; family_id "MER33"; class_id "MER33"; chr1 DFAM_hg38 exon 27987 27833 . - . gene_id "MIRc"; transcript_id "MIRc"; family_id "MIRc"; class_id "MIRc"; chr1 DFAM_hg38 exon 28301 28151 . - . gene_id "MIRb"; transcript_id "MIRb_dup1"; family_id "MIRb"; class_id "MIRb"; chr1 DFAM_hg38 exon 29902 30196 . + . gene_id "L1MB3_3end"; transcript_id "L1MB3_3end"; family_id "L1MB3_3end"; class_id "L1MB3_3end"; chr1 DFAM_hg38 exon 30533 30343 . - . gene_id "MER53"; transcript_id "MER53"; family_id "MER53"; class_id "MER53";
Hi,
Thanks for sharing the excerpt. The first thing that I noticed is that for all the annotations on the -
strand, the start position is larger than the stop position. I'm sure that this was the output from Dfam, but given that there is an expectation that the stop position is larger than the start position (at least based on my understanding of GTF), the length calculation would now give negative values. This is what is causing the error.
I would recommend trying to fix that, and see if that resolves the error.
Thanks.
Thank you, that indeed was the error, i had not checked the stop/ start from the gtf since it was generated by the perl script. Flipping the coordinates for the negative strands fixed it. Manoj
Hello, I have previously run TEtranscripts succesfully on our cluster using TE GTF from your website. I am now trying with a new gtf generated from dfam (generared by the makeTEgtf.pl script. However, it is throwing the following error:
Thank your for your help.