Closed jsnoliva closed 3 weeks ago
Hi,
Thank you for your interest in the software.
The reason it's failing is that TEtranscripts expects the following two fields in the INFO portion: family_id and gene_id. They don't have to contain meaningful information (e.g. they could all be "Unknown"), but you could also convert your classification information to the family_id and class_id if you like.
Thanks.
Thanks for your response, I changed classification to family_id but am still getting the same error.
nohup: ignoring input
INFO @ Tue, 17 Sep 2024 11:09:09:
# ARGUMENTS LIST:
# name = TEtranscripts_out
# treatment files = ['SRR17151206_sorted.bam', 'SRR17151214_sorted.bam', 'SRR17151215_sorted.bam']
# control files = ['SRR17151213_sorted.bam', 'SRR17151216_sorted.bam', 'SRR17151217_sorted.bam']
# GTF file = all.gtf
# TE file = fixed_OS_Rice_MSU7.fasta.mod.EDTA.TEanno.gtf
# multi-mapper mode = multi
# stranded = no
# differential analysis using DESeq2
# normalization = DESeq2_default
# FDR cutoff = 5.00e-02
# fold-change cutoff = 1.00
# read count cutoff = 1
# number of iteration = 100
# Alignments grouped by read ID = False
INFO @ Tue, 17 Sep 2024 11:09:09: Processing GTF files ...
INFO @ Tue, 17 Sep 2024 11:09:09: Building gene index .......
100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
INFO @ Tue, 17 Sep 2024 11:10:05: Done building gene index ......
INFO @ Tue, 17 Sep 2024 11:10:06: Building TE index .......
Chr1 EDTA helitron 1133 1242 406 + . gene_id "TE_homo_0"; ID "TE_homo_0"; Name "Os1968"; family_id "DNAnona/Helitron"; identity "0.824"; method "homology"; sequence_ontology "SO:0000544";
TE GTF format error! There is no annotation at line 1.
Error in building TE index
This is what the beginning of my .gtf file looks like:
##gtf-version X
# GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.
##date Fri Aug 23 10:51:26 AM EDT 2024
##This file contains repeats annotated by EDTA v2.2.2 with both structural and homology methods. Repeats can be overlapping due to nested insertions.
##This file follows the ENSEMBL standard: https://useast.ensembl.org/info/website/upload/gff3.html
##Column 3: Sequence Ontology of repeat features. Please refer to the SO database for more details: http://www.sequenceontology.org/. In cases where the SO database does not have the repeat feature, tentative SO names are used, with a full list included in EDTA/bin/TE_Sequence_Ontology.txt (Enhancement notes), and the sequence_ontology in Column 9 uses the closest parent SO.
##Column 7: The Smith-Waterman score generated by RepeatMasker, only available for homology entries.
##Column 9:
## ID: unique ID for this feature in the genome.
## classification: Same as Column 3 but formatted following the RepeatMasker naming convention.
## sequence_ontology: Sequence Ontology ID of the feature.
## identity: Sequence identity (0-1) between the library sequence and the target region.
## ltr_identity: Sequence identity (0-1) between the left and right LTR regions for structurally annotated LTR elements.
## Name: Repeat family name. Some may be shown as coordinates, which are single-copy and structrually identified elements that are not included in the repeat library.
## method: Indicate if this entry is produced by structural annotation or homology annotation.
## motif/TSD/TIR: structural features of structurally annotated LTR and TIR elements.
##For more details about this file, please refer to the EDTA wiki: https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
##seqid source sequence_ontology start end score strand phase attributes
ChrSy AGAT gene 4 1181 . + . gene_id "agat-gene-677"; ID "agat-gene-677"; Name "Os0376_LTR"; family_id "LTR/Gypsy"; identity "0.776"; method "homology"; sequence_ontology "SO:0002265";
ChrSy EDTA Gypsy_LTR_retrotransposon 4 1181 3021 + . gene_id "agat-gene-677"; transcript_id "TE_homo_320354"; ID "TE_homo_320354"; Name "Os0376_LTR"; Parent "agat-gene-677"; family_id "LTR/Gypsy"; identity "0.776"; method "homology"; sequence_ontology "SO:0002265";
ChrSy AGAT gene 1254 1598 . + . gene_id "agat-gene-678"; ID "agat-gene-678"; Name "Os1598_LTR"; family_id "LTR/Gypsy"; identity "0.871"; method "homology"; sequence_ontology "SO:0002265";
ChrSy EDTA Gypsy_LTR_retrotransposon 1254 1598 2014 + . gene_id "agat-gene-678"; transcript_id "TE_homo_320355"; ID "TE_homo_320355"; Name "Os1598_LTR"; Parent "agat-gene-678"; family_id "LTR/Gypsy"; identity "0.871"; method "homology"; sequence_ontology "SO:0002265";
Hi,
You still need the class_id field in the last column. You can either just use a placeholder (e.g. class_id "TE"), or try to split the classification into two entries (e.g. classification "LTR/Gypsy" to family_id "Gypsy"; class_id "LTR").
Thanks.
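The split described above could be scripted roughly as follows. This is a minimal sketch, not part of TEtranscripts: the function name `split_classification` is hypothetical, and it assumes the attribute column uses the `key "value";` layout shown in this thread, with the classification written as `class/family`. The fallback for values without a `/` (reusing the whole value for both fields) is also an assumption.

```python
import re

# Matches e.g.  classification "LTR/Gypsy";  (trailing semicolon optional).
CLASSIFICATION_RE = re.compile(r'classification "([^"]*)";?')


def split_classification(attributes: str) -> str:
    """Replace classification "A/B" with family_id "B"; class_id "A";."""

    def _replace(match: re.Match) -> str:
        value = match.group(1)
        # Assumption: text before the first "/" is the class, the rest the family.
        class_id, sep, family_id = value.partition("/")
        if not sep:
            # No "/" present: reuse the whole value for both fields.
            family_id = class_id
        return f'family_id "{family_id}"; class_id "{class_id}";'

    return CLASSIFICATION_RE.sub(_replace, attributes)
```

It would be applied to the ninth (attribute) column of each TE row, leaving the other attributes untouched.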
I tried both methods and now it seems to be a different line issue
100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
INFO @ Tue, 17 Sep 2024 13:08:22: Done building gene index ......
INFO @ Tue, 17 Sep 2024 13:08:24: Building TE index .......
Chr1 EDTA helitron 1133 1242 406 + . gene_id "TE_homo_0"; ID "TE_homo_0"; Name "Os1968"; class_id "DNAnona"; family_id "Helitron"; identity "0.824"; method "homology"; sequence_ontology "SO:0000544";
TE GTF format error! There is no annotation at line 1.
Error in building TE index
100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
INFO @ Tue, 17 Sep 2024 13:12:34: Done building gene index ......
INFO @ Tue, 17 Sep 2024 13:12:35: Building TE index .......
Chr1 EDTA helitron 1133 1242 406 + . gene_id "TE_homo_0"; class_id "TE"; ID "TE_homo_0"; Name "Os1968"; family_id "DNAnona/Helitron"; identity "0.824"; method "homology"; sequence_ontology "SO:0000544";
TE GTF format error! There is no annotation at line 1.
Error in building TE index
Hi,
Sorry, I just realized another issue. The third column should be exon, since that's the entry that is recognized by TEtranscripts to be used for annotation.
Apologies.
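Rewriting the third column could look something like the sketch below. The helper name `set_feature_to_exon` is hypothetical, and the code assumes the file on disk is tab-separated with nine columns, as GTF normally is (the rows pasted in this thread show spaces). Comment and blank lines are passed through unchanged.

```python
def set_feature_to_exon(line: str) -> str:
    """Return a GTF line with column 3 set to "exon"."""
    # Header/comment lines and blank lines carry no feature column.
    if line.startswith("#") or not line.strip():
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, found {len(fields)}")
    fields[2] = "exon"
    # Preserve the original line ending, if there was one.
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

In a file like the AGAT-standardised one shown earlier, where each element has a gene row plus a child row, renaming every row this way would give two exon entries per element, so filtering to one row per element first may be worth considering.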
That seems to have solved the TE GTF format error, but the "Error in building TE index" remains.
INFO @ Tue, 24 Sep 2024 11:50:40: Processing GTF files ...
INFO @ Tue, 24 Sep 2024 11:50:40: Building gene index .......
100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
INFO @ Tue, 24 Sep 2024 11:51:37: Done building gene index ......
INFO @ Tue, 24 Sep 2024 11:51:37: Building TE index .......
Error in building TE index
First lines in my file
Chr1 EDTA exon 1133 1242 406 + . gene_id "TE_homo_0"; family_id "Os1968"; class_id "DNAnona/Helitron";
Chr1 EDTA exon 1282 1422 352 + . gene_id "TE_homo_1"; family_id "TE_00001024"; class_id "DNA/Helitron";
Chr1 EDTA exon 1444 1780 919 + . gene_id "TE_homo_2"; family_id "TE_00006580"; class_id "DNA/Helitron";
Chr1 EDTA exon 1855 2027 843 - . gene_id "TE_homo_3"; family_id "TE_00006580"; class_id "DNA/Helitron";
Chr1 EDTA exon 1986 2199 1121 + . gene_id "TE_homo_4"; family_id "TE_00001024"; class_id "DNA/Helitron";
Chr1 EDTA exon 2297 2472 1332 - . gene_id "TE_homo_5"; family_id "Os0073"; class_id "DNAnona/unknown";
Chr1 EDTA exon 2536 2924 . . . gene_id "TE_struc_279"; family_id "Os1667"; class_id "MITE/DTH;
Chr1 EDTA exon 4579 4700 742 - . gene_id "TE_homo_6"; family_id "TE_00005294"; class_id "DNA/Helitron";
Chr1 EDTA exon 4794 5030 1083 - . gene_id "TE_homo_7"; family_id "TE_00005294"; class_id "DNA/Helitron";
Chr1 EDTA exon 5684 5886 . . . gene_id "TE_struc_280"; family_id "Os2924"; class_id "MITE/DTT;
Chr1 EDTA exon 8877 9129 693 - . gene_id "TE_homo_8"; family_id "TE_00000098"; class_id "DNA/DTT";
Chr1 EDTA exon 9034 9162 461 + . gene_id "TE_homo_9"; family_id "TE_00006547"; class_id "DNA/DTA";
Chr1 EDTA exon 11019 11107 461 - . gene_id "TE_homo_10"; family_id "Os2076"; class_id "DNAnona/MULE";
Hi,
Here are some common issues:
gene_id, transcript_id (that can be any unique value), family_id and class_id
If you are still having issues, feel free to share the GTF and we can troubleshoot it further.
Thanks.
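To check a file against the points raised in this thread before rerunning, a small line checker along these lines might help. It is only a sketch based on the requirements stated here (column 3 is exon; gene_id, transcript_id, family_id and class_id are present), plus a quote-balance check; it does not reproduce TEtranscripts' own parser. The names `REQUIRED` and `check_te_gtf_line` are hypothetical, and tab-separated input is assumed.

```python
import re

# Attributes named in this thread as required in the last column.
REQUIRED = ("gene_id", "transcript_id", "family_id", "class_id")
# Matches complete  key "value"  pairs only; an unterminated value is skipped.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def check_te_gtf_line(line: str) -> list:
    """Return a list of problems found on one GTF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated columns, found {len(fields)}"]
    problems = []
    if fields[2] != "exon":
        problems.append(f'column 3 is "{fields[2]}", not "exon"')
    attributes = fields[8]
    # An odd number of double quotes means some value is not closed.
    if attributes.count('"') % 2:
        problems.append("unbalanced double quote in attribute column")
    keys = {key for key, _ in ATTR_RE.findall(attributes)}
    for name in REQUIRED:
        if name not in keys:
            problems.append(f"missing attribute {name}")
    return problems
```

Run on the rows posted above (assuming they are tab-separated on disk), this would report a missing transcript_id on every row and an unbalanced quote on the TE_struc_279 and TE_struc_280 rows, whose class_id values lack a closing quote. Whether either of these is what triggers "Error in building TE index" is not confirmed in this thread.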
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
INFO @ Wed, 11 Sep 2024 16:26:23: Processing GTF files ...
INFO @ Wed, 11 Sep 2024 16:26:23: Building gene index .......
100000 GTF lines processed.
200000 GTF lines processed.
300000 GTF lines processed.
INFO @ Wed, 11 Sep 2024 16:27:32: Done building gene index ......
INFO @ Wed, 11 Sep 2024 16:27:33: Building TE index .......
Chr1 EDTA helitron 1133 1242 406 + . gene_id "TE_homo_0"; transcript_id "TE_homo_0"; Name "Os1968"; classification "DNAnona/Helitron"; identity "0.824"; method "homology"; sequence_ontology "SO:0000544";
TE GTF format error! There is no annotation at line 1.
Error in building TE index