Hi,
Thank you for your interest in the software.
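Unfortunately, our Perl script is not designed for raw GFF3, because the attributes ("INFO") field in column 9 is too variable to parse easily. You need to convert the GFF3 into a format where the TE name, class and family are in separate columns (i.e. preprocess column 9). In your command, -t, -f and -C all point to column 9, which is why the whole attribute string was copied into gene_id, family_id and class_id.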
For example (GNU sed assumed, where \t in a replacement means a tab; cut splits on tabs by default):
$ sed 's/;/\t/g' Bombus_terrestris.fa.mod.EDTA.TEanno.gff3 | \
cut -f 1-8,10,11 | \
sed 's/Name=//;s/Classification=//;s/\//\t/' | \
awk 'BEGIN{FS="\t";OFS="\t"}; $1~/^#/; $1!~/^#/ && NF==11; $1!~/^#/ && NF<11{$11=$10;print}' \
> preprocessed.txt
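The cut -f 1-8,10,11 step relies on Name and Classification being the second and third attributes in column 9 of every record. If that ordering might not hold for every line, a minimal alternative sketch looks each attribute up by its key and writes the same 11-column layout (columns 1-8, then TE name, class, family). It assumes a POSIX-compatible awk; preprocess_edta_gff3 is a hypothetical helper name, not part of the software.

```shell
# Hypothetical helper (not part of makeTEgtf.pl): reads EDTA GFF3 and writes
# columns 1-8 followed by TE name, class and family, looking up the Name= and
# Classification= attributes by key instead of by position.
# Missing attributes produce empty output fields.
preprocess_edta_gff3() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }               # pass comment/header lines through
    NF >= 9 {
      name = ""; cls = ""
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        if (attrs[i] ~ /^Name=/) name = substr(attrs[i], 6)
        else if (attrs[i] ~ /^Classification=/) cls = substr(attrs[i], 16)
      }
      # "DNA/Helitron" -> class "DNA", family "Helitron";
      # with no "/", reuse the class as the family (as the awk step above does)
      slash = index(cls, "/")
      if (slash > 0) {
        te_class = substr(cls, 1, slash - 1)
        te_family = substr(cls, slash + 1)
      } else {
        te_class = cls
        te_family = cls
      }
      print $1, $2, $3, $4, $5, $6, $7, $8, name, te_class, te_family
    }' "$@"
}
```

Usage would be preprocess_edta_gff3 Bombus_terrestris.fa.mod.EDTA.TEanno.gff3 > preprocessed.txt, after which the column numbers for -t 9 -C 10 -f 11 stay the same.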
$ perl makeTEgtf.pl -c 1 -s 4 -e 5 -o 7 -n EDTA -t 9 -f 11 -C 10 -S 6 -1 preprocessed.txt \
> Bombus_terrestris.fa.mod.EDTA.TEanno.gtf
You will get multiple warnings that lines were skipped. This is because those entries have no strand information ("." instead of either "+" or "-"), and they would confuse the software when it handles stranded RNA-seq libraries.
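To get a rough idea beforehand of how many entries are affected, one could count the non-comment records whose strand column (column 7, which the commands above pass as -o 7) is ".". This is a minimal sketch; count_unstranded is a hypothetical helper name, and it assumes the tab-separated layout of preprocessed.txt, where the strand stays in column 7. The count should roughly match the number of skipped-line warnings, but that correspondence is an assumption, not something the script guarantees.

```shell
# Hypothetical helper: count non-comment lines whose strand (column 7) is ".".
count_unstranded() {
  awk -F '\t' '!/^#/ && $7 == "." { n++ } END { print n + 0 }' "$@"
}
```

For example: count_unstranded preprocessed.txt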
I have attached the preprocessed text file and the GTF here.
Thanks
Thank you very much for your help! It works successfully now, thanks again!
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days
Hello, I feel honored to be using such a remarkable tool, but I have run into the following problem, which is causing me a lot of frustration. I used EDTA to get a TE annotation file in GFF3 format (Bombus_terrestris.fa.mod.EDTA.TEanno.gff3), and I would like to use the makeTEgtf.pl script you provide to convert it into the GTF format that TEtranscripts requires as input for downstream TE differential expression analysis. However, the conversion went wrong: I got a GTF file with duplicated content, such as this:
B11 2 exon 16357640 16357868 1822 + . gene_id "ID=TE_homo_32101;Name=TE_00000590;Classification=DNA/Helitron;Sequence_ontology=SO:0000544;Identity=0.934;Method=homology"; transcript_id "ID=TE_homo_32101;Name=TE_00000590;Classification=DNA/Helitron;Sequence_ontology=SO:0000544;Identity=0.934;Method=homology"; family_id "ID=TE_homo_32101;Name=TE_00000590;Classification=DNA/Helitron;Sequence_ontology=SO:0000544;Identity=0.934;Method=homology"; class_id "ID=TE_homo_32101;Name=TE_00000590;Classification=DNA/Helitron;Sequence_ontology=SO:0000544;Identity=0.934;Method=homology"; gene_name "ID=TE_homo_32101;Name=TE_00000590;Classification=DNA/Helitron;Sequence_ontology=SO:0000544;Identity=0.934;Method=homology:TE";
Obviously this is not right, but I have no idea how to fix it. Here are the command and parameters I used (my GFF3 file is attached):
$ nohup perl makeTEgtf.pl -c 1 -s 4 -e 5 -o 7 -n 2 -t 9 -f 9 -C 9 -S 6 -1 Bombus_terrestris.fa.mod.EDTA.TEanno.gff3 > log_makeGTF &
[1] 21091
I look forward to your reply and am very grateful for your help!
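Since -t, -f and -C each take a column number, it may help to print the columns of the first data line with their indices before choosing them. In a raw GFF3 file, column 9 holds the whole attribute string, which matches what ended up in every attribute of the GTF line above. A minimal sketch; show_columns is a hypothetical helper name:

```shell
# Hypothetical helper: print each tab-separated column of the first
# non-empty, non-comment line, prefixed by its column number.
show_columns() {
  awk -F '\t' '!/^#/ && NF { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$@"
}
```

Running it on both the raw GFF3 and the preprocessed file (for example show_columns preprocessed.txt) makes it easy to check which numbers to pass to -t, -C and -f.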