nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

RE: annotate is not adding GO terms to the gbk file #997

Open · Malabady opened 5 months ago

Malabady commented 5 months ago

Hello, I am running the following command:

```
funannotate annotate --gff Slycopersicum_225_iTAGv2.3.gene_exons.gff3 --fasta Slycopersicum_225_iTAGv2.40.fa -s 'Solanum lycopersicum' -o Sly --cpu 12 --iprscan iprscan.xml --database $FUNANNOTATE_DB --busco_db embryophyta
```

Basically, I am trying to generate a .gbk file that I can use in `funannotate compare` together with my own genome. The problem is that the resulting .gbk file has no GO terms, although iprscan.xml contains them. I see the following messages in the annotate stdout:

```
[Jan 19 09:30 AM]: Parsing InterProScan5 XML file
[Jan 19 09:38 AM]: Found 292,300 duplicated annotations, adding 0 valid annotations
```

The full stdout of the run is as follows:

```
-------------------------------------------------------
[Jan 19 09:26 AM]: OS: Rocky Linux 8.8, 128 cores, ~ 528 GB RAM. Python: 3.8.15
[Jan 19 09:26 AM]: Running 1.8.15
[Jan 19 09:26 AM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[Jan 19 09:26 AM]: Found existing output directory Sly. Warning, will re-use any intermediate files found.
[Jan 19 09:26 AM]: Parsing annotation and preparing annotation files.
[Jan 19 09:27 AM]: Found 34,727 gene models from GFF3 annotation
[Jan 19 09:29 AM]: Adding Functional Annotation to Solanum lycopersicum, NCBI accession: None
[Jan 19 09:29 AM]: Annotation consists of: 34,727 gene models
[Jan 19 09:29 AM]: 34,727 protein records loaded
[Jan 19 09:29 AM]: Existing Pfam-A results found: Sly/annotate_misc/annotations.pfam.txt
[Jan 19 09:29 AM]: 38,921 annotations added
[Jan 19 09:29 AM]: Running Diamond blastp search of UniProt DB version 2023_05
[Jan 19 09:29 AM]: 7,101 valid gene/product annotations from 10,571 total
[Jan 19 09:29 AM]: Existing Eggnog-mapper results found: Sly/annotate_misc/eggnog.emapper.annotations
[Jan 19 09:29 AM]: Parsing EggNog Annotations
[Jan 19 09:29 AM]: EggNog version parsed as 2.1.12
[Jan 19 09:29 AM]: 65,960  COG and EggNog annotations added
[Jan 19 09:29 AM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.92
[Jan 19 09:29 AM]: 8,027 gene name and product description annotations added
[Jan 19 09:29 AM]: Existing MEROPS results found: Sly/annotate_misc/annotations.merops.txt
[Jan 19 09:29 AM]: 1,076 annotations added
[Jan 19 09:29 AM]: Existing CAZYme results found: Sly/annotate_misc/annotations.dbCAN.txt
[Jan 19 09:29 AM]: 1,293 annotations added
[Jan 19 09:29 AM]: Existing BUSCO2 results found: Sly/annotate_misc/annotations.busco.txt
[Jan 19 09:29 AM]: 1,423 annotations added
[Jan 19 09:29 AM]: Skipping phobius predictions, try funannotate remote -m phobius
[Jan 19 09:29 AM]: Existing SignalP results found: Sly/annotate_misc/signalp.results.txt
[Jan 19 09:29 AM]: 2,997 secretome and 0 transmembane annotations added
[Jan 19 09:30 AM]: Parsing InterProScan5 XML file
[Jan 19 09:38 AM]: Found 292,300 duplicated annotations, adding 0 valid annotations
[Jan 19 09:38 AM]: Converting to final Genbank format, good luck!
[Jan 19 09:41 AM]: Creating AGP file and corresponding contigs file
[Jan 19 09:41 AM]: Writing genome annotation table.
[Jan 19 09:48 AM]: Funannotate annotate has completed successfully!

        We need YOUR help to improve gene names/product descriptions:
           0 gene/products names MUST be fixed, see Sly/annotate_results/Gene2Products.must-fix.txt
           18 gene/product names need to be curated, see Sly/annotate_results/Gene2Products.need-curating.txt
           2,801 gene/product names passed but are not in Database, see Sly/annotate_results/Gene2Products.new-names-passed.txt

        Please consider contributing a PR at https://github.com/nextgenusfs/gene2product
```

Any suggestions on what is causing this issue and how to fix it?
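One way to confirm that iprscan.xml really carries GO terms, and to see which protein IDs they are attached to, is to stream through the XML. This is a minimal sketch, assuming InterProScan 5 XML output where each `<protein>` element has one or more direct `<xref id="...">` children (the IDs taken from the FASTA headers) and GO terms appear as `<go-xref>` elements somewhere inside it; namespaces are stripped so the schema URL does not matter. The function name `go_terms_by_protein` is hypothetical, not part of funannotate.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def local_name(tag: str) -> str:
    """Drop the XML namespace: '{http://...}xref' -> 'xref'."""
    return tag.rsplit("}", 1)[-1]


def go_terms_by_protein(xml_source) -> Dict[str, Set[str]]:
    """Map each protein ID (<xref id=...>) to the GO IDs found under its <protein>.

    Assumes InterProScan 5 XML; xml_source may be a path or a binary file object.
    """
    terms: Dict[str, Set[str]] = defaultdict(set)
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "protein":
            continue
        # Direct <xref> children carry the IDs taken from the FASTA headers.
        ids = [c.get("id") for c in elem if local_name(c.tag) == "xref" and c.get("id")]
        # <go-xref> elements can sit at any depth inside the protein's matches.
        gos = {g.get("id") for g in elem.iter() if local_name(g.tag) == "go-xref" and g.get("id")}
        for pid in ids:
            terms[pid] |= gos
        elem.clear()  # free the finished subtree; the XML can be large
    return dict(terms)


if __name__ == "__main__" and len(sys.argv) > 1:
    terms = go_terms_by_protein(sys.argv[1])
    with_go = sum(1 for gos in terms.values() if gos)
    print(f"{len(terms):,} protein IDs, {with_go:,} with at least one GO term")
    for pid in sorted(terms)[:5]:
        print(pid, sorted(terms[pid]))
```

If the printed IDs do not look like the gene or mRNA IDs in the GFF3 passed with `--gff`, an ID mismatch is one plausible reason why nothing gets added, though that is a guess rather than a diagnosis.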

Many thanks

nextgenusfs commented 5 months ago

There are several files in the annotate_misc folder that contain three tab-delimited columns holding the data that is added to the annotation. You should see a file named something like annotations.iprscan.txt. Have a look at that file and check whether the first column (locus_tags) matches the locus tags of your annotation. Which protein FASTA file did you run InterProScan with?
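The locus_tag check can be scripted. This is a minimal sketch, assuming annotations.iprscan.txt holds the locus_tag/ID in its first tab-delimited column, as described in the reply, and that the annotation's IDs are the `ID=` attributes of the `gene` and `mRNA` features in the GFF3 passed with `--gff`. The function names and the script name `check_ids.py` are hypothetical. It reports how many first-column IDs match a gene ID and how many match an mRNA ID.

```python
import sys
from typing import Dict, Iterable, Set


def first_column_ids(lines: Iterable[str]) -> Set[str]:
    """IDs from column 1 of a tab-delimited annotation file (assumed locus_tags)."""
    ids = set()
    for line in lines:
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t", 1)[0].strip())
    return ids


def gff3_ids_by_type(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each GFF3 feature type (gene, mRNA, ...) to the ID= values in column 9."""
    ids: Dict[str, Set[str]] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:  # skips ##FASTA sequence lines and malformed rows
            continue
        for attr in cols[8].split(";"):
            key, _, value = attr.partition("=")
            if key.strip() == "ID" and value.strip():
                ids.setdefault(cols[2], set()).add(value.strip())
    return ids


def match_counts(tsv_ids: Set[str], gff_ids: Dict[str, Set[str]]) -> Dict[str, int]:
    """How many annotation-file IDs are also gene or mRNA IDs in the GFF3."""
    return {ftype: len(tsv_ids & gff_ids.get(ftype, set())) for ftype in ("gene", "mRNA")}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_ids.py Sly/annotate_misc/annotations.iprscan.txt genes.gff3
    with open(sys.argv[1]) as tsv, open(sys.argv[2]) as gff:
        tsv_ids = first_column_ids(tsv)
        gff_ids = gff3_ids_by_type(gff)
    for ftype, shared in match_counts(tsv_ids, gff_ids).items():
        print(f"{ftype}: {shared:,} of {len(tsv_ids):,} IDs match")
    print("examples from annotation file:", sorted(tsv_ids)[:3])
```

Zero matches at both levels would suggest that the protein FASTA given to InterProScan used different identifiers from the GFF3; comparing the printed examples against a few GFF3 `ID=` values usually makes the difference obvious.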