xvazquezc closed this issue 1 year ago.
I've been looking further into this, and basically most annotation sources are ignored and not passed to annotate_misc/all.annotations.txt. From what I can see, this affects the GO terms, EggNOG, COG, SMCOG, InterPro, SECRETED, name and product annotations (their rows don't appear in annotate_misc/all.annotations.txt). The individual annotate_misc/annotations.*.txt files are generated without problems.
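A quick way to see which rows are being dropped is to compare the IDs in one of the individual annotation files against the IDs in all.annotations.txt. This is a minimal sketch, assuming both files are tab-separated with the gene/transcript ID in the first column; the function names and the toy IDs are hypothetical.

```python
import io
from typing import Iterable, Set


def first_column_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the IDs in column 1 of a tab-separated annotation file."""
    ids: Set[str] = set()
    for line in lines:
        text = line.rstrip("\r\n")
        # Skip blank lines and comment/header lines starting with '#'
        if text and not text.startswith("#"):
            ids.add(text.split("\t", 1)[0])
    return ids


def missing_ids(individual: Iterable[str], combined: Iterable[str]) -> Set[str]:
    """IDs present in an individual annotations file but absent from the combined file."""
    return first_column_ids(individual) - first_column_ids(combined)


if __name__ == "__main__":
    # Hypothetical toy contents; in practice pass open file handles instead
    single = io.StringIO("XP_000001.1\tnote\tvalue\nLOCUS_0001-T1\tnote\tvalue\n")
    combined = io.StringIO("LOCUS_0001-T1\tnote\tvalue\n")
    print(sorted(missing_ids(single, combined)))  # ['XP_000001.1']
```

If nearly every ID from an individual file shows up as missing, the IDs in that file probably don't match the genome's gene/protein IDs at all, rather than individual rows being lost.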
Hi there @xvazquezc. Did you end up figuring out a workaround for this? I'm getting the same bug when I try to use Funannotate on GenBank genomes that have gene predictions but lack functional annotations. I don't really want to rerun the prediction steps, as that may reduce accuracy. It's a little frustrating that all the information is just sitting there but not being parsed.
@kellystyles kinda my situation too. Unfortunately I didn't follow through...
funannotate compare is only going to output data properly if all genomes you are comparing have had functional annotation added with funannotate annotate. So if you have a public genome that's fine, add functional annotation to it with funannotate annotate and use that resulting annotated GBK file for compare.
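The workflow described here (run funannotate annotate on each genome, then pass the resulting GenBank files to funannotate compare) can be scripted. This is a minimal sketch, assuming funannotate is on PATH and that `-i` accepts several inputs; the flags are the ones used in the commands in this thread (`--genbank`, `-o`, `--cpus` for annotate; `-i`, `-o` for compare). The helper names, and the assumption that annotate writes a single .gbk under annotate_results/, are illustrative.

```python
import glob
import os
import subprocess
from typing import Dict, List


def annotate_cmd(gbk: str, outdir: str, cpus: int = 4) -> List[str]:
    """Build a funannotate annotate command for a GenBank input."""
    return ["funannotate", "annotate", "--genbank", gbk, "-o", outdir, "--cpus", str(cpus)]


def compare_cmd(annotated_gbks: List[str], outdir: str) -> List[str]:
    """Build a funannotate compare command over already-annotated GenBank files."""
    return ["funannotate", "compare", "-i", *annotated_gbks, "-o", outdir]


def annotated_gbk(outdir: str) -> str:
    """Locate the .gbk written under <outdir>/annotate_results/ (assumes exactly one)."""
    hits = sorted(glob.glob(os.path.join(outdir, "annotate_results", "*.gbk")))
    if not hits:
        raise FileNotFoundError(f"no .gbk found in {outdir}/annotate_results")
    return hits[0]


def run_all(genomes: Dict[str, str], compare_out: str, cpus: int = 4) -> None:
    """Annotate every genome first, then compare the annotated outputs.

    `genomes` maps an input GenBank path to its annotate output directory.
    """
    for gbk, outdir in genomes.items():
        subprocess.run(annotate_cmd(gbk, outdir, cpus), check=True)
    gbks = [annotated_gbk(outdir) for outdir in genomes.values()]
    subprocess.run(compare_cmd(gbks, compare_out), check=True)
```

For example, `run_all({"GCF_000149615.1_ASM14961v1_genomic.gbff": "aterreus"}, "aterreus_compare", cpus=7)` would reproduce the two commands shown later in this thread.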
@nextgenusfs the problem is that funannotate annotate doesn't add the annotations if your genome comes from an externally gene-called genome. All the annotation sources are there, but they are not passed to annotate_misc/all.annotations.txt, as I mentioned above, and as a result they are not incorporated into the gbk files.
It must be something specific to the GenBank file you are using; older GenBank files sometimes have problematic locus tags.
Here is an example:
$ funannotate annotate --genbank GCF_000149615.1_ASM14961v1_genomic.gbff -o aterreus --cpus 7
-------------------------------------------------------
[May 04 06:18 PM]: OS: MacOSX 10.16, 8 cores, ~ 17 GB RAM. Python: 3.7.12
[May 04 06:18 PM]: Running 1.8.15
[May 04 06:18 PM]: No NCBI SBT file given, will use default, however if you plan to submit to NCBI, create one and pass it here '--sbt'
[May 04 06:18 PM]: Checking GenBank file for annotation
Skipped 3 annotations: 3 pseudo genes; 0 no CDS; 0 duplicated features
[May 04 06:18 PM]: Adding Functional Annotation to Aspergillus terreus NIH2624, NCBI accession: WGS:AAJN
[May 04 06:18 PM]: Annotation consists of: 10,551 gene models
[May 04 06:18 PM]: 10,401 protein records loaded
[May 04 06:18 PM]: Running HMMer search of PFAM version 35.0
[May 04 06:24 PM]: 12,937 annotations added
[May 04 06:24 PM]: Running Diamond blastp search of UniProt DB version 2022_04
[May 04 06:26 PM]: 892 valid gene/product annotations from 1,673 total
[May 04 06:26 PM]: Running Eggnog-mapper
[May 04 07:24 PM]: Parsing EggNog Annotations
[May 04 07:24 PM]: EggNog version parsed as 2.1.6
[May 04 07:24 PM]: 20,696 COG and EggNog annotations added
[May 04 07:24 PM]: Combining UniProt/EggNog gene and product names using Gene2Product version 1.84
[May 04 07:24 PM]: 2,681 gene name and product description annotations added
[May 04 07:24 PM]: Running Diamond blastp search of MEROPS version 12.0
[May 04 07:24 PM]: 353 annotations added
[May 04 07:24 PM]: Annotating CAZYmes using HMMer search of dbCAN version 11.0
[May 04 07:25 PM]: 546 annotations added
[May 04 07:25 PM]: Annotating proteins with BUSCO dikarya models
[May 04 07:26 PM]: 1,154 annotations added
[May 04 07:26 PM]: Skipping phobius predictions, try funannotate remote -m phobius
[May 04 07:26 PM]: Predicting secreted proteins with SignalP
[May 04 07:31 PM]: 977 secretome and 0 transmembane annotations added
[May 04 07:31 PM]: InterProScan error, aterreus/annotate_misc/iprscan.xml is empty, or no XML file passed via --iprscan. Functional annotation will be lacking.
[May 04 07:31 PM]: Found 0 duplicated annotations, adding 42,025 valid annotations
[May 04 07:31 PM]: Detected NCBI reannotation, but couldn't locate p2g file, please pass via --p2g
[May 04 07:31 PM]: Converting to final Genbank format, good luck!
[May 04 07:32 PM]: Creating AGP file and corresponding contigs file
[May 04 07:32 PM]: Writing genome annotation table.
[May 04 07:32 PM]: Funannotate annotate has completed successfully!
We need YOUR help to improve gene names/product descriptions:
0 gene/products names MUST be fixed, see aterreus/annotate_results/Gene2Products.must-fix.txt
1 gene/product names need to be curated, see aterreus/annotate_results/Gene2Products.need-curating.txt
7 gene/product names passed but are not in Database, see aterreus/annotate_results/Gene2Products.new-names-passed.txt
Please consider contributing a PR at https://github.com/nextgenusfs/gene2product
And then running it through compare to show that it works:
$ funannotate compare -i aterreus/annotate_results/Aspergillus_terreus_NIH2624_NIH2624.gbk -o aterreus_compare
-------------------------------------------------------
[May 04 07:40 PM]: OS: MacOSX 10.16, 8 cores, ~ 17 GB RAM. Python: 3.7.12
[May 04 07:40 PM]: Running 1.8.15
[May 04 07:40 PM]: Now parsing 1 genomes
[May 04 07:40 PM]: working on Aspergillus terreus NIH2624
[May 04 07:40 PM]: No secondary metabolite annotations found
[May 04 07:40 PM]: Summarizing PFAM domain results
[May 04 07:40 PM]: Summarizing InterProScan results
[May 04 07:40 PM]: Loading InterPro descriptions
[May 04 07:40 PM]: Summarizing MEROPS protease results
[May 04 07:40 PM]: Summarizing CAZyme results
[May 04 07:40 PM]: Summarizing COG results
[May 04 07:40 PM]: Summarizing secreted protein results
[May 04 07:40 PM]: Summarizing fungal transcription factors
[May 04 07:40 PM]: No transcription factor IPR domains found
[May 04 07:40 PM]: Compiling all annotations for each genome
[May 04 07:40 PM]: Skipping RAxML phylogeny as at least 4 taxa are required
[May 04 07:40 PM]: Compressing results to output file: aterreus_compare.tar.gz
[May 04 07:40 PM]: Funannotate compare completed successfully!
And the resulting web output:
All the genomes I was re-annotating were from ca. 2020. I noticed that some of the files were partially annotated. Would that be an issue?
I found the issue. For the genomes I was re-annotating, I got the protein files from NCBI. Funannotate uses the locus_tag to create the IDs of both genes and proteins, but NCBI doesn't, so none of the tools I ran externally could match the IDs.
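In an NCBI GenBank file, each CDS feature carries both identifiers, so the link between the protein accessions used in NCBI protein FASTA headers and the locus tags can be recovered from the CDS qualifiers. This is a minimal, stdlib-only sketch, assuming standard flat-file layout (feature keys starting in column 6, qualifiers in column 22) and single-line `/locus_tag` and `/protein_id` values; the function names are hypothetical, and a proper parser such as Biopython's would be more robust.

```python
import re
from typing import Dict, Iterable, Optional

# In GenBank flat files, feature keys start in column 6 and qualifiers in column 22.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S")
QUALIFIER_RE = re.compile(r'^ {21}/(locus_tag|protein_id)="([^"]+)"')


def _record(mapping: Dict[str, str], key: Optional[str], quals: Dict[str, str]) -> None:
    """Store protein_id -> locus_tag if the finished feature was a CDS with both."""
    if key == "CDS" and "protein_id" in quals and "locus_tag" in quals:
        mapping[quals["protein_id"]] = quals["locus_tag"]


def protein_to_locus(gbk_lines: Iterable[str]) -> Dict[str, str]:
    """Map each CDS /protein_id (e.g. an NCBI XP_ accession) to its /locus_tag."""
    mapping: Dict[str, str] = {}
    key: Optional[str] = None
    quals: Dict[str, str] = {}
    for line in gbk_lines:
        feature = FEATURE_RE.match(line)
        if feature:
            # A new feature begins: close out the previous one first
            _record(mapping, key, quals)
            key, quals = feature.group(1), {}
            continue
        qual = QUALIFIER_RE.match(line)
        if qual:
            quals[qual.group(1)] = qual.group(2)
    _record(mapping, key, quals)  # the last feature in the file
    return mapping
```

Usage: open the .gbff downloaded from NCBI and call `protein_to_locus(fh)` on the file handle. Pseudogene CDS features without a `/protein_id` are skipped.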
I wasn't aware of some of the funannotate util tools that help deal with this.
@kellystyles, check whether you did something similar.
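Given a mapping from the IDs used by externally run tools to the IDs funannotate uses (for example, built from the `/protein_id` and `/locus_tag` qualifiers of the CDS features in the GenBank file), the external results can be rekeyed before they are passed back to funannotate. This is a minimal sketch, assuming a tab-separated table with the query ID in the first column; it is not one of the funannotate util tools mentioned above. Whether funannotate expects the bare locus_tag or a transcript-style ID derived from it is not stated in this thread, so check the first column of the annotate_misc files from a run that worked and build the mapping accordingly.

```python
import io
from typing import Dict, Iterable, List, TextIO


def remap_first_column(lines: Iterable[str], mapping: Dict[str, str], out: TextIO) -> List[str]:
    """Rewrite column 1 of a tab-separated file through `mapping`.

    Returns the IDs that had no mapping; those rows are written unchanged.
    """
    unmapped: List[str] = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            out.write(text + "\n")  # keep blank, comment and header lines as-is
            continue
        old_id, sep, rest = text.partition("\t")
        new_id = mapping.get(old_id)
        if new_id is None:
            unmapped.append(old_id)
            new_id = old_id  # leave unknown IDs untouched so nothing is lost silently
        out.write(new_id + sep + rest + "\n")
    return unmapped


if __name__ == "__main__":
    # Hypothetical IDs: an NCBI-style protein accession mapped to a locus_tag-based ID
    src = io.StringIO("#query\tCOG_category\nXP_000001.1\tK\nXP_000009.1\tS\n")
    dst = io.StringIO()
    missing = remap_first_column(src, {"XP_000001.1": "ABC_0001"}, dst)
    print(dst.getvalue(), end="")
    print("unmapped:", missing)
```

Returning the unmapped IDs, instead of dropping those rows, makes a partial mismatch visible rather than silently shrinking the annotation set.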
Are you using the latest release? It is currently 1.8.11.
Describe the bug
funannotate annotate doesn't write the COG annotations into the output gbk file if the input is a gbk file, but it works without problems if given the "predict" folder. I'm re-annotating some genomes from GenBank to compare with my own, and I only found out because running funannotate compare gave me the same error as reported in #682; funannotate annotate itself doesn't give any error. The eggnog-mapper input seems to be parsed without problems: the COG annotation entries in annotate_misc/annotations.eggnog.txt are there and look normal.
What command did you issue?
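To confirm that the COG calls exist but are keyed on IDs the genome doesn't use, the COG column can be pulled out of the eggnog-mapper output and checked against the genome's own IDs. This is a minimal sketch, assuming eggnog-mapper's own annotations table (a header row starting with `#query` that contains a `COG_category` column, with `-` for empty fields), not funannotate's annotations.eggnog.txt, whose layout isn't shown here. The genome ID set is whatever you extract from the GenBank or protein FASTA; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def cog_by_query(emapper_lines: Iterable[str]) -> Dict[str, str]:
    """Return {query ID: COG_category} from an eggnog-mapper annotations table."""
    header = None
    result: Dict[str, str] = {}
    for line in emapper_lines:
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#query"):
            header = [f.lstrip("#") for f in fields]  # '#query' -> 'query'
            continue
        if line.startswith("#") or header is None or not line.strip():
            continue  # skip other comment lines and anything before the header
        row = dict(zip(header, fields))
        cog = row.get("COG_category", "")
        if cog and cog != "-":
            result[row[header[0]]] = cog  # header[0] is the query ID column
    return result


def unmatched_queries(cogs: Dict[str, str], genome_ids: Set[str]) -> Set[str]:
    """Query IDs with a COG call that do not match any ID in the genome."""
    return set(cogs) - genome_ids
```

If `unmatched_queries` returns most of the queries, the COG entries are present but cannot be attached to any gene model, which is consistent with the annotations silently disappearing from the final gbk.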
Logfiles
funannotate annotate doesn't throw any error, and based on the log everything looks OK; the problem only comes up if you examine the files or run funannotate compare, as mentioned above.
OS/Install Information