nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Specifying eggnog db #106

Closed: eyalbenda closed this issue 7 years ago

eyalbenda commented 7 years ago

While reading the FAQ about running funannotate on other organisms, I noticed that I should specify "--eggnog_db". I downloaded the relevant database using "funannotate eggnog", but how do I specify which database to use, and at which step (predict or annotate)?

nextgenusfs commented 7 years ago

Hi @eyalbenda. Sorry, the documentation is out of date; I need to update it when I get the time. Which version of funannotate are you using?

The EggNog databases have been dropped from the most recent version of funannotate in favor of eggnog-mapper. When I first started writing funannotate, there was no way to map proteins to the EggNog database other than running an HMM search, so I built that step into funannotate. Now the EggNog developers provide a dedicated tool for querying their database, and funannotate annotate can incorporate its output via the --eggnog flag.

Here is the help menu for funannotate annotate:

Usage:       funannotate annotate <arguments>
version:     0.7.2

Description: Script functionally annotates the results from funannotate predict.  It pulls
             annotation from PFAM, InterPro, EggNog, UniProtKB, MEROPS, CAZyme, and GO ontology.

Required:    -i, --input        Folder from funannotate predict
          or
             --genbank          Genome in GenBank format
             -o, --out          Output folder for results
          or   
             --gff              Genome GFF3 annotation file
             --fasta            Genome in multi-fasta format
             --proteins         Genome proteins in multi-fasta format
             -s, --species      Species name, use quotes for binomial, e.g. "Aspergillus fumigatus"
             -o, --out          Output folder for results

Optional:    --sbt              NCBI submission template file. (Recommended)
             --eggnog           Eggnog-mapper annotations file.
             --antismash        antiSMASH secondary metabolism results, GBK file.
             --iprscan          InterProScan XML file
             --phobius          Phobius pre-computed results.
             --isolate          Isolate name, e.g. Af293
             --strain           Strain name
             --busco_db         BUSCO models. Default: dikarya
             -t, --tbl2asn      Additional parameters for tbl2asn. Example: "-l paired-ends"
             --force            Force over-write of output folder
             --cpus             Number of CPUs to use. Default: 2

ENV Vars:  By default loaded from your $PATH, however you can specify at run-time if not in PATH  
             --AUGUSTUS_CONFIG_PATH

Written by Jon Palmer (2016-2017) nextgenusfs@gmail.com
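The "or" lines in the Required section describe three alternative input modes. Below is a minimal Python sketch of how those modes combine, assuming the grouping read off the 0.7.2 help menu above: `-i` alone; `--genbank` with `-o`; or `--gff`, `--fasta`, `--proteins` and `-s` with `-o`. The helper `build_annotate_cmd` is hypothetical (it is not part of funannotate) and only assembles an argument list; it does not run anything.

```python
from typing import List, Optional


def build_annotate_cmd(
    input_dir: Optional[str] = None,
    genbank: Optional[str] = None,
    gff: Optional[str] = None,
    fasta: Optional[str] = None,
    proteins: Optional[str] = None,
    species: Optional[str] = None,
    out: Optional[str] = None,
    eggnog: Optional[str] = None,
    cpus: int = 2,
) -> List[str]:
    """Assemble a `funannotate annotate` argument list (hypothetical helper).

    Exactly one input mode is accepted, following the 0.7.2 help menu:
      1. -i/--input (folder from funannotate predict)
      2. --genbank plus -o/--out
      3. --gff, --fasta, --proteins, -s/--species plus -o/--out
    -o is only added in modes 2 and 3, where the help menu lists it.
    """
    gff_fields = [gff, fasta, proteins, species]
    modes_used = sum([
        input_dir is not None,
        genbank is not None,
        any(x is not None for x in gff_fields),
    ])
    if modes_used != 1:
        raise ValueError(
            "use exactly one input mode: -i, --genbank, or --gff/--fasta/--proteins/-s"
        )

    cmd = ["funannotate", "annotate"]
    if input_dir is not None:
        cmd += ["-i", input_dir]
    elif genbank is not None:
        if out is None:
            raise ValueError("--genbank requires -o/--out")
        cmd += ["--genbank", genbank, "-o", out]
    else:
        if any(x is None for x in gff_fields) or out is None:
            raise ValueError("GFF3 mode requires --gff, --fasta, --proteins, -s and -o")
        cmd += ["--gff", gff, "--fasta", fasta, "--proteins", proteins,
                "-s", species, "-o", out]

    if eggnog is not None:
        # eggnog-mapper output, e.g. result.emapper.annotations
        cmd += ["--eggnog", eggnog]
    cmd += ["--cpus", str(cpus)]  # help menu default is 2
    return cmd
```

Because the result is a list, it can be handed directly to `subprocess.run`, and a binomial species name such as "Aspergillus fumigatus" stays a single argument without extra shell quoting.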

eyalbenda commented 7 years ago

Thanks! I'm using the latest version, and I do see that annotate has the --eggnog option. So, if I understand correctly, I need to download the relevant EggNog database manually with the mapper you linked? And which file exactly should I pass to the --eggnog option?

nextgenusfs commented 7 years ago

You can run eggnog-mapper on your protein FASTA file (use the HMM search mode). The eggnog-mapper distribution includes a script for downloading and maintaining the databases; see the eggnog-mapper documentation for more information.

Then you would run something like the following:

emapper.py -i proteins.fa --output result -d fuNOG --cpu 12

This produces the output files described in the eggnog-mapper documentation. You would then pass the file result.emapper.annotations to the --eggnog option of funannotate annotate.
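The two steps can be chained in a small Python script. This is a sketch under several assumptions: `emapper.py` and `funannotate` are both on `$PATH`, `proteins.fa` holds the predicted proteins, and `fun_out` is a hypothetical folder produced by funannotate predict. The function name `eggnog_then_annotate` is also hypothetical. By default it performs a dry run and only returns the two commands.

```python
import subprocess
from typing import List


def eggnog_then_annotate(
    proteins: str,
    predict_dir: str,
    prefix: str = "result",
    db: str = "fuNOG",
    cpus: int = 12,
    dry_run: bool = True,
) -> List[List[str]]:
    """Run eggnog-mapper, then pass its annotations to funannotate annotate.

    Returns both commands; they are executed only when dry_run is False.
    """
    # Same invocation as in the comment above: emapper.py -i ... --output ... -d ... --cpu ...
    emapper_cmd = ["emapper.py", "-i", proteins, "--output", prefix,
                   "-d", db, "--cpu", str(cpus)]
    # eggnog-mapper names its annotation table <prefix>.emapper.annotations
    annotations = f"{prefix}.emapper.annotations"
    annotate_cmd = ["funannotate", "annotate", "-i", predict_dir,
                    "--eggnog", annotations, "--cpus", str(cpus)]
    if not dry_run:
        # check=True raises CalledProcessError on failure, so annotate never
        # runs against a missing or partial annotations file
        subprocess.run(emapper_cmd, check=True)
        subprocess.run(annotate_cmd, check=True)
    return [emapper_cmd, annotate_cmd]


if __name__ == "__main__":
    for cmd in eggnog_then_annotate("proteins.fa", "fun_out"):
        print(" ".join(cmd))
```

Note the different CPU flags: eggnog-mapper takes `--cpu`, while funannotate annotate takes `--cpus`.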