merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Update parser for output from EggNOG-mapper v2.1.4 #1757

Closed: cpeeters closed this issue 3 years ago

cpeeters commented 3 years ago

The need

The online eggNOG-mapper tool was updated to version 2.1.4 (it was 2.0.1 until a few days ago), and anvi'o does not support this version when importing functions as described here: https://merenlab.org/2016/06/18/importing-functions/#eggnog-database--emapper (the highest supported version is 2.0.1).

The solution

Would it be possible to enable the import of eggNOG-mapper v2.1.4 output?

I realize it is a hassle that the output format of eggNOG-mapper keeps changing, but as discussed in this post (https://github.com/eggnogdb...), it should remain the same for the medium term from now on.

Beneficiaries

Everyone working with eggNOG-mapper and doing microbial genomics.

Attached files

Attached are an example contigs database for isolate DNF00083 and the output from eggNOG-mapper 2.1.4 as generated by the online tool (http://eggnog-mapper.embl.de/). A 'g' was added to each line with gene annotation info, so it should be ready for import.
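For anyone who needs to repeat the prefixing step on their own file, it could be scripted roughly as follows. This is only a minimal sketch, not part of anvi'o or eggNOG-mapper: the function name `add_prefix` is hypothetical, and it assumes that header and comment lines in the annotation file start with `#` while every other non-empty line is an annotation row beginning with the gene caller id.

```python
def add_prefix(lines, prefix="g"):
    """Yield each line, prepending `prefix` to annotation rows only.

    Assumption: lines starting with '#' are headers/comments and blank
    lines carry no annotation, so both are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
        else:
            yield prefix + line


# Small in-memory demonstration with made-up rows.
rows = ["## comment line\n", "#query\tseed_ortholog\n", "12\tsome_hit\n"]
print("".join(add_prefix(rows)), end="")
```

To process a real file, the same generator can be fed an open file handle and its output written to a new file, leaving the original untouched.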

eggnog-mapper-2.1.4.zip

meren commented 3 years ago

Thank you very much for the test files, @cpeeters! The commit d8e4af47deb67cf8adb3df53b0f113c64cf4015e addresses this in the development branch.

This is what I get when I run it on the contigs database you've sent:

>>> anvi-script-run-eggnog-mapper -c DNF00083.db \
                                  --annotation DNF00083.emapper.annotations.fixed \
                                  --use-version 2.1.4

COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /Users/meren/github/anvio/anvio/data/misc/COG

Gene functions ...............................: 11,278 function calls from 12 sources (EGGNOG_GENE_FUNCTION_NAME, EGGNOG_BRITE, EGGNOG_GO_TERMS, EGGNOG_KEGG_TC, EGGNOG_KEGG_PATHWAYS, EGGNOG_KEGG_MODULE,
                                                EGGNOG_BEST_TAX, EGGNOG_BiGG_REACTIONS, EGGNOG_EC_NUMBER, EGGNOG_CAZy, EGGNOG_KEGG_KO, EGGNOG_BACT) for 1,822 unique gene calls have been added to the contigs
                                                database.

Best wishes,

cpeeters commented 3 years ago

Thank you very much, worked like a charm!

Charlotte

meren commented 3 years ago

Brilliant. Thank you for reporting back!

cdv22222 commented 3 years ago

Hi Meren, I get this problem when I run it:

anvi-script-run-eggnog-mapper -c contigs.db --annotation 1.tsv --use-version 2.1.4
COG version ..................................: COG20
COG data source ..............................: The anvi'o default.
COG base directory ...........................: /home/z/anaconda3/envs/anvio-7/lib/python3.6/site-packages/anvio/data/misc/COG

Config Error: Gene caller ids found in this annotation file does not start with the expected prefix. This is a historical glitch that is not quite easy to address programmatically, so anvi'o asks you to add the expected prefix as the first character of every gene call in your annotations file. This is the prefix what you need to add manually to the very beginning of every line (anvi'o developers are very sorry for this step):'g'.

But when I tried to add 'g' to every gene call, the result was as follows:

`Config Error: At least one gene caller id in this annotation file (gc_000000000001_1) does not
              look like how anvi'o likes its gene calls. Hint: what should remain after       
              removing gene caller id prefix (g) should be an integer value.`
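The rule stated in this second error (whatever remains after removing the prefix must be an integer) can be checked before importing. The sketch below only illustrates that rule and is not anvi'o's own code: the function name `bad_gene_caller_ids` is hypothetical, and it assumes a tab-separated file with the gene caller id in the first column and `#` at the start of header and comment lines.

```python
def bad_gene_caller_ids(lines, prefix="g"):
    """Return ids that are not `prefix` followed by digits only."""
    bad = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # assumed header/comment or blank line
        gene_id = line.split("\t", 1)[0].strip()
        rest = gene_id[len(prefix):]
        if not gene_id.startswith(prefix) or not rest.isdigit():
            bad.append(gene_id)
    return bad


# Made-up rows: one well-formed id and the id from the error message.
rows = ["#query\tseed_ortholog\n", "g12\tx\n", "gc_000000000001_1\ty\n"]
print(bad_gene_caller_ids(rows))  # ['gc_000000000001_1']
```

Here `gc_000000000001_1` is reported because `c_000000000001_1`, the part left after removing `g`, is not an integer.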

Any advice please with language for dummies, because I am moron (⊙.⊙)

meren commented 3 years ago

I just responded to your other issue!

> Any advice please with language for dummies, because I am moron (⊙.⊙)

I laughed out loud reading this :) Thanks for your patience!