merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-estimate-metabolism with --kegg-data-dir fails with a Config Error when using --external-genomes #1461

Closed: jarrodscott closed this issue 4 years ago

jarrodscott commented 4 years ago

Hi!

Odd little problem. For some reason, when I run anvi-estimate-metabolism --external-genomes external-genomes.txt -O output.txt --kegg-data-dir /PATH/to/DB I get an error (the output with the --debug flag is below). The odd thing is that --kegg-data-dir works in every other input mode I have tried. For example, running on one of the genomes from the --external-genomes file works fine: anvi-estimate-metabolism -c _Arcobacter_porcinus_117434-contigs.db --kegg-data-dir /PATH/to/DB -O output.txt (INPUT 1). It also works with a contigs DB in metagenome mode (INPUT 1), and with a profile DB plus a collection/bin (INPUT 2).

I also tried setting path variables (KEGG, kegg_data_dir, kegg_modules_db_path, kegg_modules_db, etc.) before running the command, with no luck: it always falls back to data/misc/KEGG. Although I am not sure that approach would ever work :)

The only way I could get it to work was to change line 207 of kegg.py from data/misc/KEGG to my DB path.
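The behaviour described above (shell variables ignored, editing the hard-coded default "fixes" it) is consistent with an estimator that reads the KEGG directory only from its own argument object and falls back to a default when that attribute is missing. The sketch below is hypothetical: `resolve_kegg_data_dir` and `DEFAULT_KEGG_DATA_DIR` are illustrative names, not anvi'o's actual code, and the real default path is the absolute data/misc/KEGG path shown in the error.

```python
import argparse
import os

# Hypothetical stand-in for the hard-coded default directory reported above.
DEFAULT_KEGG_DATA_DIR = "anvio/data/misc/KEGG"


def resolve_kegg_data_dir(args: argparse.Namespace) -> str:
    """Return the KEGG data directory for one (hypothetical) estimator instance.

    Only the argument object is consulted. Environment variables are never
    read, so exporting KEGG or kegg_data_dir in the shell changes nothing.
    """
    user_dir = getattr(args, "kegg_data_dir", None)
    return user_dir if user_dir else DEFAULT_KEGG_DATA_DIR


# Single-input mode: the parsed CLI arguments carry --kegg-data-dir.
cli_args = argparse.Namespace(contigs_db="genome-contigs.db",
                              kegg_data_dir="/PATH/to/DB")

# Buggy multi-input pattern: a fresh namespace is built for each genome
# without copying kegg_data_dir, so the default silently wins.
per_genome_args = argparse.Namespace(contigs_db="genome-contigs.db")

os.environ["kegg_data_dir"] = "/PATH/to/DB"   # has no effect on the resolver
print(resolve_kegg_data_dir(cli_args))         # /PATH/to/DB
print(resolve_kegg_data_dir(per_genome_args))  # anvio/data/misc/KEGG
```

Under this assumption, editing the default constant changes the outcome for every genome, which matches what editing kegg.py did here.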

If this is really a bug, sorry I can't offer a real solution :/

:: anvi'o v6 master ::  >>> anvi-estimate-metabolism -e arco-phylo-external-genomes.txt  --kegg-data-dir /pool/genomics/stri_istmobiome/dbs/kegg-kofams/ -O arco-phylo-esitimate-metabolism.txt --debug

Num Contigs DBs in file ......................: 76
Metagenome Mode ..............................: False
Completeness threshold: single estimator .....: 0.75
[03 Jul 20 18:57:23 Estimating metabolism for contigs DBs] [1 of 76] _Arcobacter_porcinus_117434                                                                           ETA: ∞:∞:∞
✖ anvi-estimate-metabolism encountered an error after 0:00:21.187645

Traceback for debugging
================================================================================
  File "/home/scottjj/github/anvio/bin/anvi-estimate-metabolism", line 92, in <module>
    main(args)
  File "/home/scottjj/github/anvio/anvio/terminal.py", line 748, in wrapper
    program_method(*args, **kwargs)
  File "/home/scottjj/github/anvio/bin/anvi-estimate-metabolism", line 39, in main
    m.estimate_metabolism()
  File "/home/scottjj/github/anvio/anvio/kegg.py", line 2916, in estimate_metabolism
    kegg_metabolism_superdict_multi, ko_hits_superdict_multi = self.get_metabolism_superdict_multi()
  File "/home/scottjj/github/anvio/anvio/kegg.py", line 2737, in get_metabolism_superdict_multi
    metabolism_super_dict[metagenome_name], ko_hits_super_dict[metagenome_name] = KeggMetabolismEstimator(args, progress=progress_quiet, run=run_quiet).estimate_metabolism(skip_storing_data=True)
  File "/home/scottjj/github/anvio/anvio/kegg.py", line 1300, in __init__
    "you now know what you need to do to make this message go away." % ("MODULES.db", self.kegg_data_dir))
================================================================================

Config Error: It appears that a modules database (MODULES.db) does not exist in the KEGG data
              directory /home/scottjj/github/anvio/anvio/data/misc/KEGG. Perhaps you need to
              specify a different KEGG directory using --kegg-data-dir. Or perhaps you didn't
              run `anvi-setup-kegg-kofams`, though we are not sure how you got to this point
              in that case since you also cannot run `anvi-run-kegg-kofams` without first
              having run KEGG setup. But fine. Hopefully you now know what you need to do to
              make this message go away.
 :: anvi'o v6 master ::  ~ >>> anvi-self-test --version
Anvi'o version ...............................: esther (v6.2-master)
Profile DB version ...........................: 34
Contigs DB version ...........................: 18
Pan DB version ...............................: 14
Genome data storage version ..................: 7
Auxiliary data storage version ...............: 2
Structure DB version .........................: 2
Kegg Modules DB version ......................: 2
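
The Config Error above is raised when no MODULES.db file is found in the KEGG directory that was actually used. A quick sanity check is to confirm that the directory passed to --kegg-data-dir contains that file. `has_modules_db` is a hypothetical helper, not part of anvi'o, and it checks only that the file exists, not that the database is valid or of the expected version.

```python
import tempfile
from pathlib import Path


def has_modules_db(kegg_data_dir: str) -> bool:
    """Return True if kegg_data_dir contains a regular file named MODULES.db."""
    return (Path(kegg_data_dir) / "MODULES.db").is_file()


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    print(has_modules_db(tmp))          # False: the directory is empty
    (Path(tmp) / "MODULES.db").touch()  # create an empty placeholder file
    print(has_modules_db(tmp))          # True
```

If has_modules_db("/PATH/to/DB") returns True but the error still names the default directory, as in this report, the flag is being ignored rather than pointing at the wrong place.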

ivagljiva commented 4 years ago

@jarrodscott You have indeed stumbled upon a bug! Thank you very much for bringing this to our attention. :)

The error occurred because I forgot to initialize the --kegg-data-dir parameter in the multiple-input case. Rather silly, but it is now fixed, so if you update your master branch and try again, I believe it should work.
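A minimal sketch of the kind of fix described, assuming the multi-input code builds a separate argument object for each genome listed in the external-genomes file (a tab-delimited table with name and contigs_db_path columns). `build_per_genome_args` is a hypothetical helper, not the actual patch; the point is that the parent's kegg_data_dir has to be copied into every per-genome namespace.

```python
import argparse
import csv
import io
from typing import Dict, TextIO


def build_per_genome_args(parent: argparse.Namespace,
                          external_genomes: TextIO) -> Dict[str, argparse.Namespace]:
    """Create one argument namespace per genome in an external-genomes table.

    Options shared by all genomes (here only kegg_data_dir) are copied from the
    parent namespace. Omitting that copy makes each genome fall back to the
    default KEGG directory.
    """
    per_genome = {}
    for row in csv.DictReader(external_genomes, delimiter="\t"):
        per_genome[row["name"]] = argparse.Namespace(
            contigs_db=row["contigs_db_path"],
            kegg_data_dir=getattr(parent, "kegg_data_dir", None),  # propagated option
        )
    return per_genome


# Usage with an in-memory table standing in for external-genomes.txt.
table = io.StringIO(
    "name\tcontigs_db_path\n"
    "_Arcobacter_porcinus_117434\t_Arcobacter_porcinus_117434-contigs.db\n"
)
parent_args = argparse.Namespace(external_genomes="external-genomes.txt",
                                 kegg_data_dir="/PATH/to/DB")
for name, ns in build_per_genome_args(parent_args, table).items():
    print(name, ns.kegg_data_dir)  # _Arcobacter_porcinus_117434 /PATH/to/DB
```

Copying shared options explicitly keeps each per-genome namespace small, at the cost of having to remember every option that should apply to all genomes; forgetting one is exactly the failure reported in this issue.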

Please let me know if you have any further issues!

jarrodscott commented 4 years ago

@ivagljiva success! Thanks for getting this sorted out soooooo fast :)