merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] `--prodigal-single-mode` breaks metagenomic workflow #2305

Closed lmrodriguezr closed 1 month ago

lmrodriguezr commented 1 month ago

Short description of the problem

The recently introduced --prodigal-single-mode flag from https://github.com/merenlab/anvio/commit/f1291f755c859d55f8152321942f5a62b384676b causes workflows to fail with the following error:

Traceback (most recent call last):
  File "/opt/conda/envs/anvioenv/bin/anvi-gen-contigs-database", line 56, in <module>
    groupD.add_argument(*anvio.A('prodigal-single-mode'), **anvio.K('prodigal-single-mode'))
  File "/opt/conda/envs/anvioenv/lib/python3.10/site-packages/anvio/__init__.py", line 3549, in A
    return D[param_id][0]
KeyError: 'prodigal-single-mode'

Presumably this is because the JSON files are missing the anvi_gen_contigs_database.--prodigal-single-mode key, but unfortunately the anvi-run-workflow --get-default-config command doesn't include it either.

I tried the workaround of just adding anvi_gen_contigs_database.--prodigal-single-mode manually to the JSON (see the example in the files at the bottom), but that's not recognized:

Config Error: some of the parameters in your config file for rule anvi_gen_contigs_database   
              are not familiar to us. Here is a list of the wrong parameters: ['--prodigal-   
              single-mode']. The only acceptable parameters for this rule are ['--            
              description', '--skip-gene-calling', '--ignore-internal-stop-codons', '--skip-  
              mindful-splitting', '--contigs-fasta', '--project-name', '--description', '--   
              split-length', '--kmer-size', '--skip-mindful-splitting', '--skip-gene-calling',
              '--ignore-internal-stop-codons', '--skip-predict-frame', '--prodigal-           
              translation-table', 'threads'].                                                 
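The check behind this message can be pictured as a comparison between the keys in the rule's config block and a fixed list of accepted names. The sketch below is only an illustration of that idea, not anvi'o's actual implementation. The function name `unknown_params` is hypothetical, and the accepted list is copied from the error message above with duplicates removed.

```python
# Accepted names copied from the Config Error above (duplicates removed).
ACCEPTED = [
    "--description", "--skip-gene-calling", "--ignore-internal-stop-codons",
    "--skip-mindful-splitting", "--contigs-fasta", "--project-name",
    "--split-length", "--kmer-size", "--skip-predict-frame",
    "--prodigal-translation-table", "threads",
]


def unknown_params(rule_config: dict, accepted: list[str]) -> list[str]:
    """Return the config keys that are not in the accepted list (hypothetical helper)."""
    allowed = set(accepted)
    return [name for name in rule_config if name not in allowed]


# A rule block like the one in the "fixed" config: one unknown key, one known key.
rule_block = {"--prodigal-single-mode": True, "threads": 4}
rejected = unknown_params(rule_block, ACCEPTED)
```

With a check of this kind, adding a key to the JSON only helps if the installed workflow code already lists that key as acceptable.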

anvi'o version

I'm using the github HEAD (this change was introduced 3 weeks ago), but here is the output anyways:

Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.8

Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

Rocky Linux 8.6 (Green Obsidian), using a SIF file with Singularity that I modified to Anvi'o HEAD. I cannot use the stable release because of a DAStool issue now solved in HEAD.

Detailed description of the issue

I believe the above (and below) cover all the details, but I'm happy to share the SIF if that helps.

Files / commands to reproduce the issue

singularity exec -B /scratch ~/data/apps/anvio/anvio_dev.sif anvi-run-workflow -w metagenomics -c metagenomics-config-1.json --additional-params --until anvi_cluster_contigs

Config files before and after "fix" (which didn't actually fix anything): https://gist.github.com/lmrodriguezr/c60708194da15a1902258e178258afe5

meren commented 1 month ago

Dear @lmrodriguezr, thank you very much for reporting this and apologies for the frustration.

I find this a bit confusing, and I think the error is related to another problem with your anvi'o environment. The error says that the prodigal-single-mode flag is missing from anvio/__init__.py, which is expected if you are using the stable v8: the flag is not supposed to be there. As you noted, these changes were made only recently, so they should be present only in the anvio-dev branch, and the v8 version of anvi-gen-contigs-database should not ask for --prodigal-single-mode at all. But it does, which makes me think that your libraries are from v8 while your programs are from anvio-dev, something that can only happen if the anvi'o environment is mixed up.
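To illustrate the kind of mismatch described here, the following is a minimal sketch of a parameter registry in the style the traceback suggests, where `A` returns `D[param_id][0]`. The contents of `D`, the body of `K`, and the helper `try_register` are assumptions made for illustration and are not copied from anvi'o.

```python
import argparse

# Hypothetical miniature of the library-side registry: an older library
# knows these parameters but has no entry for 'prodigal-single-mode'.
D = {
    "contigs-fasta": (("--contigs-fasta",), {"help": "input FASTA file"}),
    "split-length": (("--split-length",), {"type": int, "default": 20000}),
}


def A(param_id):
    """Positional arguments (flag names) for a parameter, as in the traceback."""
    return D[param_id][0]


def K(param_id):
    """Keyword arguments for a parameter (assumed to be the second tuple item)."""
    return D[param_id][1]


def try_register(parser, param_id):
    """Register a parameter; return False if the library does not know it."""
    try:
        parser.add_argument(*A(param_id), **K(param_id))
    except KeyError:
        return False
    return True


parser = argparse.ArgumentParser()
known = try_register(parser, "split-length")            # the library has this entry
newer = try_register(parser, "prodigal-single-mode")    # a newer program asks for this one
```

A program from a newer branch that asks an older library for an ID it has never heard of fails at the dictionary lookup, before any config file is read.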

I double-checked: both anvi'o v8 and the anvio-dev branch work smoothly when installed into separate conda environments from scratch. Does this ring a bell? Could it be that your environment has files mixed from v8 and anvio-dev?

One likely solution is to install your environment from scratch. I would suggest installing the development version so that it tracks anvio-dev. If you run into a similar problem, we can then address it quickly.
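Before reinstalling, one quick way to look for a mixed setup is to print where Python finds the `anvio` package and where the shell finds the program, then compare the two paths by eye. This is a rough sketch using only the standard library. The helper name `locate` is hypothetical, and how the two paths should relate depends on how anvi'o was installed.

```python
import importlib.util
import shutil
from pathlib import Path


def locate(package, program):
    """Return (package_dir, program_path); either is None if it is not found."""
    spec = importlib.util.find_spec(package)
    package_dir = Path(spec.origin).resolve().parent if spec and spec.origin else None
    found = shutil.which(program)
    program_path = Path(found).resolve() if found else None
    return package_dir, program_path


if __name__ == "__main__":
    library, script = locate("anvio", "anvi-gen-contigs-database")
    print("library :", library)
    print("program :", script)
```

If the library sits in one environment's site-packages while the program comes from a different environment or checkout, the two are likely from different versions.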

Best wishes, Meren

lmrodriguezr commented 1 month ago

Hi @meren, thank you! I had replaced the Anvi'o code with the dev version from GitHub in a VM, but I probably missed something in the process. I have now created a fresh Anvi'o environment using conda instead, and I no longer have this issue, thank you!

BTW, while installing anvio-dev I encountered this problem: https://github.com/snakemake/snakemake/issues/2607. The solution was very simple, just pip3 install pulp==2.7.0, so maybe this could be considered in requirements.txt?

It looks like I'm now facing a different problem, though. CONCOCT is no longer working, and it throws the error:

✖ anvi-cluster-contigs encountered an error after 0:02:25.812512

Config Error: One of the critical output files is missing ('clustering_gt1000.csv'). Please
              take a look at the log file: /tmp/tmp14re4_51/logs.txt                       

Is this a known issue? I'll keep trying :)

Thank you! Miguel.

lmrodriguezr commented 1 month ago

Oh, never mind, I found this: https://github.com/merenlab/anvio/issues/2154, and I'm now trying again after pip install scikit-learn==1.1.0 :)

meren commented 1 month ago

> The solution was very simple, just pip3 install pulp==2.7.0, so maybe this could be considered in requirements.txt?

This is now in the requirements.txt, @lmrodriguezr! Thank you. If scikit-learn==1.1.0 solves it, we should also replace the current scikit-learn==1.2.2 requirement, although I'm not sure whether something will go wrong after that :) Please let us know, and thank you!
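For anyone who wants to confirm what is actually installed against the pins discussed in this thread, here is a small sketch using `importlib.metadata` from the standard library. The helper name `pin_mismatches` and the `PINS` dictionary are illustrative only, and the versions are the ones mentioned above rather than a copy of requirements.txt.

```python
from importlib import metadata

# Pins mentioned in this thread (illustrative, not the full requirements.txt).
PINS = {"pulp": "2.7.0", "scikit-learn": "1.2.2"}


def pin_mismatches(pins, get_version=metadata.version):
    """Map each package whose installed version differs from its pin to
    (wanted, installed); installed is None if the package is missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in pin_mismatches(PINS).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

An empty result means the active environment matches the pins. Run it with the same Python interpreter that runs anvi'o.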

lmrodriguezr commented 3 weeks ago

scikit-learn==1.2.2 seems to work well; I have not encountered issues with it yet. Thank you @meren!