merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Missing USearch in installation instruction and workflow DAG declarations #2309

Open lmrodriguezr opened 1 month ago

lmrodriguezr commented 1 month ago

Short description of the problem

DAS Tool needs USearch, but it's not listed in the requirements.

anvi'o version

Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.8

Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 17
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

OS: Rocky Linux 8.6 (Green Obsidian). Installed using the instructions for developer version.

Detailed description of the issue

I'm not sure if this is an issue for the conda recipe of DAS Tool, or for Anvi'o. However, since Anvi'o explicitly uses --search_engine usearch, I'm reporting it here: The bioconda installation of DAS Tool does not include usearch (see recipe), and Anvi'o fails with the message:

Config Error: One of the critical output files is missing ('OUTPUT_DASTool_contig2bin.tsv').
              Please take a look at the log file: /tmp/tmpwmogo2i1/logs.txt                 

If you lose access to the temporary directory (e.g., on a cluster infrastructure), this would be pretty hard to debug. But the actual error is simply a missing usearch:

DAS Tool 1.1.6 
Error:  Cannot find dependencies: usearch 
Execution halted

Perhaps this could be documented somewhere in the installation? Or at least the workflow could be aware of that dependency, as it currently doesn't list it when building the DAG:

Shell programs for the workflow
===============================================
Needed .......................................: gunzip, anvi-script-reformat-fasta, anvi-script-reformat-fasta, anvi-gen-contigs-database, anvi-import-functions, anvi-get-sequences-for-gene-calls, centrifuge, anvi-import-taxonomy-for-genes, anvi-run-hmms, anvi-run-pfams, anvi-run-kegg-kofams, anvi-run-ncbi-cogs, anvi-run-scg-taxonomy, anvi-scan-trnas, anvi-get-sequences-for-gene-calls, iu-gen-configs, iu-filter-quality-minoche, gzip, bowtie2-build, bowtie2, samtools, anvi-init-bam, anvi-profile, echo, anvi-import-collection, anvi-script-add-default-collection, anvi-summarize, anvi-split, mv, krakenuniq, krakenuniq-mpa-report, anvi-import-taxonomy-for-layers, anvi-cluster-contigs
Missing ......................................: None
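
Until the workflow declares this dependency itself, a small pre-flight check in the spirit of the Needed/Missing report above can catch it before a job is submitted. The following is only a sketch, not anvi'o code: the function name `find_missing_programs` and the `needed` list are hypothetical, and the executable names `DAS_Tool` and `usearch` are assumptions about how the tools are installed on a given system.

```python
import shutil


def find_missing_programs(programs, which=shutil.which):
    """Return the programs that cannot be found on PATH.

    Order is preserved and repeated names are reported once.
    `which` is injectable so the check can be tested without touching PATH.
    """
    missing = []
    for name in dict.fromkeys(programs):  # drops repeats, keeps order
        if which(name) is None:
            missing.append(name)
    return missing


# Hypothetical list: the binning step plus the tools it calls under the hood.
# The executable names are assumptions, adjust them to your installation.
needed = ["anvi-cluster-contigs", "DAS_Tool", "usearch"]

if __name__ == "__main__":
    missing = find_missing_programs(needed)
    print("Missing:", ", ".join(missing) if missing else "None")
```

Running this in the same environment as the workflow lists any executable that is absent from PATH, so a missing usearch shows up before DAS Tool is ever launched.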

In any case, the solution is pretty simple: install usearch :)

Thank you! Miguel.

meren commented 1 month ago

Dear @lmrodriguezr, I'm sorry you are running into issues with anvi-cluster-contigs :/

To be honest, we have often considered removing that program and the underlying structure from anvi'o completely. We started that project with high hopes, but the diversity of binning algorithms, their input/output formats changing from one version to the next, and the lack of proper APIs for almost ANY of them made us realize that perhaps it is best if users do the automatic binning outside of anvi'o and then bring their bins into the anvi'o system with anvi-import-collection for refinement efforts, or anything else downstream.
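
For anyone taking that route with DAS Tool, here is a minimal sketch of preparing its output for import. It assumes the contig2bin file is a two-column, TAB-delimited table (contig name, bin name) with no header, and that bin names restricted to letters, digits, and underscores are safe for anvi'o; both points are assumptions to verify against your own files and the anvi'o documentation. The function name `dastool_to_collection` is hypothetical.

```python
import csv
import re


def dastool_to_collection(src, dst):
    """Copy contig<TAB>bin rows from src to dst with sanitized bin names.

    src and dst are open text file objects. Returns the number of rows written.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    written = 0
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        contig, bin_name = row[0], row[1]
        # Assumption: replace anything outside [A-Za-z0-9_] with an underscore.
        safe_name = re.sub(r"[^A-Za-z0-9_]", "_", bin_name)
        writer.writerow([contig, safe_name])
        written += 1
    return written
```

The resulting file could then be given to anvi-import-collection together with the contigs and profile databases and a collection name. Check `anvi-import-collection --help` for the exact flags in your version, since they are not confirmed in this thread.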

If we had someone interested in pushing the automatic binning capabilities of anvi'o, we would happily give them full access to the codebase so they could do whatever they wanted: fix the issues, update the documentation, and so on. But currently every core developer is dealing with much more immediate needs, so anvi-cluster-contigs and the workflows linked to it have started accumulating bugs, as you noticed.

Best wishes, Meren