merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

DAS Tool finishes without errors but its output is not recognized by Anvi'o #2160

Closed: eneas01 closed this issue 8 months ago

eneas01 commented 8 months ago

Short description of the problem

Please help. DAS Tool finishes without errors (according to logs.txt), but it does not create the critical output file "OUTPUT_DASTool_scaffolds2bin.txt". Instead, it creates "OUTPUT_DASTool_contig2bin.tsv", which Anvi'o does not recognize.

anvi'o version


Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

System info

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

Anvi'o was installed using Anaconda3.
DAS Tool version: 1.1.6

Detailed description of the issue

DAS Tool finishes without issues, but it does not produce the expected file "OUTPUT_DASTool_scaffolds2bin.txt". Instead, I have "OUTPUT_DASTool_contig2bin.tsv", which seems to contain the binning results. As a result, the bins are not added to the database and I get an error instead. This is related to issue #1510, but it is not the same problem: here DAS Tool finished without errors, as far as I can tell (see logs.txt below).

Commands to reproduce the issue

Command that produced the problem:

 anvi-cluster-contigs -p 04_MAPPING_ANVIO/M22-MERGED/PROFILE.db \
                         -S concoct_bins,maxbin2_bins,metabat2_bins \
                         -c 05_CONTIGS/M22_contigs.db \
                         -C dastool_bins \
                         -T 60 \
                         --driver dastool \
                         --search-engine diamond \
                         --just-do-it

Output:

Config Error: One of the critical output files is missing                              
              ('OUTPUT_DASTool_scaffolds2bin.txt'). Please take a look at the log file:
              /tmp/tmpibqg3h47/logs.txt
cat  /tmp/tmpibqg3h47/logs.txt

Output from cat:

# DATE: 30 Oct 23 14:01:00
# CMD LINE: DAS_Tool -c /tmp/tmpibqg3h47/sequence_splits.fa -i /tmp/tmpibqg3h47/metabat2_bins.txt,/tmp/tmpibqg3h47/maxbin2_bins.txt,/tmp/tmpibqg3h47/concoct_bins.txt -l metabat2_bins,maxbin2_bins,concoct_bins -o /tmp/tmpibqg3h47/OUTPUT --threads 60 --search_engine diamond
DAS Tool 1.1.6 
Analyzing assembly 
Predicting genes 
Annotating single copy genes using diamond 
Dereplicating, aggregating, and scoring bins 

Hmmmm... no errors reported!

ls  /tmp/tmpibqg3h47

Output from ls:

concoct_bins-info.txt          OUTPUT_proteins.faa
concoct_bins.txt               OUTPUT_proteins.faa.all.b6
contig_coverages_log_norm.txt  OUTPUT_proteins.faa.archaea.scg
contig_coverages.txt           OUTPUT_proteins.faa.bacteria.scg
logs.txt                       OUTPUT_proteins.faa.findSCG.b6
maxbin2_bins-info.txt          OUTPUT_proteins.faa.scg.candidates.faa
maxbin2_bins.txt               OUTPUT.seqlength
metabat2_bins-info.txt         sequence_contigs.fa
metabat2_bins.txt              sequence_splits.fa
OUTPUT_DASTool_contig2bin.tsv  split_coverages_log_norm.txt
OUTPUT_DASTool.log             split_coverages.txt
OUTPUT_DASTool_summary.tsv
head /tmp/tmpibqg3h47/OUTPUT_DASTool_contig2bin.tsv

Output from head:

c_000000006057_split_00001  MAXBIN__040
c_000000007621_split_00001  MAXBIN__040
c_000000007723_split_00001  MAXBIN__040
c_000000008164_split_00001  MAXBIN__040
c_000000010277_split_00001  MAXBIN__040
c_000000014078_split_00001  MAXBIN__040
c_000000014541_split_00001  MAXBIN__040
c_000000017535_split_00001  MAXBIN__040
c_000000034075_split_00001  MAXBIN__040
c_000000034843_split_00001  MAXBIN__040

I think "OUTPUT_DASTool_contig2bin.tsv" is the expected output and has the right format, just under a different name. Am I wrong?

Lines 138-141 of dastool.py seem to check for 'OUTPUT_DASTool_scaffolds2bin.txt' and throw this error when it is not found. Can I just replace it with "OUTPUT_DASTool_contig2bin.tsv" on line 138, or is that not the correct binning results file?
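If the only difference is the file name, one way a driver could cope with both conventions is to look for either name and use whichever exists. The sketch below assumes the output prefix `OUTPUT` seen in the DAS Tool command line in logs.txt, and assumes both files share the same two-column layout. The function name `find_dastool_bins_file` is hypothetical; this is not the code from anvi'o's dastool.py or from the fix discussed in this thread.

```python
import os


def find_dastool_bins_file(output_dir, prefix="OUTPUT"):
    """Return the path to DAS Tool's item-to-bin table (hypothetical helper).

    Checks the file name produced by DAS Tool 1.1.6 in this issue first,
    then the older name that the anvi'o v7.1 driver looks for.
    """
    candidates = [
        f"{prefix}_DASTool_contig2bin.tsv",     # name found in the ls output of this issue
        f"{prefix}_DASTool_scaffolds2bin.txt",  # name the anvi'o v7.1 driver expects
    ]
    for name in candidates:
        path = os.path.join(output_dir, name)
        if os.path.exists(path):
            return path

    # Like the anvi'o check, fail loudly when neither file is present.
    raise FileNotFoundError(
        f"None of the expected DAS Tool output files ({', '.join(candidates)}) "
        f"were found in {output_dir}; please check logs.txt there."
    )
```

For the directory listed above, `find_dastool_bins_file("/tmp/tmpibqg3h47")` would return the path to `OUTPUT_DASTool_contig2bin.tsv`. Checking the newer name first is a design choice: if both files were ever present, the newer convention wins.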

I could just run DAS Tool outside Anvi'o and import the collection, but since I am trying to automate a pipeline, I would prefer to stay within Anvi'o if possible. Can you please help me solve this issue? Any help will be greatly appreciated. Thanks in advance!

meren commented 8 months ago

Hi @eneas01, a8c0c550ea135414f5107659bfb6bd682812464a is an attempt to address this. If you install anvio-dev, you may be able to try it and see whether it solves your problem.

In the worst-case scenario, you will need to import your binning results with anvi-import-collection, but I hope this change can help you.
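For the anvi-import-collection route, the contig2bin table shown earlier already resembles a collection file: one item name and one bin name per line, with no header. A small check before importing can catch malformed lines. The sketch below is a hypothetical helper (the name `contig2bin_to_collection` and the output path are assumptions). It splits each line on any whitespace, because the `head` output above shows spaces where the file presumably has tabs, and rewrites each line with a single tab.

```python
from collections import Counter


def contig2bin_to_collection(tsv_path, out_path):
    """Rewrite a DAS Tool contig2bin table as a two-column collection file.

    Hypothetical helper. Each non-empty input line must contain exactly two
    fields: an item name (here, an anvi'o split name) and a bin name.
    Returns a Counter with the number of items assigned to each bin.
    """
    counts = Counter()
    with open(tsv_path) as infile, open(out_path, "w") as outfile:
        for line_number, line in enumerate(infile, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(f"line {line_number}: expected 2 fields, got {len(fields)}")
            item_name, bin_name = fields
            outfile.write(f"{item_name}\t{bin_name}\n")  # tab-separated, no header
            counts[bin_name] += 1
    return counts
```

The resulting file could then be imported with something like `anvi-import-collection collection.txt -p 04_MAPPING_ANVIO/M22-MERGED/PROFILE.db -c 05_CONTIGS/M22_contigs.db -C dastool_bins`. Because the item names here are split names (`..._split_00001`), `--contigs-mode` should not be needed.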

eneas01 commented 8 months ago

Thanks a lot @meren. I will try with anvio-dev, and otherwise import the collection.

eneas01 commented 8 months ago

The change made to dastool.py in the anvio-dev version worked! Many thanks!