metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule run_das_tool #345

Closed: Ahmed-Shibl closed this issue 3 years ago

Ahmed-Shibl commented 3 years ago

Hi, I was running atlas on a metagenomic dataset (CD151) and ran into this error.

rule run_das_tool:
    input: CD151/binning/DASTool/metabat.scaffolds2bin, CD151/binning/DASTool/maxbin.scaffolds2bin, CD151/CD151_contigs.fasta, CD151/annotation/predicted_genes/CD151.faa
    output: CD151/binning/DASTool/CD151_DASTool_summary.txt, CD151/binning/DASTool/CD151_DASTool_hqBins.pdf, CD151/binning/DASTool/CD151_DASTool_scores.pdf, CD151/binning/DASTool/CD151_metabat.eval, CD151/binning/DASTool/CD151_maxbin.eval, CD151/binning/DASTool/cluster_attribution.tsv
    log: CD151/logs/binning/DASTool.log
    jobid: 14
    wildcards: sample=CD151
    threads: 60
    resources: mem=60, time=5

Activating conda environment: /home/as11798/miniconda3/envs/atlas/CD151-processed/databases/conda_envs/31f62a84
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[Fri Dec 11 00:52:52 2020]
Error in rule run_das_tool:
    jobid: 14
    output: CD151/binning/DASTool/CD151_DASTool_summary.txt, CD151/binning/DASTool/CD151_DASTool_hqBins.pdf, CD151/binning/DASTool/CD151_DASTool_scores.pdf, CD151/binning/DASTool/CD151_metabat.eval, CD151/binning/DASTool/CD151_maxbin.eval, CD151/binning/DASTool/cluster_attribution.tsv
    log: CD151/logs/binning/DASTool.log (check log file(s) for error message)
    conda-env: /home/as11798/miniconda3/envs/atlas/CD151-processed/databases/conda_envs/31f62a84
    shell:
         DAS_Tool --outputbasename CD151/binning/DASTool/CD151  --bins CD151/binning/DASTool/metabat.scaffolds2bin,CD151/binning/DASTool/maxbin.scaffolds2bin  --labels metabat,maxbin  --contigs CD151/CD151_contigs.fasta  --search_engine diamond  --proteins CD151/annotation/predicted_genes/CD151.faa  --write_bin_evals 1  --create_plots 1 --write_bin_evals 1  --megabin_penalty 0.5 --duplicate_penalty 0.6  --threads 60  --debug  --score_threshold 0.5 &> CD151/logs/binning/DASTool.log  ; mv CD151/binning/DASTool/CD151_DASTool_scaffolds2bin.txt CD151/binning/DASTool/cluster_attribution.tsv &>> CD151/logs/binning/DASTool.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job run_das_tool since they might be corrupted:
CD151/binning/DASTool/CD151_metabat.eval, CD151/binning/DASTool/CD151_maxbin.eval
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Note the path to the log file for debugging.
Documentation is available at: https://metagenome-atlas.readthedocs.io
Issues can be raised at: https://github.com/metagenome-atlas/atlas/issues
Complete log: /home/as11798/miniconda3/envs/atlas/CD151-processed/.snakemake/log/2020-12-10T091806.887264.snakemake.log
[2020-12-11 00:52 CRITICAL] Command 'snakemake --snakefile /home/as11798/miniconda3/envs/atlas/lib/python3.6/site-packages/atlas/Snakefile --directory /home/as11798/miniconda3/envs/atlas/CD151-processed --jobs 60 --rerun-incomplete --configfile '/home/as11798/miniconda3/envs/atlas/CD151-processed/config.yaml' --nolock   --use-conda --conda-prefix /home/as11798/miniconda3/envs/atlas/CD151-processed/databases/conda_envs   all   ' returned non-zero exit status 1.

Here's some background info that might be of help:

1- I added a list of 19 host genomes (Symbiodinium + coral) to the config.yaml file prior to running the command: atlas run all -w ~/miniconda3/envs/atlas/CD151-processed --jobs 60

2- The last few lines of the ~/miniconda3/envs/atlas/CD151-processed/CD151/logs/binning/DASTool.log file read as follows:

Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0s]
Closing the database file...  [0.003s]
Deallocating taxonomy...  [0s]
Total time = 0.49s
Reported 4 pairwise alignments, 4 HSPs.
4 queries aligned.
The host system is detected to have 540 GB of RAM. It is recommended to increase the block size for better performance using these parameters : -b12 -c1
starting annotations of single copy cogs...
successfully finished
calculating contig lengths.
evaluating bin-sets
No bins with bin-score >0.5 found. Adjust score_threshold to report bins with lower quality.
 Aborting.

mv: cannot stat 'CD151/binning/DASTool/CD151_DASTool_scaffolds2bin.txt': No such file or directory

3- The last few lines of the ~/miniconda3/envs/atlas/CD151-processed/CD151/logs/binning/maxbin.log file read as follows:

========== Job finished ==========
Yielded 2 bins for contig (scaffold) file CD151/CD151_contigs.fasta

Here are the output files for this run.
Please refer to the README file for further details.

Summary file: CD151/binning/maxbin/intermediate_files/CD151.summary
Marker counts: CD151/binning/maxbin/intermediate_files/CD151.marker
Marker genes for each bin: CD151/binning/maxbin/intermediate_files/CD151.marker_of_each_gene.tar.gz
Bin files: CD151/binning/maxbin/intermediate_files/CD151.001.fasta - CD151/binning/maxbin/intermediate_files/CD151.002.fasta
Unbinned sequences: CD151/binning/maxbin/intermediate_files/CD151.noclass

========== Elapsed Time ==========
0 hours 3 minutes and 11 seconds.

4- There is no ~/miniconda3/envs/atlas/CD151-processed/CD151/logs/binning/metabat.log but there is a ~/miniconda3/envs/atlas/CD151-processed/CD151/logs/binning/metabat.txt that reads:

MetaBAT 2 (2.14 (Bioconda)) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 500. 
4 bins (72714056 bases in total) formed.

Please let me know if you need any other info. Thanks!!

Ahmed-Shibl commented 3 years ago

I came across [#249], and it seems that the message "No bins with bin-score >0.5 found. Adjust score_threshold to report bins with lower quality." might be the problem. Or am I missing other sources of the error?
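A minimal sketch of how one might check a DAS Tool log for this message automatically. The function name `no_bins_threshold` is hypothetical, and the pattern is only matched against the wording quoted in the log excerpt above; other DAS Tool versions may phrase it differently.

```python
import re

# Wording taken from the DASTool.log excerpt quoted in this issue.
NO_BINS_PATTERN = re.compile(r"No bins with bin-score >\s*([0-9.]+) found")


def no_bins_threshold(log_text):
    """Return the score threshold named in the 'No bins' message, or None if absent."""
    match = NO_BINS_PATTERN.search(log_text)
    return float(match.group(1)) if match else None


# Excerpt copied from the log lines quoted above.
log_excerpt = """evaluating bin-sets
No bins with bin-score >0.5 found. Adjust score_threshold to report bins with lower quality.
 Aborting.
"""

threshold = no_bins_threshold(log_excerpt)
if threshold is not None:
    print(f"DAS Tool reported no bins above score {threshold}")
```

If the message is present, the later "mv: cannot stat" error is consistent with DAS Tool aborting before it wrote its scaffolds2bin file, rather than being a separate problem.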

Also, what do you think is the most accurate way to get stats on how many reads remained in my dataset after removing those mapped to the host/contamination genomes? E.g., total reads before host removal = 456, total reads after host removal = 123, and total reads mapped to host genomes = 333.
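The arithmetic behind such a summary can be sketched as follows. The function name `host_removal_stats` is hypothetical and the counts are the placeholder numbers from the question, not real results; the sketch also assumes the two read counts come from the same QC step, so that the difference is attributable to host/contaminant removal only.

```python
def host_removal_stats(reads_before, reads_after):
    """Summarise how many reads were removed and what fraction remains."""
    if reads_before <= 0:
        raise ValueError("reads_before must be positive")
    if not 0 <= reads_after <= reads_before:
        raise ValueError("reads_after must be between 0 and reads_before")
    removed = reads_before - reads_after
    return {
        "removed": removed,
        "fraction_kept": reads_after / reads_before,
        "fraction_removed": removed / reads_before,
    }


# Placeholder counts from the question above.
stats = host_removal_stats(456, 123)
print(f"removed: {stats['removed']}")
print(f"kept: {stats['fraction_kept']:.1%}, removed: {stats['fraction_removed']:.1%}")
```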

Thanks,

SilasK commented 3 years ago

The problem is that you do get bins, but they are of too low quality. There are some small changes you can make, but I fear they may not solve your problem.

SilasK commented 3 years ago

To see the stats, check the QC report in the reports directory. You will also find the assembly report there; the bottom of that report shows how many reads map to your assembly. What would be the average in your case?

SilasK commented 3 years ago

As for the small changes: yes, you can decrease the score threshold or use metabat as the final_binner.
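To illustrate what lowering the score threshold changes, here is a small sketch. The bin names and scores are invented for illustration only and are not taken from this dataset; the comparison is strict, mirroring the ">" in the DAS Tool log message, which is an assumption about how the cutoff is applied.

```python
def bins_passing(scores, threshold):
    """Return the names of bins whose score is strictly above the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)


# Hypothetical scores, made up for this example.
example_scores = {"bin_a": 0.42, "bin_b": 0.31, "bin_c": 0.12}

print(bins_passing(example_scores, 0.5))  # nothing passes at the 0.5 threshold used in this run
print(bins_passing(example_scores, 0.3))  # a lower threshold lets the better bins through
```

A lower threshold only reports bins that were already there; it does not improve their quality, which is why it may not solve the underlying problem.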

Sometimes host genomes contain bacterial fragments, so filtering against them removes too many reads. See https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmask-guide/

If your assembly is good, then you can try co-binning, i.e. mapping the reads from all samples to each assembly, which can help with the binning.

SilasK commented 3 years ago

If you see that your microbiome is too complicated to be assembled and binned (this is what I fear), I suggest you work on the genes with 'atlas run genecatalog' and then the command in #276. Alternatively, you can use co-assembly, which is not implemented in atlas but is available, for example, in anvi'o.

I hope this was helpful.

botellaflotante commented 3 years ago

I ran into the same type of errors, sometimes with maxbin not producing any bins, sometimes with no bins scoring > 0.5... Is there an easy way to rerun everything with the new config parameters (metabat and a lower score threshold)? I tried "atlas run binning -R run_das_tool" but it did not work...

SilasK commented 3 years ago

https://github.com/metagenome-atlas/atlas/issues/314#issuecomment-748873170