metagenome-atlas / atlas

Error in rule align_reads_to_final_contigs #628

Closed: jilldv closed this issue 1 year ago

jilldv commented 1 year ago

Here is the relevant log output:

Activating conda environment: ../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_
Environment defines Python version < 3.7. Using Python of the main process to execute script. Note that this cannot be avoided, because the script uses data structures from Snakemake which are Python >=3.7 only.
Activating conda environment: ../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_
Traceback (most recent call last):
  File "/home/jill/atlas_test/.snakemake/scripts/tmp2k4zsstk.wrapper.py", line 13, in <module>
    from snakemake_wrapper_utils.samtools import infer_out_format
ModuleNotFoundError: No module named 'snakemake_wrapper_utils'
[Thu Apr 13 09:24:24 2023]
Error in rule align_reads_to_final_contigs:
    jobid: 36
    input: sample2/sequence_quality_control/sample2_QC_R1.fastq.gz, sample2/sequence_quality_control/sample2_QC_R2.fastq.gz, sample2/sample2_contigs.fasta
    output: sample2/sequence_alignment/sample2.bam
    log: sample2/logs/assembly/calculate_coverage/align_reads_from_sample2_to_filtered_contigs.log (check log file(s) for error details)
    conda-env: /home/jill/atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_

Atlas version 2.15.0

Additional context: I installed atlas using conda, following the documentation, and I am now trying to run the pipeline on the test data that the documentation also provides, but I encountered this error. When installing atlas into a conda environment I specified Python version 3.8, but there seems to be some kind of mix-up between versions. When I run conda list python -f in the activated atlas environment, I get the following:

# packages in environment at /home/jill/mambaforge/envs/atlas:
#
# Name                    Version                   Build  Channel
python                    3.8.16          he550d4f_1_cpython    conda-forge

I am not sure what to do or what went wrong exactly.

SilasK commented 1 year ago

This is a critical issue. Could you try:


conda activate <conda env mentioned in first line>
mamba install python=3.8
conda deactivate

Then try rerunning atlas.

jilldv commented 1 year ago

I just tried the commands above and started atlas again but unfortunately it gives the same error message.

SilasK commented 1 year ago

Can you give me the content of the file '../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_.yaml'?

We need to raise an issue on the snakemake repo. Maybe I will have time tomorrow.

SilasK commented 1 year ago

And just to be sure, can you run:

conda list python -p ../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_

SilasK commented 1 year ago

I raised the issue at the snakemake repo.

Maybe what you can do is to install the snakemake-wrapper-utils in the atlas env. This would be a hack, though.

Is anyone else experiencing this problem?

jilldv commented 1 year ago

Thank you for the fast response. The content of the file '../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_.yaml':

channels:

And the output from conda list python -p ../atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_:

# packages in environment at /home/jill/atlas_databases/conda_envs/f077e509b9873bc1010104c7314c8556_:
#
# Name                    Version                   Build  Channel
python                    3.8.16          he550d4f_1_cpython    conda-forge

jilldv commented 1 year ago

SilasK wrote: "Maybe what you can do is to install the snakemake-wrapper-utils in the atlas env."

Should I install this with conda as well, or in some other way?

SilasK commented 1 year ago

Everything looks as it should be. Yes, you can try to install snakemake-wrapper-utils with conda/mamba in the atlas env.

Can you tell me, and also post on https://github.com/snakemake/snakemake/issues/2222, which version of snakemake you are using?

Maybe updating it to 7.25 could help.
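
Why installing into the atlas env can help: the log above says Snakemake falls back to "Python of the main process" to execute the wrapper script, so snakemake_wrapper_utils has to be importable from that Python rather than from the rule's conda env. Below is a minimal sketch for checking this from the activated atlas environment. The name has_module is a hypothetical helper, not part of atlas or Snakemake; it only handles top-level module names and assumes python3 is on the PATH. The install command in the comment assumes the package is taken from the bioconda channel.

```shell
# Hypothetical helper (not part of atlas or Snakemake): exit status 0 if the
# python3 found first on PATH can locate the given top-level module.
has_module() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Run this inside the activated atlas environment.
if has_module snakemake_wrapper_utils; then
    echo "snakemake_wrapper_utils is importable"
else
    # Assumed install command for the workaround discussed above:
    #   mamba install -c conda-forge -c bioconda snakemake-wrapper-utils
    echo "snakemake_wrapper_utils is missing from this Python"
fi
```

If the check still reports the module as missing after installing, it may be that a different python3 comes first on the PATH than the one from the atlas env.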

jilldv commented 1 year ago

The snakemake version I am using is 7.25.0

After installing snakemake-wrapper-utils the pipeline continued. However now I get an error running DAStool. This is the error message:

rule run_das_tool:
    input: sample2/binning/DASTool/metabat.scaffolds2bin, sample2/binning/DASTool/maxbin.scaffolds2bin, sample2/sample2_contigs.fasta, sample2/annotation/predicted_genes/sample2.faa
    output: sample2/binning/DASTool/sample2_DASTool_summary.tsv, sample2/binning/DASTool/sample2_allBins.eval, sample2/binning/DASTool/cluster_attribution.tsv
    log: sample2/logs/binning/DASTool.log
    jobid: 30
    reason: Missing output files: sample2/binning/DASTool/cluster_attribution.tsv
    wildcards: sample=sample2
    threads: 2
    resources: tmpdir=/tmp, mem=10, time_min=300, mem_mb=60000, mem_mib=57221, runtime=18000

Activating conda environment: ../atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_
[Tue Apr 18 16:21:04 2023]
Error in rule run_das_tool:
    jobid: 30
    input: sample2/binning/DASTool/metabat.scaffolds2bin, sample2/binning/DASTool/maxbin.scaffolds2bin, sample2/sample2_contigs.fasta, sample2/annotation/predicted_genes/sample2.faa
    output: sample2/binning/DASTool/sample2_DASTool_summary.tsv, sample2/binning/DASTool/sample2_allBins.eval, sample2/binning/DASTool/cluster_attribution.tsv
    log: sample2/logs/binning/DASTool.log (check log file(s) for error details)
    conda-env: /home/jill/atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_
    shell:
         DAS_Tool --outputbasename sample2/binning/DASTool/sample2  --bins sample2/binning/DASTool/metabat.scaffolds2bin,sample2/binning/DASTool/maxbin.scaffolds2bin  --labels metabat,maxbin  --contigs sample2/sample2_contigs.fasta  --search_engine diamond  --proteins sample2/annotation/predicted_genes/sample2.faa  --write_bin_evals  --megabin_penalty 0.5 --duplicate_penalty 0.6  --threads 2  --debug  --score_threshold 0.5 &> sample2/logs/binning/DASTool.log  ; mv sample2/binning/DASTool/sample2_DASTool_contig2bin.tsv sample2/binning/DASTool/cluster_attribution.tsv &>> sample2/logs/binning/DASTool.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The content of the log file 'sample2/logs/binning/DASTool.log':

Error in library(docopt, warn.conflicts = F, quietly = T) :
  there is no package called ‘docopt’
Calls: suppressMessages -> withCallingHandlers
Execution halted

I tried installing docopt in the ../atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_ environment but I keep getting the same error message.

SilasK commented 1 year ago

So there is an error with the R package docopt?

Could you send me the output of conda list docopt -p ...path to conda env

What about using another binner as final_binner in the config file, e.g. vamb if you have fewer than 100 samples?

SilasK commented 1 year ago

Thank you for reporting

jilldv commented 1 year ago

This is the output

(atlas) jill@WE11sv03:~/atlas_test$ conda list docopt -p ../atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_
# packages in environment at /home/jill/atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_:
#
# Name                    Version                   Build  Channel
docopt                    0.6.2                      py_1    conda-forge
r-docopt                  0.7.1             r42hc72bb7e_2    conda-forge

I can use another binner I think. Do these other binners also do bin refinement or is this not really necessary for running the genome workflow?

Sorry for all the questions and error messages.

SilasK commented 1 year ago

The other binners don't do bin refinement, but in the end vamb gives better results.

It seems to be an error with this R package.

We sometimes had the problem that another R version in the PATH took precedence over the R version of the conda env created by snakemake.

Could you please activate the conda env of DAS Tool and print out the $PATH variable?
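
To see which R would win, it helps to list every directory in the PATH that provides an R executable, in lookup order; the first hit is the one the shell runs. A minimal sketch, assuming a POSIX shell; find_in_path is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: print every executable named $1 found in a PATH-style
# string ($2, defaulting to the current $PATH), in lookup order.
find_in_path() (
    name=$1
    path_string=${2:-$PATH}
    IFS=:      # split the string on colons
    set -f     # disable globbing while splitting
    for dir in $path_string; do
        if [ -n "$dir" ] && [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
)

# The first line printed is the R that gets executed; it prints nothing
# if no R is found on the PATH at all.
find_in_path R
```

If the first line is not inside the conda env of DAS Tool, a different R (with a different package library) is taking precedence, which would explain why docopt is not found even though r-docopt is installed in the env.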

jilldv commented 1 year ago

This is the $PATH I get:

/home/jill/atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_/share/rubygems/bin:/home/jill/bin:/home/jill/.local/bin:/usr/bin:/home/bioadmin/Downloads/FastQC-0.11.9/fastqc:/home/jill/bin:/home/jill/.local/bin:/usr/bin:/home/bioadmin/Downloads/FastQC-0.11.9/fastqc:/home/jill/atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_/bin:/home/bioadmin/miniconda3/condabin:/usr/local/ncbi/sra-tools/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/bin/usearch:$HOME/.local/bin:~/.cache:/usr/local/bin/spades/SPAdes-3.11.1-Linux/bin:/snap/bin:/home/jill/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin/java/bin:/home/bioadmin/Downloads/squeezeMeta/SqueezeMeta-1.0.0/scripts:/home/mbogorad/RNAQCchain/RNA-QC-Chain/bin:/home/jill/bin:/usr/lib/jvm/java-8-openjdk-amd64/bin/java/bin:/home/bioadmin/Downloads/squeezeMeta/SqueezeMeta-1.0.0/scripts:/home/mbogorad/RNAQCchain/RNA-QC-Chain/bin

SilasK commented 1 year ago

It is possible that there is another R version in your path.

Could you try:

conda activate /home/jill/atlas_databases/conda_envs/edb12c7f1add802bb49be03c8f52f012_
which R

If it is not the R from the conda env, I suggest you simplify the PATH.

Maybe you need to do this iteratively.
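
One way to simplify a PATH like the one above is to drop repeated entries and make sure the env's bin directory comes first. A minimal sketch, assuming a POSIX shell; dedupe_path is a hypothetical helper name, and CONDA_PREFIX is the variable that conda activate sets to the active env's directory:

```shell
# Hypothetical helper: print a PATH-style string with empty and repeated
# entries removed, keeping the first occurrence of each directory.
dedupe_path() (
    IFS=:      # split the string on colons
    set -f     # disable globbing while splitting
    result=
    for dir in $1; do
        [ -z "$dir" ] && continue
        case ":$result:" in
            *":$dir:"*) ;;                          # already seen, skip
            *) result=${result:+$result:}$dir ;;    # first occurrence, keep
        esac
    done
    printf '%s\n' "$result"
)

dedupe_path "/usr/bin:/home/jill/bin:/usr/bin"   # prints /usr/bin:/home/jill/bin

# Possible use after "conda activate <env>": put the env's bin directory
# first and drop the duplicates (affects the current shell session only).
#   export PATH=$(dedupe_path "$CONDA_PREFIX/bin:$PATH")
```

This only changes the current session. If the extra directories are prepended by shell startup files such as ~/.bashrc, they will probably need to be cleaned up there as well, after which which R can be checked again.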

jilldv commented 1 year ago

Sorry for my late response, some work came in between so I could only now come back to this. Indeed the wrong R version was used. I adjusted the path so the correct version was used and I was finally able to finish the pipeline. Thank you for all the help!