metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

checkm2 ERROR: DIAMOND database not found #657

Closed JustinGibbons closed 1 year ago

JustinGibbons commented 1 year ago

Here is the relevant log output:


Error in rule run_checkm2:
    jobid: 4015
    input: TH199122016/binning/DASTool/bins, /work/j/jgibbons1/Tina_Sample_Metagenome_Atlas/CheckM2
    output: TH199122016/binning/DASTool/bin_quality/checkm2/quality_report.tsv
    log: TH199122016/logs/binning/DASTool/checkm2.log, TH199122016/binning/DASTool/bin_quality/checkm2/checkm2.log (check log file(s) for error message)
    conda-env: /work/j/jgibbons1/Tina_Sample_Metagenome_Atlas/conda_envs/c5f2f8c426b3efe91c8be14f2a13c9c0_
    shell:
         checkm2 predict  --threads 4    --force  --allmodels  -x .fasta  --tmpdir <TBD>  --input TH199122016/binning/DASTool/bins  --output-directory TH199122016/binning/DASTool/bin_quality/checkm2  &> TH199122016/logs/binning/DASTool/checkm2.log 
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: 14776370

Error executing rule run_checkm2 on cluster (jobid: 4015, external: 14776370, jobscript: /shares/omicshub/clients/Tina_Ho/Metagenomics_2021-2022/Metagenome_Atlas_Assembly/.snakemake/tmp.vqr1rlay/snakejob.run_checkm2.4015.sh). For error details see the cluster log and the log files of the involved rule(s).
Select jobs to execute...

The checkm2.log file says:

[05/18/2023 11:08:18 AM] INFO: Running CheckM2 version 1.0.1
[05/18/2023 11:08:18 AM] INFO: Running quality prediction workflow with 4 threads.
[05/18/2023 11:08:18 AM] ERROR: DIAMOND database not found. Please download database using <checkm2 database --download>

Atlas version

atlas, version 2.14.2
Snakemake 7.18.2

Additional context

This seems like a straightforward error, except that the database does exist. I have uniref100.KO.1.dmnd downloaded, and I previously did a successful test run using the downloaded databases. Also, eggNOG is successfully finding its databases.

According to the checkm2 documentation, you can use an environment variable to tell checkm2 where the databases are: https://github.com/chklovski/CheckM2

I can test this out and report back, but I thought you'd like to know about this error.

Do you know any other ways I can get atlas to find the checkm2 database?

Thank you

JustinGibbons commented 1 year ago

Creating the environment variable CHECKM2DB as described in the checkm2 documentation does solve the problem, but I still don't know why atlas wasn't able to find the database in the first place.
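
A minimal sketch of that workaround, for anyone hitting the same error. The path below is a hypothetical placeholder, not the location used in this issue, and `CHECKM2_DB_FILE` is a helper variable introduced only for illustration. The CheckM2 README describes pointing `CHECKM2DB` at the database; check there for the exact form it expects.

```shell
# Hypothetical path: replace with the location of your own CheckM2 database file.
CHECKM2_DB_FILE="${CHECKM2_DB_FILE:-/path/to/databases/CheckM2/uniref100.KO.1.dmnd}"

# CheckM2 reads this variable to locate its DIAMOND database.
export CHECKM2DB="$CHECKM2_DB_FILE"

# Sanity check before submitting the job.
if [ -f "$CHECKM2DB" ]; then
    echo "CheckM2 database found: $CHECKM2DB"
else
    echo "CheckM2 database missing: $CHECKM2DB" >&2
fi
```

The variable has to be set in the shell that launches the job so that the rule's environment can inherit it.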

JustinGibbons commented 1 year ago

I ran into the same issue with the identify rule. I solved it by setting the GTDBTK_DATA_PATH environment variable, similar to what is described here: https://ecogenomics.github.io/GTDBTk/installing/bioconda.html

Instead of adding it to the conda env, I added export GTDBTK_DATA_PATH to the bash script I use to submit the atlas job.
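
A rough sketch of what such a submission script might contain. The directory is a hypothetical placeholder, and the atlas command is left as a comment because the actual invocation is not shown in this thread. Whether exported variables reach the cluster jobs also depends on how your scheduler is configured.

```shell
#!/usr/bin/env bash
# Hypothetical path: point this at the directory holding your GTDB-Tk reference data.
export GTDBTK_DATA_PATH="${GTDBTK_DATA_PATH:-/path/to/databases/GTDB}"

# Warn early if the directory is not visible from the submitting node.
if [ ! -d "$GTDBTK_DATA_PATH" ]; then
    echo "Warning: GTDBTK_DATA_PATH does not exist: $GTDBTK_DATA_PATH" >&2
fi

# The atlas invocation from your own submission script would follow here, e.g.:
# atlas run all
```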

SilasK commented 1 year ago

I will check this.

I saw that a small update to the checkm2 database was just released.

Could you tell me in which rule you had the gtdb error?

I also saw that you are not using the latest atlas version. I think at least the gtdb error should be fixed in it.

JustinGibbons commented 1 year ago

Hi Silas,

The gtdb error was in rule identify.

Does atlas update the conda environment every time there is an update to a package?

SilasK commented 1 year ago

I follow the recommended environment definitions. I specify the tools' major and minor versions. This leaves some room for bug fixes.

e.g. I defined - checkm2>=1.0.1, <1.1

So you could have 1.0.1 or 1.0.2 installed.
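
To illustrate which versions a range like `>=1.0.1, <1.1` admits, here is a small sketch. `in_range` is a hypothetical helper written for this example. It relies on `sort -V` (version sort, available in GNU coreutils) and only approximates how conda compares versions.

```shell
# in_range VERSION LOWER UPPER: succeeds if LOWER <= VERSION < UPPER.
in_range() {
    local v="$1" lo="$2" hi="$3"
    # VERSION >= LOWER holds when LOWER sorts first (or the two are equal).
    [ "$(printf '%s\n%s\n' "$lo" "$v" | sort -V | head -n1)" = "$lo" ] || return 1
    # VERSION < UPPER holds when they differ and VERSION sorts first.
    [ "$v" != "$hi" ] && [ "$(printf '%s\n%s\n' "$v" "$hi" | sort -V | head -n1)" = "$v" ]
}

for candidate in 1.0.1 1.0.2 1.1.0; do
    if in_range "$candidate" 1.0.1 1.1; then
        echo "checkm2 $candidate satisfies >=1.0.1, <1.1"
    else
        echo "checkm2 $candidate is outside the range"
    fi
done
```

Bug-fix releases within 1.0.x pass the check, while a new minor version does not.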

SilasK commented 1 year ago

Could you please use


conda activate  /work/j/jgibbons1/Tina_Sample_Metagenome_Atlas/conda_envs/c5f2f8c426b3efe91c8be14f2a13c9c0_
conda list checkm

and tell me the version of checkm2 you are using.

If it is 1.0.1

you might run


rm -r /work/j/jgibbons1/Tina_Sample_Metagenome_Atlas/conda_envs/c5f2f8c426b3efe91c8be14f2a13c9c0_

and then run atlas.

JustinGibbons commented 1 year ago

Hi Silas,

This is the checkm2 version information:

Name                    Version                   Build  Channel
checkm2                   1.0.1              pyh7cba7a3_0    bioconda

The job I just ran was too big to do again now that it's finished, but I'll try your suggestion if I run into problems again.

Thank you