metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error during DRAM download; processing vogdb #718

Closed abarilo closed 1 month ago

abarilo commented 4 months ago

Hi Silas,

I encountered an error during the DRAM database download and solved it by modifying one of the scripts. Maybe this can be useful for other users. During the download I get the following error message, which appears both when running ATLAS and when running DRAM-setup.py directly.

Here is the relevant log output:

Processing vogdb
2024-04-22 16:21:59,195 - The subcommand ['hmmpress', '-f', '/scratch/abarilo1/atlas_assembly/databases/DRAM/db/vog_latest_hmms.txt'] experienced an error:
Error: File format problem in trying to open HMM file /scratch/abarilo1/atlas_assembly/databases/DRAM/db/vog_latest_hmms.txt.
File exists, but appears to be empty?

Traceback:

Traceback (most recent call last):
  File "/scratch/abarilo1/atlas_assembly/databases/conda_envs/3718baf3b150ea3bce54db7159e0bc00_/bin/DRAM-setup.py", line 184, in <module>
    args.func(**args_dict)
  File "/scratch/abarilo1/atlas_assembly/databases/conda_envs/3718baf3b150ea3bce54db7159e0bc00_/lib/python3.11/site-packages/mag_annotator/database_processing.py", line 555, in prepare_databases
    processed_locs = process_functions[i](locs[i], output_dir, LOGGER,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/abarilo1/atlas_assembly/databases/conda_envs/3718baf3b150ea3bce54db7159e0bc00_/lib/python3.11/site-packages/mag_annotator/database_processing.py", line 317, in process_vogdb
    run_process(['hmmpress', '-f', vog_hmms], logger, verbose=verbose)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/abarilo1/atlas_assembly/databases/conda_envs/3718baf3b150ea3bce54db7159e0bc00_/lib/python3.11/site-packages/mag_annotator/utils.py", line 71, in run_process
    raise subprocess.SubprocessError(f"The subcommand {' '.join(command)} experienced an error, see the log for more info.")
subprocess.SubprocessError: The subcommand hmmpress -f /scratch/abarilo1/atlas_assembly/databases/DRAM/db/vog_latest_hmms.txt experienced an error, see the log for more info.

I fixed it by modifying line 316 of the database_processing.py script, in the process_vogdb function.

Old script:

merge_files(glob(path.join(hmm_dir, 'VOG*.hmm')), vog_hmms)

New script:

merge_files(glob(path.join(hmm_dir, 'hmm', 'VOG*.hmm')), vog_hmms)

After this fix the download went smoothly.
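
To illustrate why the original pattern can lead to the "appears to be empty" error, here is a minimal sketch. It assumes (inferred from the fix, not confirmed against the VOG archive itself) that the extracted profiles sit in an `hmm` subfolder rather than directly in `hmm_dir`. The helper `find_vog_hmms` and the file names are hypothetical and are not part of DRAM.

```python
import tempfile
from glob import glob
from os import makedirs, path


def find_vog_hmms(hmm_dir, subdir=None):
    """Return VOG*.hmm files directly in hmm_dir, or in hmm_dir/subdir if given."""
    parts = [hmm_dir] + ([subdir] if subdir else []) + ['VOG*.hmm']
    return sorted(glob(path.join(*parts)))


def demo():
    """Build a throwaway directory with the assumed layout and count matches."""
    with tempfile.TemporaryDirectory() as hmm_dir:
        # Assumed layout: profiles are extracted into an 'hmm' subfolder.
        makedirs(path.join(hmm_dir, 'hmm'))
        for name in ('VOG00001.hmm', 'VOG00002.hmm'):  # hypothetical names
            with open(path.join(hmm_dir, 'hmm', name), 'w') as handle:
                handle.write('placeholder\n')
        old_matches = find_vog_hmms(hmm_dir)         # pattern before the fix
        new_matches = find_vog_hmms(hmm_dir, 'hmm')  # pattern after the fix
        return len(old_matches), len(new_matches)


old_count, new_count = demo()
print(f"old pattern: {old_count} files, new pattern: {new_count} files")
```

Under that assumption the old pattern matches nothing, so the merged vog_latest_hmms.txt would contain no profiles and hmmpress would reject it.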

atlas, version 2.18.1+12.gdc53d3b4

Cheers, Anastasiia

SilasK commented 3 months ago

Thank you for the hint.

boulund commented 3 months ago

Thanks for this, I also ran into this last week! It took some time to find the database_processing.py script in the generated conda env containing mag_annotator. For me it was located in the conda env folder inside the atlas db folder:

atlas_db/conda_envs/220fa41099c6afee6550f2ccb565fe10_/lib/python3.11/site-packages/mag_annotator/database_processing.py

The line number you mentioned, @abarilo, was correct, but the function is called process_vogdb. I made the change you suggested and just started a new run. I'll report back if it works as intended now.
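
For anyone else hunting for the file, the search can be sketched in a few lines of Python. This assumes the env layout matches the path shown above (a hashed env folder containing `lib/python*/site-packages/mag_annotator/`). The function name `find_database_processing` is made up for this example.

```python
from pathlib import Path


def find_database_processing(conda_envs_dir):
    """List every mag_annotator/database_processing.py below a conda_envs folder."""
    root = Path(conda_envs_dir)
    if not root.is_dir():
        return []
    # One wildcard for the hashed env name, one for the Python version folder.
    pattern = "*/lib/python*/site-packages/mag_annotator/database_processing.py"
    return sorted(root.glob(pattern))


if __name__ == "__main__":
    # 'atlas_db/conda_envs' mirrors the path above; adjust it to your database dir.
    for hit in find_database_processing("atlas_db/conda_envs"):
        print(hit)
```

If several envs contain mag_annotator, each copy is listed, so check which env the DRAM rule actually uses before editing.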

abarilo commented 3 months ago

Hope it works now @boulund. Corrected the function name, thanks!

anbadilla commented 3 months ago

Folks, can Atlas be linked to a currently existing DRAM db in my environment?

SilasK commented 3 months ago

> Folks, can Atlas be linked to a currently existing DRAM db in my environment?

Yes.

  1. Export the DRAM config file from the installed db: DRAM-setup.py export_config --output_file <output_file>
  2. In the atlas config file, point dram_config_file at that exported file: dram_config_file: <output_file>

https://github.com/metagenome-atlas/atlas/blob/master/workflow/rules/dram.smk
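
As a rough sketch, the two steps above could be spelled out like this. The subcommand, the `--output_file` flag, and the `dram_config_file` key are taken from this comment. The file name `dram_config.json` and both helper functions are hypothetical, and the script only prints the command and the config line rather than running anything.

```python
import shlex


def export_config_command(output_file):
    """Step 1: the DRAM-setup.py call that exports the config of an installed db."""
    return ["DRAM-setup.py", "export_config", "--output_file", str(output_file)]


def atlas_config_line(output_file):
    """Step 2: the line to put in the atlas config file."""
    return f"dram_config_file: {output_file}"


config_path = "dram_config.json"  # hypothetical file name, pick your own
print(shlex.join(export_config_command(config_path)))
print(atlas_config_line(config_path))
```

Run the printed command inside the environment where the existing DRAM installation lives, then paste the printed line into the atlas config file.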

ShyGuy509 commented 3 months ago

Thanks for the fix! I'm new to this, and I've been trying for a couple days now to figure out why it's not working.

I'm also having the following issue:

/data/users/danross/databases/condaenvs/e6506989c58f442efbed441478497d1c/lib/python3.11/site-packages/mag_annotator/database_handler.py:123: UserWarning: Database does not exist at path None
  warnings.warn("Database does not exist at path %s" % description_loc)

I think it's related to Issue 664, but that was for a previous version of Atlas.

Thanks :)