metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error in rule multiqc_mapping_genome due to incompatibility with Python 3.12 #702

Closed valery-shap closed 6 months ago

valery-shap commented 9 months ago

Here is the relevant log output:

Error in rule multiqc_mapping_genome:
    jobid: 68
    input: genomes/alignments/stats/SRR14092348.stats
    output: reports/genome_mapping/results.html
    log: logs/genomes/alignment/multiqc.log (check log file(s) for error details)
    conda-env: /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_
Logfile logs/genomes/alignment/multiqc.log:
================================================================================
Traceback (most recent call last):
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/bin/multiqc", line 8, in <module>
    from multiqc.__main__ import run_multiqc
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/multiqc/__init__.py", line 16, in <module>
    from .multiqc import run
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/multiqc/multiqc.py", line 29, in <module>
    from .plots import table
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/multiqc/plots/table.py", line 9, in <module>
    from multiqc.utils import config, report, util_functions, mqc_colour
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/multiqc/utils/report.py", line 13, in <module>
    import lzstring
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/lzstring/__init__.py", line 11, in <module>
    from future import standard_library
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/639a89b5048e3b7db300283358480e8a_/lib/python3.12/site-packages/future/standard_library/__init__.py", line 65, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'
================================================================================

[Sun Oct 29 13:53:42 2023]
Error in rule dram_download:
    jobid: 80
    output: /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/db, /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/DRAM.config
    log: logs/dram/download_dram.log (check log file(s) for error details)
    conda-env: /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_
    shell:
         DRAM-setup.py prepare_databases  --output_dir /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/db  --threads 8  --verbose  --skip_uniref  &> logs/dram/download_dram.log  ;  DRAM-setup.py export_config --output_file /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/DRAM.config
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/dram/download_dram.log:
================================================================================
/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_/lib/python3.11/site-packages/mag_annotator/database_handler.py:123: UserWarning: Database does not exist at path None
  warnings.warn("Database does not exist at path %s" % description_loc)
2023-10-29 11:22:39,679 - Starting the process of downloading data
2023-10-29 11:22:39,679 - Skipping UniRef
2023-10-29 11:22:39,679 - The kegg_loc argument was not used to specify a downloaded kegg file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-10-29 11:22:39,679 - The gene_ko_link_loc argument was not used to specify a downloaded gene_ko_link file, and dram can not download it its self. So it is assumed that the user wants to set up DRAM without it
2023-10-29 11:22:39,679 - Database preparation started
2023-10-29 11:22:39,679 - Downloading kofam_hmm
2023-10-29 12:04:58,500 - Downloading kofam_ko_list
2023-10-29 12:05:35,569 - Downloading pfam
2023-10-29 13:51:31,497 - Downloading pfam_hmm
2023-10-29 13:51:32,430 - Downloading dbcan
2023-10-29 13:53:41,806 - Something went wrong with the download of the url: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt
2023-10-29 13:53:41,806 - <urlopen error [Errno 110] Connection timed out>
downloading ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
downloading ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz
downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.full.gz
downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
downloading http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt
Traceback (most recent call last):
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_/bin/DRAM-setup.py", line 184, in <module>
    args.func(**args_dict)
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_/lib/python3.11/site-packages/mag_annotator/database_processing.py", line 532, in prepare_databases
    locs[i] = download_functions[i](
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_/lib/python3.11/site-packages/mag_annotator/database_processing.py", line 109, in download_dbcan
    download_file(link_path, dbcan_hmm, logger, verbose=verbose)
  File "/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/conda_envs/46463df2ef6b73fb5fdc32012f54c7bb_/lib/python3.11/site-packages/mag_annotator/utils.py", line 33, in download_file
    raise URLError("DRAM whas not able to download a key database, check the logg for details")
urllib.error.URLError: <urlopen error DRAM whas not able to download a key database, check the logg for details>
================================================================================

Removing output files of failed job dram_download since they might be corrupted:
/home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/db
[Sun Oct 29 14:06:15 2023]
Finished job 109.
109 of 150 steps (73%) done
[Sun Oct 29 18:41:59 2023]
Finished job 74.
110 of 150 steps (73%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-10-29T105800.412899.snakemake.log

Atlas version: 2.18.1

Additional context: I've checked issue #664 and ran 'conda config --set channel_priority strict', but it didn't help. I attached the full log file: 2023-10-29T105800.412899.snakemake.log

Many thanks,

Valery

johanneswerner commented 9 months ago

I have the same problem for multiqc, might be related to https://github.com/ewels/MultiQC/issues/2112.

valery-shap commented 9 months ago

@johanneswerner Thank you! I will try, but it seems that the critical error is in the rule dram_download.

johanneswerner commented 9 months ago

@SilasK is there a quick fix or a working release I can downgrade to where this issue does not occur?

SilasK commented 9 months ago

For the multiqc error you can activate the conda env of the rule and install the missing package.

You can also downgrade to python 3.11 inside this env.

For the DRAM error, it seems that it cannot find a database, shortbred. Let me check whether you can deactivate this specific db.

For now you can continue atlas with --keep-going
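
To know which environment to activate, the `conda-env:` line of the failing rule's error block in the Snakemake log gives the path. A minimal sketch, assuming the log layout shown in the report above; `conda_env_for_rule` is a hypothetical helper name (not part of atlas), and paths containing spaces are not handled:

```shell
# Print the conda environment path reported for a failed rule.
# $1 = Snakemake log file, $2 = rule name (hypothetical helper, not part of atlas)
conda_env_for_rule() {
    awk -v rule="$2" '
        # Start of the error block for the requested rule
        $0 ~ ("^Error in rule " rule ":") { in_block = 1; next }
        # Indented "conda-env: <path>" line inside that block
        in_block && $1 == "conda-env:"    { print $2; exit }
        # Any unindented line ends the block
        in_block && /^[^[:space:]]/       { in_block = 0 }
    ' "$1"
}

# Example use (not executed here; the `all` target is an assumption):
#   env_path=$(conda_env_for_rule .snakemake/log/2023-10-29T105800.412899.snakemake.log multiqc_mapping_genome)
#   conda activate "$env_path"
#   conda install -y python=3.11    # downgrade Python inside this env only
#   conda deactivate
#   atlas run all --keep-going
```

The downgrade touches only the environment of the failing rule, so the other rule environments stay as atlas created them.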

valery-shap commented 9 months ago

@SilasK thank you very much for the reply! The problem is with dbCAN: "Something went wrong with the download of the url: http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt". Could I download it offline and just put it in the right folder? I have attached the log file of dram_download: download_dram.log

Best regards, Valery

SilasK commented 9 months ago

I don't know why this happens. Here is a manual solution.

I think for DRAM you can also

  1. Activate the conda environment.

  2. Run the command directly:

    DRAM-setup.py prepare_databases  --output_dir /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/db  --threads 8  --verbose  --skip_uniref  &> logs/dram/download_dram.log

    Unfortunately, atlas deletes all files upon error.

    • [ ] Note to me: deactivate this.

    You can rerun the command several times. If it still has problems with the dbCAN file, the command accepts an argument --dbcan_loc dbCAN-HMMdb-V11.txt

  3. Finally, export the config file:

    DRAM-setup.py export_config --output_file /home/vshapovalova/shortbred-0.9.4/1042A_atlas_results_slurm2/databases/DRAM/DRAM.config

  4. Continue with atlas. Check with a dry run that atlas accepts the config file and doesn't try to re-download the database, i.e. that atlas doesn't plan to run dram_download again.
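
The manual procedure above can be sketched as a small retry wrapper around the download step. This is an illustration only: `retry` and `RETRY_DELAY` are hypothetical names, `DB_DIR` in the usage comments is a placeholder for your own databases path, and the `all` target of the dry run is an assumption.

```shell
# Run a command up to N times, waiting RETRY_DELAY seconds (default 60)
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
# Usage: retry N command [args...]
retry() {
    _retry_max="$1"
    shift
    _retry_n=1
    until "$@"; do
        if [ "$_retry_n" -ge "$_retry_max" ]; then
            echo "giving up after $_retry_n attempts: $*" >&2
            return 1
        fi
        _retry_n=$((_retry_n + 1))
        sleep "${RETRY_DELAY:-60}"
    done
}

# Example use inside the activated DRAM environment (not executed here):
#   DB_DIR=/path/to/databases/DRAM        # placeholder
#   retry 3 DRAM-setup.py prepare_databases --output_dir "$DB_DIR/db" \
#       --threads 8 --verbose --skip_uniref
#
# If only the dbCAN download keeps failing, fetch the file by other means from
#   http://bcb.unl.edu/dbCAN2/download/dbCAN-HMMdb-V11.txt
# and add: --dbcan_loc dbCAN-HMMdb-V11.txt
#
#   DRAM-setup.py export_config --output_file "$DB_DIR/DRAM.config"
#   atlas run all --dryrun                # dram_download should no longer be planned
```

Running the command outside of atlas also keeps partially downloaded files between attempts.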

CharlotteJNeumann commented 9 months ago

Hi there,

I am also getting a multiqc error similar to the one above, also containing the imp error. (in short):

ATLAS_genomes.err

rule multiqc_mapping_genome:
    input: genomes/alignments/stats/i02S3.stats, genomes/alignments/stats/i02SM01.stats, ...
    output: reports/genome_mapping/results.html
    log: logs/genomes/alignment/multiqc.log
    jobid: 2147
    reason: Missing output files: reports/genome_mapping/results.html
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=, time_min=300, runtime=300

multiqc.log

...
  File "/home/conda/int_microbiome/databases/atlas/condaenvs/a0f70fea96d2e29d2cb26720d8bf26bd/lib/python3.12/site-packages/future/standard_library/__init__.py", line 65, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'

In this github discussion Silas said: "for the multiqc error you can activate the conda env of the rule and install the missing package."

Could you please tell me specifically how and in which directory to activate and to install the package?

Big thanks in advance!

SilasK commented 9 months ago

Just to say that it's not my fault; the error comes from an upstream incompatibility with Python 3.12.

Solution in your case would be:

conda activate /home/conda/int_microbiome/databases/atlas/condaenvs/a0f70fea96d2e29d2cb26720d8bf26bd
conda install -y imp
conda deactivate

CharlotteJNeumann commented 9 months ago

Sorry Silas, I didn't want to offend you at all!! I will try it, thank you so much, and also for your fast response!

SilasK commented 9 months ago

No problem..

github-actions[bot] commented 7 months ago

There was no activity since some time. I hope your issue is solved in the mean time. This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.

njohner commented 6 months ago

That issue was not solved. It's due to an incompatibility of MultiQC with Python 3.12, which was fixed here. See https://github.com/metagenome-atlas/atlas/pull/714

njohner commented 6 months ago

For those needing a quick fix (the proposed solution above did not work for me), you can simply update multiqc in the corresponding conda environment:

conda activate path/to/condaenv
conda update multiqc
conda deactivate
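
To see which steps are affected before updating, one option is to search the rule logs for the `imp` error. A rough sketch, assuming the logs live under a `logs` directory as in the reports above; `logs_with_imp_error` is a hypothetical helper name:

```shell
# List log files that contain the Python 3.12 'imp' import error.
# $1 = directory to search (defaults to "logs"); hypothetical helper.
logs_with_imp_error() {
    grep -rl "No module named 'imp'" "${1:-logs}" | sort
}

# Example use (not executed here):
#   logs_with_imp_error logs
#   # then, for the conda env of each affected rule:
#   conda activate path/to/condaenv
#   conda update multiqc
#   conda deactivate
```

An empty result means no log under that directory recorded the error.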