metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

atlas download fail #700

Closed jowodo closed 9 months ago

jowodo commented 10 months ago
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/ssl.py", line 1274, in recv_into
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/urllib3/connectionpool.py", line 538, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/urllib3/connectionpool.py", line 370, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out. (read timeout=15.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/checkm2/zenodo_backpack.py", line 164, in _retrieve_record_ID
    r = requests.get(DOI, timeout=15.)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/sessions.py", line 725, in send
    history = [resp for resp in gen]
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/sessions.py", line 725, in <listcomp>
    history = [resp for resp in gen]
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/sessions.py", line 266, in resolve_redirects
    resp = self.send(
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out. (read timeout=15.0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/bin/checkm2", line 280, in <module>
    fileManager.DiamondDB().download_database(args.path)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/checkm2/fileManager.py", line 127, in download_database
    backpack_downloader.download_and_extract(download_location, DOI, progress_bar=True, no_check_version=False)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/checkm2/zenodo_backpack.py", line 46, in download_and_extract
    recordID = self._retrieve_record_ID(DOI)
  File "/scratch/mirror/atlas/2.18.1/conda_envs/970bf81dc2fb89a4cccba899825a84ec_/lib/python3.8/site-packages/checkm2/zenodo_backpack.py", line 166, in _retrieve_record_ID
    raise ZenodoConnectionException('Connection error: {}'.format(e))
checkm2.zenodo_backpack.ZenodoConnectionException: Connection error: HTTPSConnectionPool(host='zenodo.org', port=443): Read timed out. (read timeout=15.0)
================================================================================

Removing output files of failed job checkm2_download_db since they might be corrupted:
/scratch/mirror/atlas/2.18.1/CheckM2
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Atlas version 2.18.1

SilasK commented 10 months ago

This looks like an internet connection problem. Did the other download steps succeed?

jowodo commented 10 months ago

I tried again and got this error:

$ atlas download --db-dir .
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/eggNOG.yaml...
Downloading and installing remote packages.
Environment for /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/eggNOG.yaml created (location: conda_envs/ce47687b109879e39a03638f72c50b1e_)
Creating conda environment /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/checkm2.yaml...
Downloading and installing remote packages.
Environment for /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/checkm2.yaml created (location: conda_envs/e52a9b934605d8074e0163627fbe0316_)
Creating conda environment /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/gtdbtk.yaml...
Downloading and installing remote packages.
Environment for /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/../envs/gtdbtk.yaml created (location: conda_envs/ab1b2b5668e673019caaf32843c034c4_)
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                      count    min threads    max threads
---------------------  -------  -------------  -------------
checkm2_download_db          1              1              1
download                     1              1              1
download_atlas_files         2              1              1
download_eggNOG_files        1              1              1
download_gtdb                1              1              1
extract_gtdb                 1              1              1
total                        7              1              1

Select jobs to execute...

[Wed Oct 18 15:54:41 2023]
localrule download_atlas_files:
    output: /scratch/students/apptest/adapters.fa
    jobid: 1
    reason: Missing output files: /scratch/students/apptest/adapters.fa
    wildcards: filename=adapters.fa
    resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...
--2023-10-18 15:54:42--  https://zenodo.org/record/1134890/files/adapters.fa
Resolving zenodo.org (zenodo.org)... 188.185.10.78, 188.185.22.33, 188.185.33.206, ...
Connecting to zenodo.org (zenodo.org)|188.185.10.78|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/1134890/files/adapters.fa [following]
--2023-10-18 15:54:42--  https://zenodo.org/records/1134890/files/adapters.fa
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 13954 (14K) [application/octet-stream]
Saving to: '/scratch/students/apptest/adapters.fa'

/scratch/students/apptest/adapters.fa   100%[============================================================================================>]  13.63K  --.-KB/s    in 0.001s

2023-10-18 15:54:42 (20.6 MB/s) - '/scratch/students/apptest/adapters.fa' saved [13954/13954]

[Wed Oct 18 15:54:43 2023]
Finished job 1.
1 of 7 steps (14%) done
Select jobs to execute...

[Wed Oct 18 15:54:43 2023]
rule checkm2_download_db:
    output: /scratch/students/apptest/CheckM2
    log: logs/download/checkm2.log
    jobid: 4
    reason: Missing output files: /scratch/students/apptest/CheckM2
    resources: tmpdir=/tmp, time=10

Activating conda environment: conda_envs/e52a9b934605d8074e0163627fbe0316_
[Wed Oct 18 15:55:13 2023]
Error in rule checkm2_download_db:
    jobid: 4
    output: /scratch/students/apptest/CheckM2
    log: logs/download/checkm2.log (check log file(s) for error details)
    conda-env: /scratch/students/apptest/conda_envs/e52a9b934605d8074e0163627fbe0316_
    shell:
         checkm2 database --download --path /scratch/students/apptest/CheckM2  &>> logs/download/checkm2.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile logs/download/checkm2.log:
================================================================================
[10/18/2023 03:55:12 PM] INFO: Command: Download database. Checking internal path information.
Traceback (most recent call last):
  File "/scratch/students/apptest/conda_envs/e52a9b934605d8074e0163627fbe0316_/bin/checkm2", line 280, in <module>
    fileManager.DiamondDB().download_database(args.path)
  File "/scratch/students/apptest/conda_envs/e52a9b934605d8074e0163627fbe0316_/lib/python3.8/site-packages/checkm2/fileManager.py", line 127, in download_database
    backpack_downloader.download_and_extract(download_location, DOI, progress_bar=True, no_check_version=False)
  File "/scratch/students/apptest/conda_envs/e52a9b934605d8074e0163627fbe0316_/lib/python3.8/site-packages/checkm2/zenodo_backpack.py", line 52, in download_and_extract
    fname = str(file['key']).split('/')[-1]
KeyError: 'key'
================================================================================

Removing output files of failed job checkm2_download_db since they might be corrupted:
/scratch/students/apptest/CheckM2
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
An error occurred while downloading reference databases.
ATLAS databases can be manually downloaded from: https://zenodo.org/record/1134890
eggNOG databases can be manually downloaded from: http://eggnogdb.embl.de/download/emapperdb-5
CAT databases can be manually downloaded from: https://github.com/dutilh/CAT
Complete log: .snakemake/log/2023-10-18T154707.281348.snakemake.log
[Atlas] CRITICAL: Command 'snakemake --snakefile /home/apps/conda/miniconda3/envs/atlas-2.18.1/lib/python3.10/site-packages/atlas/workflow/rules/download.smk --jobs 1 --rerun-incomplete --conda-frontend mamba --scheduler greedy --nolock  --use-conda  --conda-prefix /scratch/students/apptest/conda_envs  --show-failed-logs --config database_dir='/scratch/students/apptest' -- ' returned non-zero exit status 1.

The link printed for the manual eggNOG database download returns a 404: http://eggnog6.embl.de/download/emapperdb-5

SilasK commented 10 months ago

I will try to debug this next week. It seems you could install the conda envs without problems.

Even if you don't have all the databases, you can go ahead and run atlas.

jowodo commented 10 months ago

Yes, the conda installation works without problems! It seems that the directory structure for eggNOG is more fine-grained than just the major version: http://eggnog6.embl.de/download/emapperdb-5.0.2/ instead of http://eggnog6.embl.de/download/emapperdb-5

SilasK commented 9 months ago

Where do we stand here? You had an error in checkm2, or was it eggNOG? Do you really need eggNOG?

jowodo commented 9 months ago

Hi, I retried the command:

atlas download --db-dir $DBDIR

and it worked this time. It may have been related to connection problems after all.
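
Since simply rerunning the command was what resolved this, here is a minimal sketch of automating that retry with exponential backoff. It is not part of atlas or checkm2. The helper names `retry` and `download_databases`, the attempt count, and the delays are all assumptions made for illustration. Only the `atlas download --db-dir` command itself comes from this thread.

```python
import subprocess
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between tries
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


def download_databases(db_dir, attempts=3, base_delay=60.0):
    """Rerun the download command from this thread on failure.

    check=True turns a non-zero exit status into CalledProcessError,
    which is what triggers the next attempt.
    """
    command = ["atlas", "download", "--db-dir", db_dir]
    return retry(
        lambda: subprocess.run(command, check=True),
        attempts=attempts,
        base_delay=base_delay,
    )
```

Calling `download_databases("/path/to/databases")` would then rerun the download up to three times. This only helps with transient failures such as the read timeout above. An error that repeats identically on every attempt will still surface once the attempts are used up.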