metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

download wrapper on head node #675

Closed: jotech closed this issue 11 months ago

jotech commented 1 year ago
Submitted job 197 with external jobid '1556263'.                                                                                                                             
[Wed Jun 28 14:49:57 2023]
Error in rule align_reads_to_final_contigs:
    jobid: 136
    input: S221200000964/sequence_quality_control/S221200000964_QC_R1.fastq.gz, S221200000964/sequence_quality_control/S221200000964_QC_R2.fastq.gz, S221200000964/S221200000964_contigs.fasta
    output: S221200000964/sequence_alignment/S221200000964.bam
    log: S221200000964/logs/assembly/calculate_coverage/align_reads_from_S221200000964_to_filtered_contigs.log (check log file(s) for error details)
    conda-env: [...]/dat/db/atlas/conda_envs/858cb87c549bbec6b17014dc0e48e529_
    cluster_jobid: 1556261

Error executing rule align_reads_to_final_contigs on cluster (jobid: 136, external: 1556261, jobscript: [...]/metagenomes/atlas/.snakemake/tmp.4pstqkea/snakejob.align_reads_to_final_contigs.136.sh). For error details see the cluster log and the log files of the involved rule(s).

Unfortunately, the corresponding log file is empty:

-rw-r--r-- 1 user group 0 Jun 28 14:49 S221200000964/logs/assembly/calculate_coverage/align_reads_from_S221200000964_to_filtered_contigs.log

The job runs for only a few seconds before failing:

sacct -j 1556261
               JobID    JobName      User  Partition        NodeList    Elapsed      State ExitCode     MaxRSS                        AllocTRES 
-------------------- ---------- --------- ---------- --------------- ---------- ---------- -------- ---------- -------------------------------- 
             1556261 align_rea+  user       base            n100   00:00:11     FAILED      1:0            billing=22,cpu=8,mem=60000M,nod+ 
       1556261.batch      batch                                 n100   00:00:11     FAILED      1:0          0          cpu=8,mem=60000M,node=1 
      1556261.extern     extern                                 n100   00:00:11  COMPLETED      0:0        56K billing=22,cpu=8,mem=60000M,nod+ 
SilasK commented 1 year ago

Check the lengths of the contigs in S221200000964/S221200000964_contigs.fasta, and check the assembly and QC reports for this sample...

The rule uses an official Snakemake wrapper.

jotech commented 1 year ago

Thank you for all your help!

QC and assembly reports look okay to me. The contig lengths in the FASTA are:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1000    1251    1726    3033    2995  136800 

I'm having trouble identifying the precise command that failed. Is it possible to execute it manually in the conda environment to see why it fails?

SilasK commented 1 year ago

Do you have a cluster log file? There should be an error in it.

If the job ID is 1556261, you should find a cluster log file with that number. Are you using my cluster wrapper?

The command is at the end of this page: https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/minimap2/aligner.html?highlight=minimap2%2Faligner

In theory, you can activate the conda environment and run something like that command manually.
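
For example, something along these lines (only a sketch: the conda-env path is the one from your error message above, and the minimap2 flags such as the -x sr preset and the thread count are assumptions, not necessarily the exact parameters Atlas passes to the wrapper):

    # Activate the environment Snakemake built for this rule
    conda activate [...]/dat/db/atlas/conda_envs/858cb87c549bbec6b17014dc0e48e529_

    # Roughly what the minimap2/aligner wrapper does for paired-end reads:
    # map the QC'd reads against the contigs and sort the result into a BAM
    minimap2 -t 8 -a -x sr \
        S221200000964/S221200000964_contigs.fasta \
        S221200000964/sequence_quality_control/S221200000964_QC_R1.fastq.gz \
        S221200000964/sequence_quality_control/S221200000964_QC_R2.fastq.gz \
      | samtools sort -@ 8 -o S221200000964/sequence_alignment/S221200000964.bam

If that command runs fine by hand, the problem is more likely with the cluster submission than with the data.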

jotech commented 1 year ago

Sorry for not getting back to you sooner; I got sick. Thank you for the advice. Here is the cluster log:

Config file [..]/software/atlas/atlas/workflow/../config/default_config.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8311e68650>: Failed to establish a new connection: [Errno 101] Network is unreachable')), attempt 1/3 failed - retrying in 3 seconds...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8311881550>: Failed to establish a new connection: [Errno 101] Network is unreachable')), attempt 2/3 failed - retrying in 6 seconds...
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8311883110>: Failed to establish a new connection: [Errno 101] Network is unreachable')), attempt 3/3 failed - giving up!
WorkflowError:
Failed to open source file https://github.com/snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml
ConnectionError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /snakemake/snakemake-wrappers/raw/v1.19.0/bio/minimap2/aligner/environment.yaml (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8311883110>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
  File "[...]/software/miniconda3/envs/atlas-dev/lib/python3.11/site-packages/reretry/api.py", line 218, in retry_call
  File "[...]/software/miniconda3/envs/atlas-dev/lib/python3.11/site-packages/reretry/api.py", line 31, in __retry_internal  

It seems that the download didn't work. I tried rerunning it to see if it was a temporary issue, but the error remains. I think it is related to our SLURM setup, which works in offline mode only (see #666). Is there a way to download the wrapper upfront?

SilasK commented 1 year ago

Probably the simplest thing to do is to use the test data (two small samples) from the docs and run it. You can use --skip-qc and run the workflow on the cluster until this error occurs. Then run atlas on the head node to install the wrappers.
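
A rough sketch of that workaround (the paths are placeholders, and whether --skip-qc is accepted directly by atlas run may depend on your atlas version):

    # Set up a small test project with the two test samples from the docs
    atlas init --db-dir databases path/to/test/reads

    # Run once on the head node (which has internet access) so Snakemake can
    # download the wrapper files and create the conda environments
    atlas run all --skip-qc

The environments end up under the conda_envs directory inside your database directory (as in the conda-env path from your error above), so they are found again by later runs.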

Here is another solution: https://github.com/metagenome-atlas/atlas/issues/587#issuecomment-1327467019

github-actions[bot] commented 11 months ago

There has been no activity on this issue for some time. I hope your issue has been solved in the meantime. This issue will automatically close soon if no further activity occurs.

Thank you for your contributions.

jotech commented 11 months ago

I almost forgot to give feedback. Your solution, as proposed in #587, did indeed work:

git clone https://github.com/snakemake/snakemake-wrappers
atlas run --wrapper-prefix 'git+file://path/to/your/local/clone'

Thank you!

SilasK commented 11 months ago

Thank you very much.