metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

'sbatch: error: Unable to open file' during cluster execution. #367

Closed: jjsanchezgil closed this issue 3 years ago

jjsanchezgil commented 3 years ago

Hi,

I was trying to run qc (on CentOS with Slurm) and I kept getting this error:

Traceback (most recent call last):
  File "/home/jsanchez-gil/.config/snakemake/cluster/scheduler.py", line 70, in <module>
    raise Exception("Job can't be submitted\n"+output.decode("utf-8")+error.decode("utf-8"))
Exception: Job can't be submitted
sbatch: error: Unable to open file

The log file looks like this for all samples:

[Sat Jan 30 10:15:27 2021]
rule initialize_qc:
    input: /hpc/Metagenomes/ATLAS/databases/Reads/S-1-3_R1.fastq.gz, /hpc/Metagenomes/ATLAS/databases/Reads/S-1-3_R2.fastq.gz
    output: S-1-3/sequence_quality_control/S-1-3_raw_R1.fastq.gz, S-1-3/sequence_quality_control/S-1-3_raw_R2.fastq.gz
    log: S-1-3/logs/QC/init.log
    jobid: 162
    wildcards: sample=S-1-3
    priority: 80
    threads: 4
    resources: mem=10, java_mem=8, time=0.5

Error submitting jobscript (exit code 1):

Why could that be? Thank you.

EDIT: OK, I found the error. In the new version of key_mapping.yaml there is a trailing space after the 'g' in mem: "--mem={}g ". Removing it solved the error.

When printing command.split(' ') in scheduler.py just before the Popen instantiation (line 67), this was the output:

['sbatch', '--parsable', '--output=cluster_log/slurm-%j.out', '--error=cluster_log/slurm-%j.out', '--job-name=initialize_qc', '--cpus-per-task=4', '-n', '1', '--mem=10g', '', '--time=30', '-N', '1', '/hpc/Metagenomes/ATLAS/.snakemake/tmp.06vxvzta/snakejob.initialize_qc.178.sh']
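For illustration only (this is not code from the profile, and jobscript.sh is just a placeholder): Python's str.split(' ') keeps an empty string for every extra space, so the doubled space after --mem=10g becomes an empty argument, which sbatch presumably then tries to open as the batch script.

import shlex

# The trailing space in the "mem" template of key_mapping.yaml ("--mem={}g "),
# plus the space added before the next option, leaves two consecutive spaces.
command = "sbatch --parsable --mem=10g  --time=30 jobscript.sh"

print(command.split(' '))
# ['sbatch', '--parsable', '--mem=10g', '', '--time=30', 'jobscript.sh']
#                                        ^ empty argument handed to sbatch

print(shlex.split(command))
# ['sbatch', '--parsable', '--mem=10g', '--time=30', 'jobscript.sh']
# shlex.split collapses runs of whitespace, so no empty token appears.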

SilasK commented 3 years ago

I'm sorry, when did you install the latest cluster profile? I updated it yesterday.

Can you send me the submitted command? It should be printed to stdout. I think it has to do with the log file that should be written to the cluster_log directory, which apparently is not created correctly.

jjsanchezgil commented 3 years ago

Hi @SilasK,

I installed the profile yesterday (after #364) and again today. After seeing that the problem was the space, I downloaded it again today and the space was still there. This is the command and the output for the first read pair:


(atlas-env) [jsanchez-gil@hpcs03 ~]$ atlas run qc --profile cluster --working-dir /hpc/Metagenomes/ATLAS

[2021-01-30 14:09 INFO] Executing: snakemake --snakefile /hpc/miniconda3/envs/atlas-env/lib/python3.6/site-packages/atlas/Snakefile --directory /hpc/Metagenomes/ATLAS  --rerun-incomplete --configfile '/hpc/Metagenomes/ATLAS/config.yaml' --nolock  --profile cluster --use-conda --conda-prefix /hpc/Metagenomes/ATLAS/databases/conda_envs   qc
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cluster nodes: 99
Job counts:
        count   jobs
        16      apply_quality_filter
        1       build_decontamination_db
        1       build_qc_report
        16      calculate_insert_size
        1       combine_insert_stats
        1       combine_read_counts
        1       combine_read_length_stats
        16      deduplicate_reads
        16      finalize_sample_qc
        80      get_read_stats
        16      initialize_qc
        1       qc
        16      qcreads
        16      run_decontamination
        16      write_read_counts
        214
[Sat Jan 30 14:09:44 2021]
rule initialize_qc:
    input: /hpc/Metagenomes/ATLAS/databases/Reads/S-2-21_R1.fastq.gz, /hpc/Metagenomes/ATLAS/databases/Reads/S-2-21_R2.fastq.gz
    output: S-2-21/sequence_quality_control/S-2-21_raw_R1.fastq.gz, S-2-21/sequence_quality_control/S-2-21_raw_R2.fastq.gz
    log: S-2-21/logs/QC/init.log
    jobid: 158
    wildcards: sample=S-2-21
    priority: 80
    threads: 4
    resources: mem=10, java_mem=8, time=0.5

CLUSTER: submit command: sbatch --parsable --output=cluster_log/slurm-%j.out --error=cluster_log/slurm-%j.out --job-name=initialize_qc --cpus-per-task=4 -n 1 --mem=10g  --time=30 -N 1 /hpc/Metagenomes/ATLAS/.snakemake/tmp.2n6r1o1o/snakejob.initialize_qc.158.sh

Traceback (most recent call last):
  File "/home/jsanchez-gil/.config/snakemake/cluster/scheduler.py", line 70, in <module>
    raise Exception("Job can't be submitted\n"+output.decode("utf-8")+error.decode("utf-8"))
Exception: Job can't be submitted
sbatch: error: Unable to open file

Error submitting jobscript (exit code 1):

Because the traceback says the error was raised on line 70 of scheduler.py, I saw it was coming from the p.communicate() call. So the first thing I checked was whether anything was wrong in the instantiation of p in p = Popen(command.split(' '), stdout=PIPE, stderr=PIPE), by adding eprint(command.split(' ')) on the line before, so I could see the actual value of command.split(' ') on stdout. The result was this:

CLUSTER: ['sbatch', '--parsable', '--output=cluster_log/slurm-%j.out', '--error=cluster_log/slurm-%j.out', '--job-name=initialize_qc', '--cpus-per-task=4', '-n', '1', '--mem=10g', '', '--time=30', '-N', '1', '/hpc/Metagenomes/ATLAS/.snakemake/tmp.06vxvzta/snakejob.initialize_qc.178.sh']

When splitting by ' ', there was an extra empty argument after --mem=10g, so I went to key_mapping.yaml and saw mem: "--mem={}g " on line 9, inside the Slurm command constructor. After removing that space, everything started working fine.
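For completeness, here is a rough sketch of the submission step, not the actual scheduler.py from the cluster profile: the eprint helper, the error check, and the example command (with jobscript.sh as a placeholder) are paraphrased from the traceback and log above. It shows the debug print I described plus a defensive filter that would drop empty tokens regardless of spacing in key_mapping.yaml.

import sys
from subprocess import PIPE, Popen

def eprint(*args, **kwargs):
    # Print to stderr so the message shows up next to Snakemake's console output.
    print(*args, file=sys.stderr, **kwargs)

# In the real profile the command is built from key_mapping.yaml; the doubled
# space here stands in for the one produced by the trailing space in "--mem={}g ".
command = ("sbatch --parsable --output=cluster_log/slurm-%j.out "
           "--job-name=initialize_qc --cpus-per-task=4 --mem=10g  --time=30 "
           "jobscript.sh")

argv = command.split(' ')
eprint("CLUSTER:", argv)                   # debug print that revealed the '' token

argv = [token for token in argv if token]  # defensive: drop empty arguments

p = Popen(argv, stdout=PIPE, stderr=PIPE)
output, error = p.communicate()
if p.returncode != 0:
    raise Exception("Job can't be submitted\n"
                    + output.decode("utf-8") + error.decode("utf-8"))

The actual fix was removing the trailing space in key_mapping.yaml; the filter would only make the scheduler tolerant of similar spacing mistakes.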

SilasK commented 3 years ago

Thank you very much for identifying the problem. I fixed it in the cluster profile.