miniwdl-ext / miniwdl-slurm

MIT License

miniwdl-slurm didn't finish the job #8

Closed: Plogeur closed this issue 2 months ago

Plogeur commented 2 months ago

I don't understand why my pipeline can't start/finish with miniwdl-slurm (it works with a normal Singularity config without miniwdl-slurm).

sbatch.sh :

#!/bin/bash
#SBATCH -J miniwdl_run
#SBATCH -o miniwdl_run_output.out
#SBATCH -e miniwdl_run_error.out
#SBATCH -t 24:00:00
#SBATCH --mem 128G
#SBATCH -c 32

module load containers/singularity/3.9.9 devel/python/Python-3.11.1
miniwdl run -v workflows/giraffe.wdl --cfg miniwdl_slurm_test.cfg --copy-input-files -i params/giraffe.json

miniwdl_slurm_test.cfg :

[scheduler]
container_backend=slurm_singularity
task_concurrency=200
fail_fast = false

[call_cache]
put = true
get = true
dir = "$PWD/miniwdl_call_cache"

[task_runtime]
defaults = {
        "maxRetries": 2,
        "docker": "ubuntu:20.04"
    }

[singularity]
exe = ["singularity"]
run_options = [
        "--containall"
    ]
image_cache = "$PWD/miniwdl_singularity_cache"

error :

malias@genobioinfo2 ~/work/vg_wdl_call/_LAST $ cat error.json 
{
  "error": "RunFailed",
  "workflow": "Giraffe",
  "run": "Giraffe",
  "dir": "/work/user/malias/vg_wdl_call/20240620_170349_Giraffe",
  "cause": {
    "error": "AssertionError",
    "run": "call-kmerCountingKMC",
    "dir": "/work/user/malias/vg_wdl_call/20240620_170349_Giraffe/call-HaplotypeSampling/call-kmerCountingKMC",
    "pos": {
      "source": "/work/user/malias/vg_wdl_call/tasks/bioinfo_utils.wdl",
      "line": 77,
      "column": 1
    }
  },
  "pos": {
    "source": "/work/user/malias/vg_wdl_call/workflows/giraffe.wdl",
    "line": 9,
    "column": 1
  }
}

malias@genobioinfo2 ~/work/vg_wdl_call/_LAST/call-HaplotypeSampling/call-kmerCountingKMC $ cat stderr.txt
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_1.fastq.gz
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_2.fastq.gz
+ kmc -k29 -m8 -okff -t6 @scratch_file.txt haplotype_sampled_graph .
***+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_1.fastq.gz
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_2.fastq.gz
+ kmc -k29 -m8 -okff -t6 @scratch_file.txt haplotype_sampled_graph .
*
****
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_1.fastq.gz
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_2.fastq.gz
+ kmc -k29 -m8 -okff -t6 @scratch_file.txt haplotype_sampled_graph .
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_1.fastq.gz
****
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_2.fastq.gz
+ kmc -k29 -m8 -okff -t6 @scratch_file.txt haplotype_sampled_graph .
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_1.fastq.gz
+ echo /mnt/miniwdl_task_container/work/_miniwdl_inputs/0/reads_2.fastq.gz
+ kmc -k29 -m8 -okff -t6 @scratch_file.txt haplotype_sampled_graph .
****
Stage 1: 100%
Stage 1: 100%
Stage 1: 100%
****
Stage 1: 100%
Stage 1: 100%
Stage 2: 100%
Stage 2: 100%
Stage 2: 100%
+ rm scratch_file.txt
+ rm scratch_file.txt
+ rm scratch_file.txt
rm: can't remove 'scratch_file.txt': No such file or directory
rm: can't remove 'scratch_file.txt': No such file or directory
Stage 2: 100%
+ rm scratch_file.txt
rm: can't remove 'scratch_file.txt': No such file or directory
Stage 2: 100%
+ rm scratch_file.txt
rm: can't remove 'scratch_file.txt': No such file or directory

rhpvorderman commented 2 months ago

This seems to be an error in evaluating the WDL task:

    "pos": {
      "source": "/work/user/malias/vg_wdl_call/tasks/bioinfo_utils.wdl",
      "line": 77,
      "column": 1
    }

rm: can't remove 'scratch_file.txt': No such file or directory

This sort of error can occur on a networked filesystem, where a file written on one node is not immediately visible to other nodes.
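
As a minimal sketch of how a cleanup step could tolerate a scratch file that is already gone (or not yet visible), assuming the task's command block resembles the traced commands in stderr.txt: `cleanup_scratch` is a hypothetical helper, not something taken from bioinfo_utils.wdl. It only keeps the `rm` step from failing and does not explain why the file is missing.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper (not from the original task): remove a scratch file,
# treating "file does not exist" as success instead of an error.
cleanup_scratch() {
    rm -f -- "$1"
}

# Stand-in for the task's working directory and scratch file.
workdir=$(mktemp -d)
scratch="$workdir/scratch_file.txt"

# Mimic the traced 'echo' steps that build the input list.
echo "reads_1.fastq.gz" > "$scratch"
echo "reads_2.fastq.gz" >> "$scratch"

cleanup_scratch "$scratch"   # removes the file
cleanup_scratch "$scratch"   # file already gone: still exits 0
```

With a plain `rm`, the second call would exit non-zero and print the "No such file or directory" message seen in the log.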

Plogeur commented 2 months ago

What procedure should I follow to resolve such a problem?

rhpvorderman commented 2 months ago

Ask someone who is experienced with your cluster for help. Since the error is thrown in your WDL task rather than in the miniwdl-slurm code, I cannot really help you.

Plogeur commented 2 months ago

ok thx for your help.