nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Error when using Shifter as container runtime in Nextflow >23.08.0-edge #5115

Open jackscanlan opened 4 months ago

jackscanlan commented 4 months ago

When Shifter is enabled (shifter.enabled = true), processes that use containers fail (see the error output below). I first noticed this in Nextflow 24.04.2. Stepping through versions shows that 23.08.1-edge and later (tested up to 24.04.2) throw this error, while 23.08.0-edge and earlier do not.

Program output

Typical error output:

ERROR ~ Error executing process > 'FREYR:PARSE_INPUTS (Whole dataset)'

Caused by:
  Process `FREYR:PARSE_INPUTS (Whole dataset)` terminated with an error exit status (127)

Command executed:

  #!/usr/bin/env Rscript

  ### defining Nextflow environment variables as R variables
  ## input channel variables
  samplesheet =           "test_data/dual/samplesheet_read_dir.csv"
  loci_params =           "test_data/dual/loci_params.csv"

  ## global variables
  projectDir = "/group/pathogens/IAWS/Personal/JackS/dev/freyr"
  params_dict = "[help:null, data_folder:test_data/dual, refdir:reference, samplesheet:test_data/dual/samplesheet_read_dir.csv, loci_params:test_data/dual/loci_params.csv, extension:null, illumina:true, pacbio:false, nanopore:false, paired:true, high_sensitivity:true, threads:null, rdata:true, slurm_account:pathogens, max_memory:2.GB, max_cpus:1, max_time:10.m]"

  ### source functions and themes, load packages, and import Nextflow params
  ### from "bin/process_start.R"
  sys.source("/group/pathogens/IAWS/Personal/JackS/dev/freyr/bin/process_start.R", envir = .GlobalEnv)

  ### run module code
  sys.source(
      "/group/pathogens/IAWS/Personal/JackS/dev/freyr/bin/parse_inputs.R", # run script
      envir = .GlobalEnv # this allows import of existing objects like projectDir
  )

  ### save .RData for debugging
  if ("true" == "true") {
      save.image()
  } else {
      NULL
  }

Command exit status:
  127

Command output:

  2024-07-05T14:53:51 Pulling Image: docker:jackscanlan/piperline-multi:0.0.1, status: READY
  2a3f21b84278741a6d7226093a69f1463064515cf6d5dbdd3fe172039b8b6c05
  2a3f21b84278741a6d7226093a69f1463064515cf6d5dbdd3fe172039b8b6c05

Command error:
  /var/spool/slurm/d/job28977797/slurm_script: line 326: NXF_TASK_WORKDIR=/group/pathogens/IAWS/Personal/JackS/dev/freyr/work/68/bc374c9440715e61fb629278528cd2: No such file or directory

Work dir:
  /group/pathogens/IAWS/Personal/JackS/dev/freyr/work/68/bc374c9440715e61fb629278528cd2

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

The error seems to come from the nxf_launch function in the process's .command.run file:

nxf_launch() {
    shifterimg pull docker:jackscanlan/piperline-multi:0.0.1
    shifterimg lookup docker:jackscanlan/piperline-multi:0.0.1
    while ! shifterimg lookup docker:jackscanlan/piperline-multi:0.0.1; do
        sleep 5
        STATUS=$(shifterimg -v pull docker:jackscanlan/piperline-multi:0.0.1 | tail -n2 | head -n1 | awk '{print $6}')
        [[ $STATUS == "FAILURE" || -z $STATUS ]] && echo "Shifter failed to pull image 'docker:jackscanlan/piperline-multi:0.0.1'" >&2  && exit 1
    done
    ${NXF_TASK_WORKDIR:+"NXF_TASK_WORKDIR=$NXF_TASK_WORKDIR"} NXF_DEBUG=${NXF_DEBUG:=0} shifter --image docker:jackscanlan/piperline-multi:0.0.1 /bin/bash -c "eval $(nxf_container_env); /bin/bash /group/pathogens/IAWS/Personal/JackS/dev/freyr/work/68/bc374c9440715e61fb629278528cd2/.command.run nxf_trace"
}
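
One plausible reading of the failure (an assumption, not confirmed in this thread): the shell recognises a VAR=value prefix as an assignment only when it is written literally in the command line. In the last line of nxf_launch the prefix is produced by the ${NXF_TASK_WORKDIR:+...} expansion, so the resulting word is treated as a command name. Because that word contains a slash, the shell tries to execute it as a path, which would match the "No such file or directory" message and exit status 127 shown above. The sketch below reproduces the behaviour without Shifter. The path /tmp/demo/work is a hypothetical stand-in, and passing the word through env is only one possible workaround, not necessarily how Nextflow should or does fix it.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the task work directory (not a path from the issue).
NXF_TASK_WORKDIR=/tmp/demo/work

# 1. Same pattern as the last line of nxf_launch: the assignment-like word
#    comes out of a parameter expansion, so the shell runs it as a command.
#    The word contains a slash, so it is executed as a (non-existent) path.
broken_status=0
( ${NXF_TASK_WORKDIR:+"NXF_TASK_WORKDIR=$NXF_TASK_WORKDIR"} true ) 2>/dev/null || broken_status=$?
echo "expansion used as prefix, variable set: exit $broken_status"

# 2. One possible workaround (an assumption, not the project's fix):
#    give the same word to env(1), which parses NAME=value arguments itself.
env_value=$(env ${NXF_TASK_WORKDIR:+"NXF_TASK_WORKDIR=$NXF_TASK_WORKDIR"} sh -c 'echo "$NXF_TASK_WORKDIR"')
echo "value seen by the child process via env: $env_value"

# 3. With the variable unset, the expansion yields no word at all, so the
#    original pattern runs the real command and succeeds.
unset NXF_TASK_WORKDIR
unset_status=0
( ${NXF_TASK_WORKDIR:+"NXF_TASK_WORKDIR=$NXF_TASK_WORKDIR"} true ) 2>/dev/null || unset_status=$?
echo "expansion used as prefix, variable unset: exit $unset_status"
```

If this reading is right, the pattern only breaks when NXF_TASK_WORKDIR is set at the point where nxf_launch runs, which could explain why the behaviour differs between Nextflow versions.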

.nextflow.log for the command: NXF_VER=23.08.1-edge nextflow run AVR-biosecurity-bioinformatics/freyr -profile basc_slurm,test

.nextflow.log for the command: NXF_VER=24.04.2 nextflow run AVR-biosecurity-bioinformatics/freyr -profile basc_slurm,test

Steps to reproduce the problem

If you have Shifter installed, you can reproduce using a test dataset with my pipeline:

# runs fine past first process requiring container ('PARSE_INPUTS')
NXF_VER=23.08.0-edge nextflow run AVR-biosecurity-bioinformatics/freyr -profile shifter,test

# throws error during first process requiring container ('PARSE_INPUTS')
NXF_VER=24.04.2 nextflow run AVR-biosecurity-bioinformatics/freyr -profile shifter,test

I've reproduced this with a different, independent pipeline on the same system:

git clone https://github.com/tuannguyen8390/nf-EXPLOR.git 

# runs fine past first process requiring container ('make_map_index' or 'make_gatk_index')
NXF_VER=23.08.0-edge nextflow run setup.nf -profile shifter

# throws error during first process requiring container ('make_map_index' or 'make_gatk_index')
NXF_VER=24.04.2 nextflow run setup.nf -profile shifter

Environment