mllg / batchtools

Tools for computation on batch systems
https://mllg.github.io/batchtools/
GNU Lesser General Public License v3.0

squeue cmd not found when running R targets on singularity docker image #294

Open ailtonpcf opened 1 year ago

ailtonpcf commented 1 year ago

To whom it may concern, thank you for the templates for use on HPC :D

I'm triggering an R targets pipeline from Snakemake, using an R Docker image run via Singularity. The pipeline:

```r
#!/usr/bin/env R

work_dir <- "06-fungal-control"
source(here::here(paste("src", work_dir, "defaults.R", sep = "/")))

tar_option_set(
  packages = c("tidyverse", "tarchetypes"),
  format = "qs",
  memory = "transient",
  garbage_collection = TRUE,
  storage = "worker",
  retrieval = "worker"
)

library(future)
library(future.batchtools)

future::plan(
  tweak(
    future.batchtools::batchtools_slurm,
    template = "src/06-fungal-control/slurm.tmpl",
    resources = list(
      walltime = 259200, # minutes
      memory = 62500,
      ncpus = 4,
      ntasks = 1,
      partition = "standard",
      chunks.as.arrayjobs = TRUE
    )
  )
)

list(
  tar_target(
    metadata,
    read_tsv("raw/04-tedersoo-global-mycobiome/Tedersoo L, Mikryukov V, Anslan S et al. Fungi_GSMc_sample_metadata.txt")
  ),
  tar_target(
    continent_countries,
    read_csv("raw/05-countries-continent/countries.csv")
  ),
  tar_target(
    subset_samples,
    european_samples(metadata, continent_countries)
  ),
  tar_target(
    raw_abundance,
    read_tsv("raw/04-tedersoo-global-mycobiome/Fungi_GSMc_OTU_Table.txt")
  ),
  tar_target(
    taxonomy,
    get_taxonomy("raw/04-tedersoo-global-mycobiome/Tedersoo L, Mikryukov V, Anslan S et al. Fungi_GSMc_data_biom.biom")
  ),
  tar_target(
    raw_abundance_long,
    long_abundance(raw_abundance, subset_samples)
  )
)
```

However, it doesn't work: R complains that the `squeue` command is not found. Here's the log:

```
Date              = Tue May 16 10:59:08 CEST 2023
Hostname          = node069
Working Directory = /home/qi47rin/proj/02-compost-microbes/src/06-fungal-control

Number of Nodes Allocated      = 1
Number of Tasks Allocated      = 1
Number of Cores/Task Allocated = 1

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                   count    min threads    max threads
get_fungal_spikein        1              1              1
targets                   1              1              1
total                     2              1              1

Select jobs to execute...

[Tue May 16 10:59:15 2023]
rule get_fungal_spikein:
    input: src/06-fungal-control/analyze_server.R
    output: logs/06-fungal-control/spike.log
    jobid: 1
    reason: Missing output files: logs/06-fungal-control/spike.log
    resources: tmpdir=/tmp

Activating singularity image /home/qi47rin/proj/02-compost-microbes/.snakemake/singularity/8c1aaca4ec464428d6d90db9c1dc0fbf.simg
running '/usr/local/lib/R/bin/R --no-echo --no-restore --no-save --no-restore --file=src/06-fungal-control/analyze_server.R'

here() starts at /home/qi47rin/proj/02-compost-microbes
Global env bootstraped.
here() starts at /home/qi47rin/proj/02-compost-microbes
Global env bootstraped.
✔ skip target continent_countries
✔ skip target metadata
✔ skip target subset_samples
✔ skip target taxonomy
• start target raw_abundance
✔ skip pipeline
Warning message:
In readLines(template) :
  incomplete final line found on '/home/qi47rin/proj/02-compost-microbes/src/06-fungal-control/slurm.tmpl'
Error : Listing of jobs failed (exit code 127);
cmd: 'squeue --user=$USER --states=R,S,CG --noheader --format=%i -r'
output: command not found
Error in tar_throw_run():
! ! in callr subprocess.
Caused by error:
! Listing of jobs failed (exit code 127);
cmd: 'squeue --user=$USER --states=R,S,CG --noheader --format=%i -r'
output: command not found
Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
Backtrace:
▆
 1. └─targets::tar_make_future(workers = 4)
 2. └─targets:::callr_outer(...)
 3. └─base::tryCatch(...)
 4. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
 5. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6. └─value[3L]
 7. └─targets::tar_throw_run(...)
 8. └─rlang::abort(...)
Execution halted

[Tue May 16 10:59:27 2023]
Error in rule get_fungal_spikein:
    jobid: 1
    output: logs/06-fungal-control/spike.log
    shell:
        Rscript --no-save --no-restore --verbose src/06-fungal-control/analyze_server.R | tee logs/06-fungal-control/spike.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job get_fungal_spikein since they might be corrupted:
logs/06-fungal-control/spike.log
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: src/06-fungal-control/.snakemake/log/2023-05-16T105913.649364.snakemake.log
```

It worked before with conda, because when you activate an environment every application stays available on the PATH. Inside a container, however, there are problems when squeue queries the user ID and the Slurm cluster ID. I also tried to mount the Slurm volumes ... but that didn't work either. So, is there a way to avoid the squeue command when using tar_make_future() to submit jobs to Slurm?
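A minimal way to confirm this diagnosis (a sketch, not from the original report) is to ask the image itself whether the Slurm client tools are on its PATH; the image path below is the one from the Snakemake log above:

```bash
# Sketch: check whether the Slurm client tools are visible inside the image.
# IMG is the image path from the Snakemake log above; adjust to your setup.
IMG=.snakemake/singularity/8c1aaca4ec464428d6d90db9c1dc0fbf.simg

for tool in squeue sbatch scancel; do
  singularity exec "$IMG" sh -c "command -v $tool" \
    || echo "$tool: not found inside the container (hence batchtools' exit code 127)"
done
```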

Thanks in advance, Ailton.

tmspvn commented 3 months ago

Hi, have you found a solution?

I half did:

```bash
export SINGULARITY_BINDPATH="$SINGULARITY_BINDPATH,/etc/passwd,/var/run/munge,/usr/lib64/libmunge.so.2.0.0:/usr/lib64/libmunge.so.2,/run/slurm/conf/slurm.conf:/etc/slurm/slurm.conf,/usr/lib64/slurm,/usr/bin/sbatch,/usr/bin/squeue,/usr/bin/scancel"
```
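If you adapt those bind paths to your cluster, a quick check that they took effect (a sketch; the image path is hypothetical) is to run squeue through the image:

```bash
# Sketch: with SINGULARITY_BINDPATH exported as above, squeue inside the
# container should now find the binary, the munge socket, and slurm.conf.
IMG=/path/to/your-r-image.simg   # hypothetical path; use your actual image

singularity exec "$IMG" squeue --user=$USER --noheader --format=%i -r
```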

Edit: it runs, but on my cluster it returns a BatchtoolsExpiration error. I opened a discussion here, but I've had no answer from the developers to date (23/7/24).
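For what it's worth, batchtools marks a job as expired when it disappears from the scheduler's listing without leaving a result, so one way to investigate (a sketch, not from the thread; the image path is hypothetical) is to compare the host's and the container's view of the queue, using the same squeue flags that appear in the error log above:

```bash
# Sketch: a host/container mismatch in the job listing (e.g. from UID
# mapping via the bound /etc/passwd) is one way jobs can look "expired".
IMG=/path/to/your-r-image.simg   # hypothetical path; use your actual image

squeue --user=$USER --states=R,S,CG --noheader --format=%i -r \
  | sort > /tmp/host.jobs
singularity exec "$IMG" squeue --user=$USER --states=R,S,CG --noheader --format=%i -r \
  | sort > /tmp/container.jobs
diff /tmp/host.jobs /tmp/container.jobs && echo "container sees the same jobs as the host"
```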