mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0
146 stars 27 forks

Using a Singularity Container as Environment for Simulations #260

Closed nilsbeyer closed 3 years ago

nilsbeyer commented 3 years ago

Greetings,

Our goal is to run clustermq in an environment defined by a Singularity image.

I read what is written in https://mschubert.github.io/clustermq/articles/userguide.html#environments, but neither of its two sections seemed helpful to me:

  1. The section "Environments for workers" only uses bashenv.
  2. The section "Running master inside containers" addresses Singularity but assumes that the master process is inside a container.

Our cluster allows for the following syntax (This is a script that can be run with sbatch):

#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --output=res.txt
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=2000
#SBATCH --time=1:00:00

module load singularity
srun singularity exec my_singularity_image.sif ./my_executable.py

Here srun conveniently executes my_executable.py inside a container created from the Singularity image.

Naively, this looks to me like the easiest way to connect clustermq with Singularity, but I am no expert on this. Does it make sense to implement, for example, a scheduler option in clustermq that works with this? Or is there an existing way of running jobs inside Singularity containers that I overlooked (for example, by manipulating the template somehow)? I am happy to help test this feature! Thanks for taking the time! Nils

mschubert commented 3 years ago

I'm a bit confused: I'd think pt 1 ("Environments for workers") solves your issue?

What keeps you from adding the srun to the template?
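For reference, a minimal sketch of what adding srun to a SLURM template could look like. The placeholders follow the template syntax documented in the clustermq user guide, the image name is taken from the example above, and the stock template may differ between versions — this is an illustration, not the shipped template:

#!/bin/sh
#SBATCH --job-name={{ job_name }}
#SBATCH --output={{ log_file | /dev/null }}
#SBATCH --mem-per-cpu={{ memory | 4096 }}

module load singularity

CMQ_AUTH={{ auth }} srun singularity exec my_singularity_image.sif \
    R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

The only change from a plain template is that the R worker invocation is wrapped in `srun singularity exec <image>`, so the worker process runs inside the container.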

nilsbeyer commented 3 years ago

What you are saying sounds promising in the sense that there seems to be an easy solution. :) What keeps me from it is most likely that I am a beginner in R and SLURM.

None of the templates I've used so far included an srun command. I thought that the template merely exists to supply an srun command with the necessary parameters.

Pt 1 ("Environments for workers") merely activates a bash or conda environment that can be specified by the name of the environment.

The `srun singularity exec` syntax also expects an executable file. How do I specify the name of such a file for each job? Is it the same as the job_name (as in `#BSUB -J {{ job_name }}[1-{{ n_jobs }}]`, the name of the job / array jobs)? I am not aware that such a file is specified in the template.

nilsbeyer commented 3 years ago

I have now altered the last line in clustermq_slurm.tmpl to the following:

CMQ_AUTH={{ auth }} singularity exec my_singularity_image.sif R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

But now a simple test script seems to be stuck at

Submitting 5 worker jobs (ID: test) ...
Running 5 calculations (0 objs/0 Mb common; 1 calls/chunk) ...

and never terminates.

mschubert commented 3 years ago

Can you try:

export CMQ_AUTH={{ auth }}
singularity exec my_singularity_image.sif R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'

and run with log_worker=TRUE?
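For context, `log_worker` is an argument to `clustermq::Q` that writes one log file per worker. A minimal sketch of such a call (the function `fx` and the job count are illustrative, not from this thread):

```r
library(clustermq)

# illustrative worker function; log_worker = TRUE produces per-worker logs
fx <- function(x) x * 2
Q(fx, x = 1:5, n_jobs = 5, log_worker = TRUE)
```

The worker logs then show what happens inside the submitted job, which is useful for debugging template problems like this one.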

nilsbeyer commented 3 years ago

Logging the workers revealed that the last time I tested it, I had forgotten to load the Singularity module on the cluster. My mistake. It works now. Thanks a lot for your time!
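For future readers, the resolved setup combines the two fixes from this thread: loading the Singularity module in the template, and wrapping the worker call in `singularity exec`. A sketch of the template's final lines (the image name is the example one from above):

module load singularity

export CMQ_AUTH={{ auth }}
singularity exec my_singularity_image.sif \
    R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'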

mschubert commented 3 years ago

Great to hear that it works! :+1: