nordic-rse / nordic-rse.github.io

The community of Research Software Engineers in Nordic countries.
https://nordic-rse.org
MIT License

Seminar event idea: "Blurring the lines: Singularity containerisation of SLURM orchestrators" #323

Closed: frankier closed this 2 years ago

frankier commented 2 years ago

Title

Blurring the lines: Singularity containerisation of SLURM orchestrators

Are you running it?

Yes

Abstract / description

Singularity is a container platform for HPC. As well as addressing the security concerns of HPC administrators, its "convention over configuration" approach (e.g. binding the current working directory into the container by default) seems to dovetail well with the needs of software development for HPC environments. In particular, it encourages writing software which can both be run in a container in an HPC environment and tested uncontainerised on a laptop for a faster hack-test loop, and which interoperates well with the typical SLURM + networked file system design of modern HPC clusters.
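As a minimal sketch of what "runs both containerised and uncontainerised" can look like in practice: the snippet below resolves its data directory relative to the current working directory, which works in both settings because Singularity binds that directory by default. The `in_container` check assumes the runtime exports `SINGULARITY_CONTAINER` or `APPTAINER_CONTAINER`; the `DATA_DIR` variable and both function names are hypothetical and only there for illustration.

```python
import os
from pathlib import Path


def in_container(env=None) -> bool:
    """Guess whether we are inside a Singularity/Apptainer container.

    Assumption: the runtime sets SINGULARITY_CONTAINER or APPTAINER_CONTAINER.
    """
    env = os.environ if env is None else env
    return "SINGULARITY_CONTAINER" in env or "APPTAINER_CONTAINER" in env


def data_dir(env=None) -> Path:
    """Resolve the data directory relative to the current working directory.

    DATA_DIR is a hypothetical override; the default "data" lives under the
    working directory, so the same path works on a laptop and in a container.
    """
    env = os.environ if env is None else env
    return Path(env.get("DATA_DIR", "data")).resolve()


if __name__ == "__main__":
    where = "container" if in_container() else "host"
    print(f"running on the {where}, reading from {data_dir()}")
```

The point is that the code never needs to branch on where it runs; the check is only used for logging.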

While SLURM provides some relatively high-level tools for job orchestration like job arrays, there are also tools such as Snakemake and Ray which are cluster agnostic but can make use of SLURM (with slurm-profile and yaspi), or run limited to a single laptop. However, SLURM connector plugins typically work by running SLURM utilities like squeue, which are only available on the host. While it is theoretically possible to bind host executables and libraries into the container, this tightly couples the library versions in the container to those on the host. Therefore, I present singreqrun, a shim for requesting that the host run programs on behalf of the container.
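To make the general idea of such a shim concrete, here is a minimal sketch of a request/response spool on a shared directory. This is not how singreqrun is implemented; it only illustrates the pattern of a containerised process asking the host to run a command. All names (`submit_request`, `serve_pending`, `wait_for_result`, the `.req`/`.res` file suffixes) are hypothetical, and the spool directory is assumed to be visible at the same path on both sides.

```python
import json
import subprocess
import time
import uuid
from pathlib import Path


def submit_request(spool: Path, argv: list[str]) -> str:
    """Container side: queue a command for the host to run; return its id."""
    req_id = uuid.uuid4().hex
    tmp = spool / f"{req_id}.tmp"
    tmp.write_text(json.dumps({"argv": argv}))
    # Rename after writing so the host never picks up a half-written request.
    tmp.rename(spool / f"{req_id}.req")
    return req_id


def serve_pending(spool: Path) -> int:
    """Host side: run every queued command and write its result next to it."""
    handled = 0
    for req in sorted(spool.glob("*.req")):
        argv = json.loads(req.read_text())["argv"]
        proc = subprocess.run(argv, capture_output=True, text=True)
        result = {
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
        tmp = req.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        tmp.rename(req.with_suffix(".res"))
        req.unlink()
        handled += 1
    return handled


def wait_for_result(spool: Path, req_id: str, timeout: float = 30.0) -> dict:
    """Container side: poll until the host has written the result."""
    res = spool / f"{req_id}.res"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if res.exists():
            return json.loads(res.read_text())
        time.sleep(0.1)
    raise TimeoutError(f"no result for request {req_id}")
```

In this sketch a container-side wrapper named `squeue` could call `submit_request(spool, ["squeue", ...])` and print the returned stdout, while a small host-side loop calls `serve_pending` repeatedly. A real shim would also need to restrict which commands the host is willing to run, since this version executes whatever is requested.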

The talk begins with a quick roll call of the players: Singularity, SLURM, HPC, Snakemake and Ray.

In a live shell session (with all Python code prepared in advance), I then demonstrate three ways in which singreqrun can be used:

1. Snakemake for heterogeneous (a mixture of CPU and GPU nodes) video corpus processing, which can be ported across HPC clusters
2. Snakemake for text corpus processing, including using extra Singularity containers for utilities
3. Ray for hyperparameter search

I end the talk by asking for comments. In particular, is this the right direction or a hack too far? Are there better ways to combine general-purpose container orchestrators + Singularity + SLURM? The current implementation is quite hacky, and more of a proof of concept. If it is a good idea, how can we stabilise and improve upon this approach?

The talk expands on some ideas I give in a blog post: https://frankie.robertson.name/research/effective-cluster-computing/#use-monolithic

Event type

Duration

60-120 minutes

Date

On or after the 22nd of November.

Questions

I'm interested in case you have any general comments already or would like to steer the direction of the talk.

I am interested in the possibility of allowing people to ask in advance for an invite to a CSC project and then setting it up so that people can code along in directories with some stuff already set up. What are people's thoughts about this?

bast commented 2 years ago

This is a super interesting collection of topics. Thanks for suggesting this!

samumantha commented 2 years ago

Agreed. We would be happy to have your workshop in November. Which date and time would suit you best?

I also like the idea of setting stuff up beforehand so that people can code along. The problem I see with CSC accounts and invitations to a project there is that only people from Finnish research institutions and universities could do that. I don't have much of an idea of what your talk is about, but would providing an image for setting up a virtual machine also work? Then, e.g., CSC users could use cPouta, and others could use other providers?

frankier commented 2 years ago

I would like to schedule this during week 47, some time Tuesday to Friday (so 23rd-26th) at 12-14 EEST. My timetable is pretty open, so I would like to know: are there any days which don't suit people? I will also run a poll in the places where I will try to cross-advertise this, and I will cross-post this to Zulip in a moment.

frankier commented 2 years ago

Yes, it would be possible to set up some kind of container with a complete SLURM cluster in a box; however, that is one yak I don't feel like shaving. I think it should be possible to invite people from other countries as long as they have an institutional email. At least I have managed to do this before, so I guess I'll just assume it's possible.

frankier commented 2 years ago

Okay, so let's provisionally say the 23rd of November, 13-15 CEST, since this seems to fit in reasonably well with how previous talks have been scheduled.

lucaferranti commented 2 years ago

mission successfully accomplished 💪 🎉