icl-utk-edu / cluster


Discussion about Slurm queue policies #4

Open abouteiller opened 3 months ago

abouteiller commented 3 months ago

I envision 3 operation modes

In italic are 'nice to have' features that may or may not be difficult to achieve.

Mode 1: shared usage for debugging (default mode for human users)

  1. resources are allocated in non-exclusive mode by default (that is, running srun or salloc without any other qualifier)
  2. multiple users can coexist on the same node at the same time, especially if they requested the same resource explicitly (e.g., -w leconte, or -N 6 -p bezout)
  3. prefer not sharing when possible: if user A calls srun -N 3 -p bezout and user B then calls srun -N 3 -p bezout, the workload should spread across all 6 Bezout nodes before any node is shared (see the config sketch below)

Not needed: fine-grained allocation of resources

Difficulty: a number of "access tokens" may still be required for load-balancing purposes (point 3), and using core allocation as a substitute is a poor fit, because it affects cgroups and the actual access policy to the hardware resources within the allocation.
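
A minimal slurm.conf sketch of what this mode could look like, assuming the cons_tres select plugin, the bezout partition named above, and an invented bezout[1-6] node range: OverSubscribe=FORCE lets several jobs coexist on a node, and CR_LLN biases placement toward the least-loaded nodes, which roughly approximates the "spread before sharing" preference in point 3. Note that this sketch leans on core-level allocation (CR_Core), which is exactly the concern raised in the Difficulty note; it is untested and only meant to anchor the discussion.

```conf
# slurm.conf (sketch, untested; node and partition names are assumptions)
SelectType=select/cons_tres
SelectTypeParameters=CR_Core,CR_LLN   # CR_LLN: prefer the least-loaded nodes first

# Shared debug partition: several jobs may coexist on a node
PartitionName=bezout Nodes=bezout[1-6] Default=YES OverSubscribe=FORCE:4 MaxTime=08:00:00
```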

Mode 2: exclusive usage for production runs (requested by a human user)

  1. resources are allocated in exclusive mode if the user so specifies (how that gets specified is not completely clear yet; srun --exclusive may or may not do what we want, given the requirements for mode 3: backfill), so maybe srun --reservation=exclusive, srun --reservation=exclusive-nightly, etc. (a submission sketch follows this list)
  2. A single user can use the resource; that is, other sharing-mode srun jobs cannot be scheduled there and ssh logins are refused while the job is active (the Slurm PAM module, pam_slurm_adopt, should do that out-of-the-box)
  3. exclusive jobs during the day have a short time limit (e.g., 1 hour) to prevent resource hoarding; exclusive-nightly jobs have a longer limit (e.g., until 7am the next business day).
  4. The exclusive-nightly mode may terminate existing srun and ssh accesses (the Slurm PAM module should be able to do both prevention and termination for ssh access, but ssh termination may require some customization).
  5. exclusive-nightly jobs are uninterruptible until 7am the next business day, but may overstay until a competing shared or exclusive job that would use these resources is actually submitted to the queue
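
A hedged sketch of what the two exclusive flavors could look like from the command line, assuming a standing reservation named exclusive-nightly that repeats daily; the window, node set, user list, and script names are placeholders, and whether --exclusive or a reservation is the right mechanism is exactly the open question in point 1.

```sh
# Admin side (sketch): a standing reservation repeated every night.
# Start date, window, node set, and user list are placeholders.
scontrol create reservation ReservationName=exclusive-nightly \
    StartTime=2025-01-01T18:00:00 Duration=13:00:00 Nodes=ALL \
    Users=abouteiller,mgates3 Flags=DAILY

# Daytime exclusive run, capped at 1 hour (point 3)
srun --exclusive -p bezout -N 3 -t 01:00:00 ./my_benchmark

# Overnight production run submitted into the nightly reservation
sbatch --exclusive --reservation=exclusive-nightly -N 6 -t 12:00:00 run_nightly.sh
```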

Not needed (actually problematic): fine-grained allocation of resources; I want a guarantee that I have the full node and that nothing else is running at the same time (including Jenkins, GH Actions, ...)

Difficulty: if we have the fine-grained allocation scheduler active, we can simply reserve all resources, but users may still want to execute multiple srun calls inside a given salloc/sbatch and spread the sub-jobs however they want; I think that should work out-of-the-box, but it needs to be verified (see the sketch below).
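
To make the "multiple srun inside one allocation" case concrete, this is the kind of batch script that would need to keep working under mode 2; the node counts and executables are made up for illustration.

```sh
#!/bin/bash
#SBATCH --exclusive
#SBATCH -p bezout
#SBATCH -N 6
#SBATCH -t 01:00:00

# Two concurrent job steps carving up the 6-node exclusive allocation;
# Slurm should place them on disjoint nodes, but this needs to be verified.
srun -N 3 -n 3 ./solver_a &
srun -N 3 -n 3 ./solver_b &
wait
```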

Mode 3: GH Actions / backfill

  1. GitHub Actions, Jenkins, and other automations use a backfill scheduler
  2. The backfill jobs can be interrupted by the arrival of user-created jobs, and that does not cause the CI pipeline to report an error; the pipeline is simply rescheduled for a later time (not sure how difficult that is to actually do)
  3. backfill uses the fine-grained allocation policy, so that we can run more actions at the same time; for example, if we know each action requires only 1 GPU and we have 8, we may run 8 ctest instances simultaneously (see the sketch below)

Difficulty: using fine-grained allocation in one mode forces us to use the fine-grained scheduler in all modes, which we otherwise do not particularly need, and which may complicate how we allocate shared jobs.
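
One way the backfill side could be wired, sketched with made-up partition names: a low-PriorityTier partition for the CI runners whose jobs are requeued (rather than failed) when a higher-priority human job needs the nodes, plus per-GPU requests so several CI jobs can pack onto one node. Whether a REQUEUE preemption maps cleanly onto a rescheduled GitHub Actions run is the open question from point 2; none of this is tested.

```conf
# slurm.conf additions (sketch, untested; assumes gres/gpu is configured on the GPU nodes)
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE                  # preempted CI jobs go back into the queue instead of failing

# Low-priority partition for GitHub Actions / Jenkins runners
# (human-facing partitions such as bezout would carry a higher value, e.g. PriorityTier=10)
PartitionName=backfill Nodes=ALL PriorityTier=1 MaxTime=04:00:00
```

```sh
# CI submission asking for a single GPU, so up to 8 such jobs can share an 8-GPU node
sbatch -p backfill --gres=gpu:1 --requeue ci_ctest.sh
```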

mgates3 commented 3 months ago

I mostly agree with Aurelien's description. Additionally, keeping Slurm would be highly desirable, as people are already familiar with it and it is used on Frontier, Perlmutter, and elsewhere.

For mode 3 (GitHub Actions / backfill), is the suggestion that GitHub Actions would never run (or at least never start) on a node while users are logged into that node (via either mode 1 or 2)? In some ways this is nice, but if the nodes were very busy with users, it could mean that GitHub Actions face starvation. We could see how it works in practice and adjust if there are issues. I have been blocked from merging PRs in the past because someone was using 100% of the GPU memory overnight, so checks could not run (or actually failed), but that was a rare occurrence that was resolved by email.

SLATE is moving towards using 4 GPUs in its CI testing, so that needs to be feasible.
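
For the record, a 4-GPU SLATE CI run would just request the GPUs explicitly; under a fine-grained GPU policy, two such jobs could still share an 8-GPU node. The partition name and script are placeholders carried over from the sketch above.

```sh
# SLATE-style CI test job requesting 4 GPUs on one node (sketch)
sbatch -p backfill -N 1 --gres=gpu:4 --requeue ci_slate_tests.sh
```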