snabbco / snabb

Snabb: Simple and fast packet networking

Lab resource sharing algorithm needed #634

Open lukego opened 9 years ago

lukego commented 9 years ago

I would really like to open up all the lab servers for use by everybody.

This requires solutions to some technical problems:

  1. Make lab servers easy to administrate (well-defined configuration for each server).
  2. Make lab servers productive environments (learn to love working in containers).
  3. Share resources efficiently (don't leave hardware idle and also don't collide with each other).

The first two are being solved nicely: NixOS for managing the servers and then Docker for hiding NixOS from users :-). Now we need a solution for efficient resource sharing.

Here are a few ideas:

  1. Reserve whole servers exclusively for one person/group e.g. for a week at a time.
  2. Create a smart resource reservation system that automatically assigns exclusive cores and NICs for a running process.
  3. Create a dumb resource reservation system that takes exclusive ownership of a whole server for the duration of a test run.
  4. (Insert new idea here.)

I see (1) happening by default and that seems unfortunate. (2) would be nice but maybe is too complex especially when dealing with issues like NUMA. (3) could be a reasonable compromise?

I really want to make consistent installations of NixOS across davos, chur, grindelwald, interlaken, and jura, including user definitions.

(There is another alternative, to "cloudify" the lab e.g. to submit jobs via the CI and have them scheduled magically somehow, but I think that this way lies madness.)

lukego commented 9 years ago

Here is one idea for a compromise between simplicity and fanciness.

Suppose we had a script, hw <command...>, that would reserve hardware resources (cores and NICs) for the duration of a command. The command would automatically execute with affinity to the right cores (implicit NUMA affinity if the cores are chosen from the same node), and the PCI addresses would be supplied in environment variables like $NIC0. If the hardware is busy then the script would wait until it is available.

Example that reserves four NICs and three cores, all belonging to NUMA node 0:

#!/bin/sh

if [ "$#" -eq 0 ]; then
  echo "Usage: hw <command...>"
  echo
  echo "Run a command with hardware (NICs and CPU cores) exclusively reserved."
  echo "The command will automatically have affinity to the CPU cores."
  echo "The NICs will be supplied with NIC0..NICn environment vars."
  exit 1
fi

(
  # Take this hardware profile's lock on fd 9. With -n the script gives up
  # immediately if the hardware is busy; dropping -n would make it wait instead.
  flock -n 9 || exit 1
  # Hand the reserved NICs to the command in environment variables and pin it
  # (and any child processes) to the reserved cores.
  NIC0=0000:01:00.0 NIC1=0000:01:00.1 \
  NIC2=0000:02:00.0 NIC3=0000:02:00.1 \
  taskset -c 0,1,2 \
    "$@"
) 9>/var/tmp/hwlock

Example usage:

$ ./hw bash -c 'echo NIC0 = $NIC0; echo affinity:; taskset -p $$'                                                        
NIC0 = 0000:01:00.0
affinity:
pid 2590's current affinity mask: 7

The idea is that we would have several such scripts on each server that would be able to reserve different kinds of hardware. You would choose which script to use based on your hardware needs: do you want to use the hardware that is reserved for CI? do you want one pair of NICs? two pairs? four pairs? the whole machine?

The user running tests would then have to choose a server (e.g. grindelwald) and a hardware profile (e.g. script that reserves 2 NICs and 2 cores) but the rest would be automatic. The code they run could depend on already having CPU and NUMA affinity (unless deliberately testing on a whole machine or multinode hardware profile) and would always choose NICs based on env variables instead of magic numbers.
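As a concrete sketch (the hw-2nic profile name is hypothetical, and I am assuming the test suite picks up its PCI addresses from SNABB_PCI0/SNABB_PCI1 environment variables), a run on grindelwald could then look like:

$ ./hw-2nic bash -c 'SNABB_PCI0=$NIC0 SNABB_PCI1=$NIC1 make test'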

A few more potential usages and extensions:

  1. Reserve resources for one hour of interactive testing: timeout 3600 hw bash -i.
  2. Scripts that reserve a small amount of resources could probe a few possibilities for availability (nonblocking flock); see the sketch after this list.
  3. Scripts could log some usage information for calculating a "load average" i.e. summary of how busy each hardware profile of each server has been lately.
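
To make (2) and (3) a bit more concrete, here is a minimal sketch of what such a probing profile script might look like. Everything in it is illustrative: the hw-2nic name, lock file paths, PCI addresses, cores, and log format are assumptions, not something that exists in the lab today.

#!/bin/sh
# hw-2nic (hypothetical sketch): reserve whichever pair of NICs is free,
# plus two cores, for the duration of a command.

if [ "$#" -eq 0 ]; then
  echo "Usage: hw-2nic <command...>" >&2
  exit 1
fi

run_on_pair() {
  lockfile=$1; nic0=$2; nic1=$3; shift 3
  (
    flock -n 9 || exit 99          # 99 = sentinel meaning "this pair is busy"
    start=$(date +%s)
    NIC0=$nic0 NIC1=$nic1 taskset -c 0,1 "$@"
    status=$?
    # Extension (3): one log line per run so a per-profile "load average"
    # can be computed later.
    end=$(date +%s)
    echo "$end $USER $nic0 $((end - start))s" >> /var/tmp/hw-2nic.log
    exit "$status"
  ) 9>"$lockfile"
}

# Extension (2): probe each NIC pair with a nonblocking flock and run the
# command on the first pair that is free. (A command that itself exits with
# status 99 would be misread as "busy", and a real script would also hand
# out different cores per pair; good enough for a sketch.)
for pair in "/var/tmp/hwlock-01 0000:01:00.0 0000:01:00.1" \
            "/var/tmp/hwlock-02 0000:02:00.0 0000:02:00.1"; do
  run_on_pair $pair "$@"
  status=$?
  [ "$status" -ne 99 ] && exit "$status"
done

echo "hw-2nic: all NIC pairs are busy right now" >&2
exit 1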

Just an idea.