interTwin-eu / interlink-slurm-plugin

Singularity pull before job starts #36

Open matbun opened 1 week ago

Short Description of the issue

Currently, Docker container images are pulled and converted to SIF in the slurm-job.vk.io/pre-exec annotation, whose content is then injected into the SLURM job script that represents the k8s pod.

IMO, this has two disadvantages:

  1. Error-prone: the user has to remember to pull the right image and update the pod accordingly. The user also has to remember to pull the image only when it is missing, and to skip the pull when it is already there (to avoid unnecessary pull and conversion overhead).
  2. Potential waste of compute time: downloading and converting a Docker image to a SIF file takes minutes (around 10 minutes, if I remember correctly). Because this happens inside the SLURM job, it is compute time the HPC center charges you for while the allocated resources are not actually being used. The waste grows with the size of the resource allocation (i.e., the number of nodes).
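
To put a rough number on the second point, here is a back-of-the-envelope sketch. It assumes that every allocated node is billed for the whole wall-clock time of the pre-exec step, and it takes the roughly 10-minute figure above as the pull-and-convert time. The function name and the node counts are illustrative only, not measurements.

```python
def wasted_node_minutes(nodes: int, pull_minutes: float = 10.0) -> float:
    """Node-minutes billed while the job only pulls and converts the image.

    Assumption: all allocated nodes are charged while the pre-exec step runs.
    """
    return nodes * pull_minutes


# Illustrative allocation sizes (hypothetical, not taken from a real job).
for nodes in (1, 8, 64):
    print(f"{nodes:>3} nodes: {wasted_node_minutes(nodes):.0f} node-minutes billed for the pull")
```

Under these assumptions the cost scales linearly with the allocation, which is why doing the pull before the job starts matters most for large jobs.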

Summary of proposed changes

Create a dedicated annotation that manages the container image (e.g., pulls and converts it if it is not already there) before the SLURM job starts.
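
A minimal sketch of the "pull and convert if not already there" logic could look like the following. This is not the plugin's implementation (the plugin is not written in Python), only an illustration of the idea. The helper names, the cache directory layout, and the hash-based file naming are all assumptions; the only external piece relied on is the `singularity pull <output.sif> docker://<image>` command form.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def sif_path_for(image: str, cache_dir: Path) -> Path:
    """Map an image reference to a deterministic SIF file name (hypothetical scheme)."""
    digest = hashlib.sha256(image.encode()).hexdigest()[:16]
    return cache_dir / f"{digest}.sif"


def pull_command(image: str, sif: Path) -> List[str]:
    """Build the `singularity pull` command that converts a Docker image to SIF."""
    return ["singularity", "pull", str(sif), f"docker://{image}"]


def ensure_image(image: str, cache_dir: Path, run=subprocess.run) -> Path:
    """Pull and convert the image only if its SIF file is not already cached.

    Meant to run before the SLURM job is submitted, so that the pull does not
    consume the job's allocation. `run` is injectable to allow dry runs.
    """
    sif = sif_path_for(image, cache_dir)
    if not sif.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        run(pull_command(image, sif), check=True)
    return sif
```

Calling `ensure_image("ubuntu:22.04", Path("/path/to/sif-cache"))` twice would trigger a single pull, and the returned path could then be used in the job script. One caveat of this sketch: it keys the cache on the image reference, so a mutable tag such as `latest` would never be refreshed. A real implementation would likely need a way to force a new pull.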