As of now, pulling Docker container images and converting them to SIF is done in the slurm-job.vk.io/pre-exec annotation, whose content is injected into the SLURM job script that represents the k8s pod.
IMO, this has two disadvantages:
Error-prone: the user has to remember to pull the right image and update the pod accordingly. The user also has to remember to pull the image only when it is not already there, and to skip the pull when it is (to avoid unnecessary pull and conversion overhead).
Potential waste of compute time: downloading a Docker image and converting it to a SIF file takes minutes (around 10 minutes, if I remember correctly). Because this happens inside the job, the HPC center charges for that time while the allocated resources sit idle. The waste grows with the size of the resource allocation (i.e., the number of nodes).
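To put a rough number on the second point, here is a back-of-the-envelope sketch. It assumes the allocation is billed per node for the whole job duration and that every node waits while the image is pulled and converted. The function name is made up for illustration, and the 10 minutes is the approximate figure mentioned above, not a measurement.

```python
def wasted_node_minutes(nodes: int, prep_minutes: float) -> float:
    """Node-minutes billed while the whole allocation waits for the image."""
    # Assumption: every allocated node is charged for the full preparation time.
    return nodes * prep_minutes


# Assumed preparation time of about 10 minutes, as estimated above.
for nodes in (1, 4, 16):
    print(f"{nodes} node(s): {wasted_node_minutes(nodes, 10)} node-minutes")
```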
Summary of proposed changes
Creating a dedicated annotation to manage the container image (e.g., pull and convert if not already there) before the SLURM job starts.
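A minimal sketch of the "pull and convert if not already there" logic, assuming it runs before the job is submitted (for example, on a login node). The names `sif_path_for` and `ensure_sif`, the cache directory layout, and the file naming scheme are hypothetical and only illustrate the idea; this is not the plugin's actual implementation. It also assumes the `singularity` CLI is on the PATH.

```python
import subprocess
from pathlib import Path


def sif_path_for(image: str, cache_dir: Path) -> Path:
    """Map an image reference such as 'docker://ubuntu:22.04' to a SIF path."""
    name = image.split("://", 1)[-1]          # drop the 'docker://' scheme
    safe = name.replace("/", "_").replace(":", "_")
    return cache_dir / f"{safe}.sif"


def ensure_sif(image: str, cache_dir: Path, run=subprocess.run) -> Path:
    """Pull and convert the image only if its SIF file is not cached yet."""
    target = sif_path_for(image, cache_dir)
    if target.exists():
        return target                          # already converted: skip the pull
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: `singularity pull <output.sif> <image>` is available on PATH.
    run(["singularity", "pull", str(target), image], check=True)
    return target
```

With something like this, the job script would only need the path returned by `ensure_sif`, and the user would no longer have to decide by hand whether a pull is needed.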