Open tiborsimko opened 2 years ago
Suggestion: alternatively, we could change IfNotPresent to Always. With Always, k8s resolves the image tag to a digest (hash) on the registry; if an image with that digest is already cached locally, it uses the local image, and if it is not cached or the digests differ, it pulls the new image from the registry (docs).
If Always is used, it will probably add some overhead on the k8s nodes, since they have to query the registry to check whether the cached image matches the one in the registry (one HTTP request, I guess). Not sure how much this will affect pod start-up time.
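To make the difference between the two policies concrete, here is a minimal Python sketch of the decision described above. It is a simplified model, not kubelet code: the function names, node names, and the "sha256:old" / "sha256:new" placeholder digests are all made up for illustration.

```python
from typing import Dict, Optional


def needs_pull(policy: str, cached_digest: Optional[str], registry_digest: str) -> bool:
    """Simplified model of when a node pulls an image for a given pull policy."""
    if policy == "Never":
        return False
    if policy == "IfNotPresent":
        # Only the presence of a local copy matters; the registry is not consulted.
        return cached_digest is None
    if policy == "Always":
        # The tag is resolved to a digest on the registry and compared with the cache.
        return cached_digest != registry_digest
    raise ValueError(f"unknown pull policy: {policy}")


def digest_after_start(policy: str, cached_digest: Optional[str], registry_digest: str) -> Optional[str]:
    """Digest a job would run with on one node (None means no image is available)."""
    if needs_pull(policy, cached_digest, registry_digest):
        return registry_digest
    return cached_digest


def simulate(policy: str, node_cache: Dict[str, Optional[str]], registry_digest: str) -> Dict[str, Optional[str]]:
    """Apply the policy on every node and report which digest each node ends up using."""
    return {
        node: digest_after_start(policy, cached, registry_digest)
        for node, cached in node_cache.items()
    }


if __name__ == "__main__":
    # Hypothetical cluster: node-a still caches the old image, node-b has no copy yet.
    cache = {"node-a": "sha256:old", "node-b": None}
    for policy in ("IfNotPresent", "Always"):
        print(policy, simulate(policy, cache, "sha256:new"))
```

Under this model, IfNotPresent leaves node-a on the old digest while node-b picks up the new one, which is exactly the mixed state described in the issue; Always converges both nodes to the registry digest at the cost of a registry lookup per pod start.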
As for adding the image tag and digest to the logs, I think it is a good idea overall. I am less sure about exposing the node names, as that could potentially be a security issue (?).
Always will bring some overhead, which may be considerable in the case of multi-GiB particle physics images... Hence we opted for IfNotPresent as the default, together with promoting semantic versioning of Docker images, which is best for ensuring reproducibility anyway! The reana-client validate command also checks for the most commonly used latest tag, but it doesn't catch everything.
So yes, hopefully we can stay on IfNotPresent... But switching to Always via Helm values is always an option.
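As an illustration of what a broader tag check could look like, here is a small Python sketch that flags image references likely to be mutable. This is not how reana-client validate is implemented; the helper names and the regular expression are assumptions made for the example, and a semantic-version tag is treated as stable by convention only, since a registry does not prevent it from being overwritten.

```python
import re
from typing import Optional, Tuple

# Assumed convention: tags such as 1.2.3, v1.2.3 or 1.2.3-rc1 count as versioned.
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+([-+.][0-9A-Za-z.-]+)?$")


def split_image(image: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repo, tag, digest)."""
    name, _, digest = image.partition("@")
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not to a tag.
    if colon > last_slash:
        return name[:colon], name[colon + 1:], digest or None
    return name, None, digest or None


def is_mutable_reference(image: str) -> bool:
    """Return True if the reference may silently point to different images over time."""
    _, tag, digest = split_image(image)
    if digest:
        return False  # pinned by digest
    if tag is None:
        return True  # no tag means the default 'latest'
    return SEMVER_TAG.match(tag) is None


if __name__ == "__main__":
    for ref in ("myenvironment:latest", "myenvironment:master", "myenvironment:1.2.3"):
        print(ref, "mutable" if is_mutable_reference(ref) else "versioned")
```

A check like this would catch branch-style tags such as master in addition to latest, but it can only warn; it cannot tell whether a versioned tag was re-pushed.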
Current behaviour
It happens that when users use non-semantically-versioned environment images such as myenvironment:latest or myenvironment:master, and they update the image under the same tag, the cluster nodes won't pull the new version because of the usual IfNotPresent image pull policy.
It can then happen that some cluster nodes have the "old" version of the image while other cluster nodes have the "new" version, leading to seemingly random workflow run failures.
Currently, it is not easy for users to detect these situations, because REANA does not expose in the job logs exactly which image sha1 was used for the job. Cluster administrators can check and rectify this easily by removing the image from the nodes, which forces a re-pull of the image for the next run. For example by running the following one-liner:
However, we can perhaps do something better to help the users.
Expected behaviour
Ideally we should display in the job logs that the job was run using image myenvironment:latest with sha1 of such and such value:
We could perhaps even consider exposing the node name where the job runs, which could be useful in forensics, such as when CephFS CSI plugins are down on some nodes, etc.
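One possible way to obtain this information is from the pod object that Kubernetes already maintains for the job: status.containerStatuses[].image and status.containerStatuses[].imageID carry the image reference and the resolved digest, and spec.nodeName carries the node. The Python sketch below formats a log line from such a pod object given as a plain dict (for example parsed from kubectl get pod -o json). The function name, the log line layout, and the sample values are hypothetical and not part of REANA.

```python
from typing import List


def describe_job_pod(pod: dict, include_node: bool = True) -> List[str]:
    """Build one log line per container with the image, its digest and optionally the node."""
    node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
    lines = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        image = status.get("image", "<unknown>")
        image_id = status.get("imageID", "")
        # imageID usually looks like '[scheme://]repo@sha256:...'; keep the part after '@'.
        digest = image_id.rpartition("@")[2] if "@" in image_id else "<unknown>"
        line = f"container={status.get('name', '<unknown>')} image={image} digest={digest}"
        if include_node:
            line += f" node={node}"
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical pod object with a shortened placeholder digest.
    sample_pod = {
        "spec": {"nodeName": "node-1"},
        "status": {
            "containerStatuses": [
                {
                    "name": "job",
                    "image": "myenvironment:latest",
                    "imageID": "docker-pullable://myenvironment@sha256:0123abcd",
                }
            ]
        },
    }
    for log_line in describe_job_pod(sample_pod):
        print(log_line)
```

The include_node flag reflects the concern raised in the discussion: the digest can be shown to users while the node name is kept for administrators only, if exposing it is considered a security risk.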