Open smlx opened 3 years ago
The controller fails a build if it can't start for any reason, and ImagePullBackOff is one of those reasons.
If the image genuinely can't be pulled, we would need some sort of timeout so that the stuck build doesn't block future builds.
I'm OK with doing this if it's something we want to support?
I think it would be nice to have some kind of timeout to allow Kubernetes time to recover, if only to avoid Lagoon users being confused about why their deploys sporadically fail. It doesn't have to be long - just enough for Kubernetes to retry the image pull.
I've only seen this once, so it isn't an urgent problem.
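To make the timeout idea above concrete, here is a minimal sketch of the decision it implies. It is not taken from the Lagoon controller: `BuildPodState`, `image_pull_timed_out`, and the five-minute `IMAGE_PULL_GRACE` are hypothetical names and values chosen for illustration, and measuring the grace period from pod creation time is an assumption. The only Kubernetes-specific facts used are the `ErrImagePull` and `ImagePullBackOff` waiting reasons.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Waiting reasons Kubernetes reports while an image pull is failing or backing off.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}

# Hypothetical grace period; the thread only asks for "not long".
IMAGE_PULL_GRACE = timedelta(minutes=5)


@dataclass
class BuildPodState:
    """Hypothetical snapshot of a build pod (not a Lagoon or Kubernetes type)."""
    waiting_reason: Optional[str]  # e.g. "ImagePullBackOff", or None once running
    created_at: datetime           # assumption: the grace period starts at pod creation


def image_pull_timed_out(pod: BuildPodState, now: datetime,
                         grace: timedelta = IMAGE_PULL_GRACE) -> bool:
    """Return True only when the pod has been stuck on an image pull longer than `grace`."""
    if pod.waiting_reason not in IMAGE_PULL_REASONS:
        # Running pods and other waiting reasons are outside the scope of this check.
        return False
    return now - pod.created_at > grace


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    pod = BuildPodState("ImagePullBackOff", created)
    # One minute in: still within the grace period, so keep waiting.
    print(image_pull_timed_out(pod, created + timedelta(minutes=1)))
    # Six minutes in: the grace period has elapsed, so the build can be failed.
    print(image_pull_timed_out(pod, created + timedelta(minutes=6)))
```

With a check like this, a transient pull error leaves the build pending while Kubernetes retries, and a genuinely unpullable image still fails the build once the grace period runs out, so it cannot block later builds indefinitely.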
I had a deploy fail almost immediately when the `kubectl-build-deploy-dind` image went into `ImagePullBackOff`. Here's what happened:
1. `success` back from Lagoon API.
2. `ImagePullBackOff` (the others started running).
3. `ImagePullBackOff` build pod disappeared.
4. `failed` in Lagoon dashboard.
I would have expected the pod to eventually start running instead of failing the deploy. The other builds started at the same time ran fine, so the image pull error may have just been a transient network issue?
I ran deploy on this environment a second time and it ran through fine.