uselagoon / remote-controller

A group of controllers for handling Lagoon builds and tasks in Kubernetes or Openshift

how to abort a build #33

Closed Schnitzel closed 3 years ago

Schnitzel commented 3 years ago

I ran into an issue where I wanted to abort a build. The build was stuck waiting for a deployment to finish while its pod was in CrashLoopBackOff, which takes a long time.

There was already a new build scheduled from Lagoon (the dashboard showed it as new).

I then tried:

  1. delete the build pod, nothing happened
  2. delete the LagoonBuild object, also nothing happened
  3. only after I restarted the whole lagoon-build-deploy the new build was picked up

Did I do something wrong? Can we have the controller mark the build as failed after the build pod is deleted?

shreddedbacon commented 3 years ago

Should be able to just patch the Pod with the label lagoon.sh/cancelBuild=true and the controller will handle the clean up of the pods and statuses.

kubectl -n $NAMESPACE patch pods lagoon-build-xxxxxx \
  --type=merge \
  --patch '{"metadata":{"labels":{"lagoon.sh/cancelBuild":"true"}}}'

EDIT: It should be the pod resource that gets labelled, not the lagoonbuild resource.
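Setting the same label with `kubectl label` should be equivalent to the patch above. A minimal sketch, assuming a hypothetical `cancel_build` helper and `DRY_RUN` variable (neither is part of Lagoon), which prints the command instead of running it when `DRY_RUN=1`:

```shell
# Hypothetical helper, not part of Lagoon: labels a build pod for cancellation.
# Usage: cancel_build <namespace> <build-pod-name>
cancel_build() {
  namespace="$1"
  pod="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be run.
    echo kubectl -n "$namespace" label pod "$pod" lagoon.sh/cancelBuild=true --overwrite
  else
    # --overwrite lets the label be set even if it already exists with another value.
    kubectl -n "$namespace" label pod "$pod" lagoon.sh/cancelBuild=true --overwrite
  fi
}
```

For example, `DRY_RUN=1 cancel_build "$NAMESPACE" lagoon-build-xxxxxx` shows the command without touching the cluster.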

Schnitzel commented 3 years ago

uff, that's quite a lot to know :)) especially if we compare to openshift where we just had to delete the pod.

how hard would it be to let the controller cancel the build when the pod is deleted?

shreddedbacon commented 3 years ago

Would need to check, probably not super hard to implement.

Just deleting the pod is not really the ideal way IMO as you'll lose the logs, maybe other things will be lost too.

The label is what gets added when the cancel button is clicked in the UI though, so the mechanism is already built into the controller. It handles the clean up and cancellation correctly and informs Lagoon that the build has failed (and collects the logs etc).

shreddedbacon commented 3 years ago

The lagoon build job itself (kubectl-build-deploy-dind) should probably be a bit smarter too, in checking for crashloop etc and failing rather than resulting in a long running build.
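As a rough illustration of what such a check could look like (this is not how kubectl-build-deploy-dind works today; the `has_crashloop` name is hypothetical), a sketch that inspects the waiting reasons of all containers in a namespace:

```shell
# Hypothetical helper: returns 0 if any container in the namespace is
# waiting with reason CrashLoopBackOff, 1 otherwise.
has_crashloop() {
  namespace="$1"
  # Collect the waiting reasons of all containers as a space-separated list.
  reasons=$(kubectl -n "$namespace" get pods \
    -o jsonpath='{.items[*].status.containerStatuses[*].state.waiting.reason}')
  # Pad with spaces so only a whole-word match counts.
  case " $reasons " in
    *" CrashLoopBackOff "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A build script could call `has_crashloop "$NAMESPACE"` inside its wait loop and fail early instead of waiting for the rollout to time out.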