uselagoon / remote-controller

A group of controllers for handling Lagoon builds and tasks in Kubernetes or Openshift

Kubernetes API issues result in orphaned builds which never complete causing infinite queueing #223

Closed steveworley closed 1 year ago

steveworley commented 1 year ago

Describe the bug

In clusters where the Kubernetes API has intermittent connectivity issues (noticed more in Azure environments than in AWS), a build can become orphaned. This usually happens when the build controller tries to update a build's status and the API request fails because of the connectivity issue. The build status then never gets updated, which causes subsequent builds to be queued indefinitely.

This usually manifests with build errors that look like:

##############################################
Start Build Process
##############################################
++ set +x
Unable to connect to the server: net/http: TLS handshake timeout
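
One way to reduce how often a status update is lost to a transient error like the one above is to retry it with exponential backoff. The following is a minimal sketch of that idea, not the remote-controller's actual code: `TransientAPIError`, `update_status_with_retry`, and the `update_fn` callback are hypothetical names introduced here for illustration.

```python
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a connectivity failure (e.g. a TLS handshake timeout)."""


def update_status_with_retry(update_fn, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call update_fn until it succeeds or the attempts are exhausted.

    Waits base_delay * 2**n seconds after the n-th failed attempt
    (exponential backoff) and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return update_fn()
        except TransientAPIError:
            if attempt == attempts - 1:
                raise  # out of retries; let the caller decide what to do
            sleep(base_delay * (2 ** attempt))
```

Retrying only narrows the window. If the API stays unreachable for longer than the retry budget, the build is still left in its old state, so a separate cleanup process is still needed.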

To Reproduce

Steps to reproduce the behavior:

  1. Trigger a build
  2. Cause a network interruption between the build and the Kubernetes API
  3. Build status is stuck in "started"

Expected behavior

A process should exist that detects and reaps orphaned builds, so that queued builds are not blocked indefinitely.
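
A rough sketch of what such a reaper could look like, assuming each build records when its status last changed. The `Build` type, the `reap_orphaned_builds` function, the one-hour default, and every status name other than "started" are assumptions made for illustration and are not taken from the controller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    name: str
    status: str             # "started" is from the report; other values are assumed
    last_updated: datetime  # assumed: time of the last recorded status change


def reap_orphaned_builds(builds, now, max_age=timedelta(hours=1)):
    """Mark builds stuck in "started" for longer than max_age as failed.

    Returns the names of the builds that were reaped, in input order.
    """
    reaped = []
    for build in builds:
        if build.status == "started" and now - build.last_updated > max_age:
            build.status = "failed"
            build.last_updated = now
            reaped.append(build.name)
    return reaped
```

Run periodically, this would free the queue once a stuck build exceeds `max_age`. A real implementation would probably also check whether the build pod still exists before failing the build, so that long-running but healthy builds are not reaped.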
