Alert when multiple Builds fail at the same time

uselagoon / remote-controller

A group of controllers for handling Lagoon builds and tasks in Kubernetes or Openshift


Alert when multiple Builds fail at the same time #124

Closed Schnitzel closed 1 year ago

Schnitzel commented 6 years ago

A single failed build is not very informative to alert on, since code issues can also cause failed deployments. But we could implement logic in the system that detects when all builds started in the last 15 minutes have failed, which would point to an infrastructure issue rather than an individual environment issue.
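A rough sketch of that check, to make the idea concrete. The `Build` record, its field names, and the status strings are hypothetical and not taken from the controller. It also assumes that still-running builds are ignored and that at least two finished builds are required before alerting, so one bad deployment cannot trigger it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Build:
    # Hypothetical record; the field names and status values are illustrative only.
    name: str
    started_at: datetime
    status: str  # assumed values: "failed", "complete", "running"


def all_recent_builds_failed(
    builds: Iterable[Build],
    now: datetime,
    window: timedelta = timedelta(minutes=15),
    min_builds: int = 2,
) -> bool:
    """Return True when every finished build started inside the window has failed."""
    cutoff = now - window
    # Only consider builds started inside the window that have reached a final state.
    finished = [
        b
        for b in builds
        if cutoff <= b.started_at <= now and b.status in ("failed", "complete")
    ]
    # Too few builds says nothing about the infrastructure as a whole.
    if len(finished) < min_builds:
        return False
    return all(b.status == "failed" for b in finished)
```

With two builds started in the last 15 minutes and both failed, the function returns `True`. A single successful build in the same window turns it back to `False`.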

tobybellwood commented 2 years ago

These alerting systems are currently outside the scope of Lagoon, but could be handled at the cluster level with Prometheus?

We can look at adding a metrics endpoint to the controller, though.

shreddedbacon commented 2 years ago

v0.4.1 of the controller has a new metrics endpoint. The metrics are pretty basic at the moment, but they might be enough to check for consistent failures over time, as there is a counter that increments for total build failures.
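As an illustration of how a failure counter could be checked over time, here is a minimal sketch. It assumes the counter has already been scraped into a list of samples ordered by time. The function names and the threshold are hypothetical, and a drop in value is treated as a counter reset (for example after a controller restart).

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total increase across successive counter samples, tolerating resets."""
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # The counter went backwards, so assume it restarted from zero.
            increase += cur
    return increase


def failures_look_consistent(samples: Sequence[float], threshold: float) -> bool:
    """Return True when the counter grew by at least `threshold` over the samples."""
    return counter_increase(samples) >= threshold
```

This only shows growth in total failures, not whether every build failed, so it is a coarser signal than comparing failures against builds started.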

shreddedbacon commented 1 year ago

Closing, as the metrics endpoint exists. If it doesn't provide suitable information for alerting, then we can extend the metrics provided to try and cover what is required.