Closed krzyzacy closed 4 years ago
cc @cjwagner - have we ever set up other alerts from velodrome? They support email or Slack, and I'll probably hook it up to the #testing-ops channel. Do we have a Slack token stored somewhere?
Yeah, I believe that Quintin had some alerts configured on velodrome at some point. I'm not sure where they are sent though. There should be a slack token in a secret in the service cluster.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
will get to that some day...
AFAIK we'll move to prow's monitoring stack
/remove-lifecycle stale
/assign @clarketm
/unassign
What's left to be done now that Boskos is using the Prow monitoring stack (#15344)?
Once we have some data from the new metrics we'll be able to pick an appropriate alert threshold and add prometheus alerts like the ones defined in this dir: https://github.com/kubernetes/test-infra/tree/master/prow/cluster/monitoring/mixins/prometheus
Isn't this done, @cjwagner?
Yeah, I think we can call this done: https://github.com/kubernetes/test-infra/blob/47050c4743c0381165543bcc587a3094c2c5c179/prow/cluster/monitoring/mixins/prometheus/boskos_alerts.libsonnet#L4-L28 We even had an alert last week for a resource type we'd deleted but which was still being tracked by Boskos.
/close
@ixdy: Closing this issue.
also x-ref #15412
We should alert when the main pool (gce, gke) volume drops below ~25%. I'll poke around when I have time.
cc @BenTheElder
/area boskos
/assign
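For illustration, the ~25% threshold suggested above could be checked roughly like this. It is a minimal sketch, not the alert that was eventually added to the monitoring stack. The `low_pools` helper, the input shape, the state names (`free`, `busy`) and the resource type names in the usage note are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed threshold, taken from the "~25%" suggested in the comment above.
FREE_THRESHOLD = 0.25


def low_pools(
    counts: Dict[str, Dict[str, int]],
    threshold: float = FREE_THRESHOLD,
) -> List[str]:
    """Return the resource types whose free fraction is below the threshold.

    `counts` maps a resource type to a {state: number of resources} dict.
    This shape is hypothetical and only serves the example.
    """
    low = []
    for resource_type, states in sorted(counts.items()):
        total = sum(states.values())
        if total == 0:
            # Nothing tracked for this type, so there is no ratio to compute.
            continue
        free = states.get("free", 0)
        if free / total < threshold:
            low.append(resource_type)
    return low
```

Calling `low_pools({"gce-project": {"free": 10, "busy": 90}, "gke-project": {"free": 50, "busy": 50}})` would flag only `gce-project`, since 10% of it is free. The comparison is strict, so a pool sitting at exactly 25% free is not flagged.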