Open paulfantom opened 4 years ago
Alerts with the same name and different thresholds to decide the severity are quite common; I don't think that's a generally applicable rule.
> Alerts with the same name and different thresholds to decide the severity are quite common
For some reason, promtool doesn't treat those as duplicates.
As far as I know we don't treat alerts with the same name as duplicates, because they are not.
For example:
```yaml
- alert: KubeAPIErrorBudgetBurn
  annotations:
    message: The API server is burning too much error budget
  expr: |
    sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
    and
    sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
  for: 2m
  labels:
    severity: critical
- alert: KubeAPIErrorBudgetBurn
  annotations:
    message: The API server is burning too much error budget
  expr: |
    sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
    and
    sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
  for: 15m
  labels:
    severity: critical
```
This is completely OK and works in Prometheus/Alertmanager.
This is what we used in Thanos to deduplicate alerts -> https://github.com/thanos-io/thanos/pull/2263/files#diff-5e861d5245f0514ca397e835ef86ca05R62-R69
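To illustrate the idea behind that approach, here is a minimal sketch (hypothetical function names, not the actual code from the Thanos PR above) of deduplicating on the full rule definition — name, labels, and expression — instead of just name + labels:

```python
# Sketch: two rules count as duplicates only when their complete definition
# matches. Rule dicts mirror the YAML structure above; names are illustrative.

def rule_key(rule):
    """Build a comparison key from the complete rule definition."""
    labels = tuple(sorted(rule.get("labels", {}).items()))
    return (rule["alert"], labels, rule["expr"].strip())

def find_duplicates(rules):
    """Return rules whose full definition appears more than once."""
    seen = set()
    dups = []
    for rule in rules:
        key = rule_key(rule)
        if key in seen:
            dups.append(rule)
        else:
            seen.add(key)
    return dups
```

Under this key, the two `KubeAPIErrorBudgetBurn` rules above are distinct (different `expr`), so nothing is flagged.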
Actually promtool complains:
```
$ promtool check rules prometheus_alerts.yaml
Checking prometheus_alerts.yaml
2 duplicate rule(s) found.
Metric: KubeAPIErrorBudgetBurn
Label(s):
        severity: critical
Metric: KubeAPIErrorBudgetBurn
Label(s):
        severity: warning
```
Has anyone followed up with promtool upstream?
Currently (as of https://github.com/kubernetes-monitoring/kubernetes-mixin/commit/bf3064885199f90080bec6790f2d27c5ad08184d) there are alerts with the same name but different expressions (e.g. https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/resource_alerts.libsonnet#L26 and https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/alerts/resource_alerts.libsonnet#L62). This could be caught either by parsing the output of promtool and failing when duplicates are detected, or by modifying upstream promtool to fail in that case. WDYT?
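The first option could look something like this — a rough sketch that shells out to promtool and greps its report, assuming the "N duplicate rule(s) found." message format shown above (function names are hypothetical):

```python
import re
import subprocess

def count_duplicates(output: str) -> int:
    """Parse the duplicate count from promtool's report, assuming the
    'N duplicate rule(s) found.' message format shown above."""
    m = re.search(r"(\d+) duplicate rule\(s\) found", output)
    return int(m.group(1)) if m else 0

def check_rules(path: str) -> int:
    """Run `promtool check rules` and return nonzero when the check fails
    OR when duplicates are reported, so CI can gate on the exit code."""
    proc = subprocess.run(["promtool", "check", "rules", path],
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return proc.returncode
    return 1 if count_duplicates(proc.stdout) > 0 else 0
```

This is brittle (it depends on promtool's message wording staying stable), which is why teaching upstream promtool to exit nonzero itself would be the cleaner fix.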