Closed FUSAKLA closed 5 years ago
Could this comment be relevant?
Yup. What's the use case for scalar one then? (:
I feel like we can add scalar support if needed, but trying to understand the use case (:
Well, generally we use those rules for threshold alerting, where the alert is mostly the same for every service, so we just join the metric on a shared label with a scalar threshold recorded under that particular label.
example:
- record: alerting_threshold:barrel_creation_time_seconds
  expr: "8700"
  labels:
    name: coec-barrels
- record: alerting_threshold:barrel_creation_time_seconds
  expr: "7000"
  labels:
    name: search-barrels
# metric `barrel_creation_time` has label `name` which matches those `name` labels of `alerting_threshold:barrel_creation_time_seconds`
- alert: AgeOfBarrel
  expr: (time() - (barrel_creation_time > 0)) > ON(name) GROUP_LEFT() alerting_threshold:barrel_creation_time_seconds
  for: 5m
  labels:
    team: xxx
    severity: warning
    channel: xxx
  annotations:
    title: Age of {{$labels.app}} barrel is too old
    description: Barrel {{$labels.name}} of app {{$labels.app}} is too old ({{$value
      | humanizeDuration}}) on {{$labels.instance}}.{{$labels.locality}}).
This is pretty common, I think. There is even a blog post about this on Robust Perception: https://www.robustperception.io/using-time-series-as-alert-thresholds.
I'm not blocked by this; as I said, it was just a test case. Those alerts will live on the Prometheus instances, but I can imagine an alert needing data from multiple Prometheus instances. Then you'd need to record the threshold rules on one of the Prometheus instances and use them in the alert on the Thanos rule node. That would spread the configuration of a single alert across two distinct components, which would be somewhat uncomfortable but still possible.
I'd say adding a note about this to the documentation could be a sufficient solution for now. The support can possibly be added when needed, as you mentioned; I'm not pushing for it.
Sounds like a valid use case for me, was looking exactly for this justification (:
Thanos, Prometheus and Golang version used: Build from master 3fd740f
What happened: I deployed a rule node and loaded some of our existing alerts and rules into it, just to test that it works, and the log is full of warnings such as:
The recording rule referred to in the log is:
How to reproduce it (as minimally and precisely as possible): I suspect that the cause is the `expr` containing only a number and no metric. I have more rules in there, and only the ones used purely as thresholds cause these warnings. The error message is not entirely clear, but it also sounds like the issue is that the unmarshaling does not expect to get only a number.
On a Prometheus instance, version 2.3.2, it's working OK.
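If the warnings really do come from the scalar result type, one possible workaround is to wrap the number in PromQL's `vector()` function, so the rule evaluates to an instant vector instead of a bare scalar. This is only a sketch under that assumption: it reuses the threshold rules from the example above and has not been verified against this Thanos build.

```yaml
# Assumption: the rule node only has trouble with scalar results.
# vector(8700) yields a single label-less sample, and the rule's own
# labels are attached to it on recording, so the alert's ON(name) join
# should keep working unchanged.
- record: alerting_threshold:barrel_creation_time_seconds
  expr: vector(8700)
  labels:
    name: coec-barrels
- record: alerting_threshold:barrel_creation_time_seconds
  expr: vector(7000)
  labels:
    name: search-barrels
```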