Closed: codesome closed this 2 years ago
cc @RichiH
I tried to find corner cases and thought I had them a few times, but in the end couldn't find any; this seems like good coverage.
Conceptually, I think it would make sense to group the tests (`for` conditions, values changing, time series simply going away).
It might make sense to visualize a state machine, to make it easier to reason about and verify that all possible states are covered.
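A rough sketch of the state machine idea, assuming the three states discussed in this issue (`inactive`, `pending`, `firing`). The names `State`, `next_state` and `reachable_transitions` are hypothetical and not part of the test suite. Resend behaviour and series going away are deliberately left out, so treat it as a simplification for reasoning about coverage only:

```python
from enum import Enum
from itertools import product


class State(Enum):
    INACTIVE = "inactive"
    PENDING = "pending"
    FIRING = "firing"


def next_state(state, condition_true, for_elapsed):
    """Return the state after one rule evaluation (simplified model)."""
    if not condition_true:
        # The alert condition no longer holds: the alert resolves.
        return State.INACTIVE
    if state is State.FIRING or for_elapsed:
        # Either already firing, or the `for` duration has been satisfied
        # (a zero `for` duration is satisfied immediately).
        return State.FIRING
    return State.PENDING


def reachable_transitions():
    """Enumerate every (from, to) edge the simplified model allows."""
    edges = set()
    for state, cond, elapsed in product(State, (True, False), (True, False)):
        edges.add((state, next_state(state, cond, elapsed)))
    return edges


if __name__ == "__main__":
    for src, dst in sorted(reachable_transitions(),
                           key=lambda e: (e[0].value, e[1].value)):
        print(f"{src.value} -> {dst.value}")
```

Each edge printed by this sketch could then be ticked off against a test case. In this model `firing` never goes back to `pending`.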
Yup, I plan to have the fewest rules possible that can test all the above cases within a time bound.
Added another case: a rule using the `ALERTS` series from the rules above it in the same group.
One extra case to test is that annotations can depend on user-defined labels, e.g. define a rule with a custom label of `env={{ $labels.namespace }}` and then have an annotation that uses this label, e.g. `summary=the env should be {{ $labels.env }}`. Assuming the alert had a namespace of `eu-west-0`, it should come out as `alertname=myalert,namespace=eu-west-0,env=eu-west-0,summary=the env should be eu-west-0`.
> One extra case to test is that annotations can depend on user-defined labels.
I think that is not the case. I just tried it out with Prometheus and verified how we do it in the code: we only take the labels from the query result to expand the template. Even if we override the labels in the rule, the template expansion will still take the original label from the query.
Here is the example (notice `instance` and `test` in labels and their use in annotations):
But thanks for bringing it up, we will need a test case to verify this behaviour and also update the spec.
The spec has this:
> The labels and annotation templates from the alerting rule MUST be run for each of these alerts individually with label-value data for the template coming from the corresponding element from the result vector.
So looks like we are all good
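To make the behaviour discussed above concrete, here is a minimal sketch in Python. It is not Prometheus code: `expand` and `build_alert` are hypothetical helpers, only the `{{ $labels.name }}` form is handled, and rendering a missing label as an empty string is an assumption of this sketch. The point it illustrates is that both label and annotation templates only see the labels of the query result element:

```python
import re

# Matches only the simple `{{ $labels.<name> }}` form used in this thread.
_LABEL_REF = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def expand(template, query_labels):
    """Expand a template using ONLY the labels from the query result."""
    # Assumption of this sketch: an unknown label renders as "".
    return _LABEL_REF.sub(lambda m: query_labels.get(m.group(1), ""), template)


def build_alert(query_labels, rule_labels, rule_annotations):
    """Return (labels, annotations) for one element of the result vector."""
    labels = dict(query_labels)
    # Rule-defined labels are templated against the query labels and then
    # override any query label with the same name.
    labels.update({k: expand(v, query_labels) for k, v in rule_labels.items()})
    # Annotations are templated against the query labels as well, so they
    # never see the rule-defined `env` label.
    annotations = {k: expand(v, query_labels)
                   for k, v in rule_annotations.items()}
    return labels, annotations


if __name__ == "__main__":
    labels, annotations = build_alert(
        {"namespace": "eu-west-0"},
        {"env": "{{ $labels.namespace }}"},
        {"summary": "the env should be {{ $labels.env }}"},
    )
    print(labels)
    print(annotations)
```

Under these assumptions the `env` label is set from `namespace`, but the `summary` annotation cannot reference it, because `env` is not part of the query result.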
:tada: all the cases and template variables should be covered at this point. I have excluded the template variables that are only used in template files.
There are a few things still remaining to make the test suite usable. I will create new issues for them.
Based on the specification, here is the list of all the high-level cases that need to be covered by the test suite. In all the cases, the content of the alerts, APIs, and time series is checked to be correct.
- `pending` -> `firing` -> `inactive`.
- `pending` -> `inactive`.
- [...] `pending` or `firing`)
- `pending` alerts having changing annotation values (checked via API).
- `firing` and `inactive` alerts being sent when they first went into those states.
- `firing` alert being re-sent at expected intervals when the alert is active with changing annotation contents.
- `inactive` alert being re-sent at expected intervals up to a certain time and not after that.
- Alert going directly to the `firing` state (skipping the `pending` state) because of zero `for` duration.
- Alert going to the `inactive` state for both the cases where the `for` duration is zero and non-zero. Here we should test 2 cases: one where the `inactive` alert was still being sent, and hence should stop being sent; two where the `inactive` alert was not being sent anymore.
- `pending` -> `firing` -> `inactive` while already having active alerts.
- When the `for` duration is non-zero and less than the evaluation interval, the `firing` alert must be sent after the second evaluation of the rule and not before.
- A rule using the `ALERTS` series from the rules above it in the same group.
- [...] `firing` and `pending`.

All time comparisons will be done within a certain acceptable delta and need not be exact.
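Two of the timing points above can be sketched in a few lines of Python. This is only an illustration with hypothetical helper names (`within_delta`, `first_firing_eval`). It assumes the alert condition is true from the very first evaluation, that evaluations are spaced exactly one interval apart, and that an alert starts firing once the time since it became active is at least the `for` duration:

```python
def within_delta(actual, expected, delta):
    """True if two timestamps (in seconds) differ by at most `delta`."""
    return abs(actual - expected) <= delta


def first_firing_eval(for_duration, eval_interval):
    """Return the 1-based evaluation at which the alert first fires.

    Evaluation n happens (n - 1) * eval_interval seconds after the alert
    became active (assumption: the condition holds from evaluation 1).
    """
    n = 1
    while (n - 1) * eval_interval < for_duration:
        n += 1
    return n


if __name__ == "__main__":
    # Zero `for` duration: firing on the first evaluation, no pending state.
    print(first_firing_eval(0, 60))
    # Non-zero `for` shorter than the interval: firing on the second one.
    print(first_firing_eval(30, 60))
    # A resend observed 61.5s after the previous one, expected after 60s.
    print(within_delta(61.5, 60, delta=2))
```

The delta itself is a tuning choice: too small and the tests become flaky, too large and they stop catching alerts that are sent an evaluation too early or too late.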