[alert_generator] "mismatch in EndsAt" error question

hagen1778 commented 2 years ago

The alert_generator test suite checks that the properties of received alerts are correct. One of those checks verifies that the EndsAt param falls within the time range between now (when the alert was received by alert_generator) and now+delta, where delta is usually 4*resendDelay - see https://github.com/prometheus/compliance/blob/main/alert_generator/cases/expected_alert.go#L80-L96

However, the time when an alert was received isn't always the time when it was triggered. Since Prometheus aligns the time slots in which alerting rules are evaluated, the real time and the timestamp of the alert evaluation can differ - see https://github.com/prometheus/prometheus/blob/580e852f1028ecbcaa67836f2da5230ac7c35fd0/rules/manager.go#L411-L419

Does this mean that alert_generator should calculate the EndsAt param based on the alert's ActiveAt param instead of the time when the alert was actually received?

codesome commented 2 years ago

> Does this mean that alert_generator should calculate the EndsAt param based on the alert's ActiveAt param instead of the time when the alert was actually received?

No, because ActiveAt is when the alert first went into pending, and the alert need not end after a fixed time. So the alert's end is set relative to when the alert was sent by the alert generator (so that if no more alerts are sent for any reason, this EndsAt can be used).

If you look at the checks, they account for some delay. That delay covers missing a group interval and slipping into the next one, so there is a cushion of up to 30s, since the group interval for the rules here is 30s.

How much delay are you seeing here? Are you hitting any edge cases where you miss a group interval and there are a few microseconds or milliseconds of additional delay on top of that because of the way the timestamp is tracked?

hagen1778 commented 2 years ago

Thanks for the explanation!

[alert_generator] "mismatch in EndsAt" error question #84