Open alphneo opened 1 month ago
Hi @ppcano I think Alert Grouping was recently updated by you - can you please take a look at this issue? Thanks!
Hi @alphneo ,
Thank you for sharing your feedback. We know the behavior of the timers can be somewhat confusing (to put it lightly). We provided an example in the documentation to help clarify this, but as you pointed out, there's still room for improvement.
The best way to grasp how they function is to experiment with dummy alerts, view them in the Grafana Alerting UI, and receive their notifications. For additional reference, these timers operate similarly to the Prometheus Alertmanager settings group_interval, repeat_interval, and group_wait. You can read more about these here:
https://prometheus.io/docs/alerting/latest/configuration/#route
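As a rough mental model of how the three timers interact, here is a minimal Python sketch. It is a simplification, not Grafana's or Alertmanager's actual implementation: the function name `should_notify`, its parameters, and the 4-hour repeat interval are assumptions made for illustration. The 30-second group wait and 5-minute group interval match the values discussed in this issue.

```python
from typing import Optional


def should_notify(
    now: float,
    group_created_at: float,
    last_notified_at: Optional[float],
    group_changed: bool,
    group_wait: float,
    group_interval: float,
    repeat_interval: float,
) -> bool:
    """Simplified decision: should this group send a notification at `now`?

    All times are in seconds. This is a hypothetical model for illustration.
    """
    if last_notified_at is None:
        # First notification: buffer alerts for group_wait after the group appears.
        return now - group_created_at >= group_wait
    if now - last_notified_at < group_interval:
        # Never notify the same group more often than once per group_interval.
        return False
    if group_changed:
        # New or resolved instances since the last notification.
        return True
    # Nothing changed: only send a reminder once repeat_interval has elapsed.
    return now - last_notified_at >= repeat_interval


# Assumed timer values: 30s group wait, 5m group interval, 4h repeat interval.
TIMERS = dict(group_wait=30, group_interval=300, repeat_interval=4 * 3600)

during_wait = should_notify(10, 0, None, True, **TIMERS)
after_wait = should_notify(30, 0, None, True, **TIMERS)
too_soon = should_notify(200, 0, 30, True, **TIMERS)
after_interval = should_notify(330, 0, 30, True, **TIMERS)
unchanged = should_notify(630, 0, 330, False, **TIMERS)
reminder = should_notify(330 + 4 * 3600, 0, 330, False, **TIMERS)
```

In this model, a group with no changes stays silent between reminders, while a change is reported at the next group interval boundary.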
Let me address your questions and comments. Please feel free to correct me if I misunderstood anything.
So, how were 4 alerts notified after the group interval elapsed? Have you also considered the first 2, which were sent during group wait?
Yes, the first 2 alerts are still part of the frontend group. Please note that the "Number of instances" column reflects the current number of alerts in the group at any particular "Time".
The first 2 alerts remained in the group because they are still firing. According to the documentation:
"An alert instance exits the group after being resolved and notified of its state change." (we should probably highlight this more)
I guess they don’t need to be, if the backend notification policy group is seen at 05:50?
Why not? The backend and frontend alerts belong to different groups, and these groups are entirely independent - they are not related to each other.
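To illustrate why the two groups do not affect each other, here is a small Python sketch of grouping alert instances by label. The function `group_alerts`, the `service` label, and the alert names are hypothetical, chosen only for this example; the real grouping depends on the labels configured in the notification policy.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert label sets by the values of the `group_by` labels."""
    groups = defaultdict(list)
    for labels in alerts:
        # Alerts with the same values for the grouping labels share one group.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical alert instances, each identified by its label set.
alerts = [
    {"alertname": "HighLatency", "service": "frontend"},
    {"alertname": "HighErrorRate", "service": "frontend"},
    {"alertname": "HighLatency", "service": "backend"},
    {"alertname": "HighErrorRate", "service": "backend"},
]

groups = group_alerts(alerts, ["service"])
```

Each resulting group would keep its own timers and its own instance count, so a notification sent for one group says nothing about the other.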
After the repeat interval is met, 4 alerts were considered for both notification policy groups. I don’t see a clear explanation for this.
This is due to the same behavior: "An alert instance exits the group after being resolved". In this case, the 8 alerts (4 frontend alerts and 4 backend alerts) have not yet been resolved, so they remain in their respective groups.
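The membership rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Grafana's implementation: the `AlertGroup` class and the instance names `frontend-1` to `frontend-4` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertGroup:
    """Toy model of one notification policy group."""

    name: str
    # Maps an alert instance key (standing in for its label set) to its state.
    instances: dict = field(default_factory=dict)

    def receive(self, key: str, state: str) -> None:
        # state is "firing" or "resolved"; a later state overwrites the earlier one.
        self.instances[key] = state

    def notify(self) -> dict:
        """Send one notification and return the instances it reported."""
        reported = dict(self.instances)
        # Only instances whose resolved state has now been notified leave the group.
        self.instances = {k: s for k, s in self.instances.items() if s == "firing"}
        return reported


group = AlertGroup("frontend")
group.receive("frontend-1", "firing")
group.receive("frontend-2", "firing")
first = group.notify()  # after group wait: 2 instances

group.receive("frontend-3", "firing")
group.receive("frontend-4", "firing")
second = group.notify()  # after group interval: 4 instances, none resolved yet

group.receive("frontend-1", "resolved")
third = group.notify()  # still reports 4 instances, one of them resolved
remaining = len(group.instances)  # the resolved instance has now left the group
```

The second notification reports 4 instances because the first 2 are still firing and therefore still belong to the group.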
I hope this clarifies your question. Please feel free to follow up for further questions or explanation.
For details on how alert evaluation works, see also Alert rule evaluation: the alert rule is evaluated continuously at its own interval, and each evaluation generates the same alert instance (identified by its label set).
Hi,
I cannot fully grasp the exact behavior of group intervals and repeat intervals, and the documentation has room for improvement. I have the following questions, and I hope you can clarify them.
I believe an alert instance that enters a group remains in the group after the evaluation wait period is over, provided the alert still fires on each evaluation, until a notification is sent. If I am not wrong here, please consider adding an evaluation interval case: if a rule fires once and does not fire again just before the group wait or group interval elapses, is it in the group or not during the group wait or group interval?
At 00:30, after the group wait elapsed for the frontend notification policy group, 2 alerts were notified. Then, during the 5-minute group interval, only 2 alerts fired for the same group. So how were 4 alerts notified after the group interval elapsed? Have you also counted the first 2, which were sent during group wait? I guess they do not have to be counted: for the backend notification policy group, at 05:50, after the group interval elapsed, nothing was sent, not even for the 2 alerts triggered during the group wait. So after the group interval elapses, shouldn't it be 2 alerts?
After the repeat interval is met, 4 alerts were considered for both notification policy groups. I don't see a clear explanation for this. Must an alert fire continuously for the entire repeat interval, without entering the normal state even once? Or is it enough for an alert to fire twice, once before the repeat interval starts and once after it elapses, possibly returning to the normal state in between? In that case it sounds less like a reminder and more like a new alert. And if a new alert pops up right before the repeat interval elapses, is it considered?
Please consider updating the documentation. It could save a lot of time, because it is currently hard to tell what is happening. Without it, one has to spend time experimenting with notifications and flooding the inbox to learn the deterministic behavior.
Documentation source: https://grafana.com/docs/grafana/latest/alerting/fundamentals/notifications/group-alert-notifications/
Thank you