grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Create silence for alert until next working hours #174

Closed freeseacher closed 5 months ago

freeseacher commented 2 years ago

Lots of small P5 alerts can wait until the next working hours. As a schedule owner, I would like to set up some kind of working hours, or a time window for debugging and closing low-priority alerts. Something like: notify Mon: 9-18, Tue: 9-18 ... Fri: 9-18; Sat, Sun: do not notify. Plus two options: notify if still firing, or notify anyway.
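For illustration, the requested behaviour could be sketched roughly as below. This is not OnCall's implementation: `WORKING_HOURS`, `in_working_hours`, `next_working_start`, and `notify_after_wait` are hypothetical names, and the Mon-Fri 9-18 table is simply the one from the request above.

```python
from datetime import datetime, time, timedelta

# Hypothetical working-hours table: weekday (0 = Monday) -> (start, end).
# Saturday (5) and Sunday (6) are absent, meaning "do not notify".
WORKING_HOURS = {day: (time(9), time(18)) for day in range(5)}


def in_working_hours(now: datetime) -> bool:
    """True if `now` falls inside the configured window for its weekday."""
    window = WORKING_HOURS.get(now.weekday())
    return window is not None and window[0] <= now.time() < window[1]


def next_working_start(now: datetime) -> datetime:
    """Return `now` if already inside working hours, else the next window start."""
    if in_working_hours(now):
        return now
    # Look at today and the following seven days for the next window start.
    for offset in range(8):
        day = now.date() + timedelta(days=offset)
        window = WORKING_HOURS.get(day.weekday())
        if window is None:
            continue
        start = datetime.combine(day, window[0], tzinfo=now.tzinfo)
        if start > now:
            return start
    raise ValueError("no working hours configured")


def notify_after_wait(still_firing: bool, notify_anyway: bool) -> bool:
    """The two requested options: 'notify if still firing' or 'notify anyway'."""
    return notify_anyway or still_firing


# An alert arriving on Friday evening is held until Monday 09:00.
friday_evening = datetime(2022, 7, 1, 19, 30)
print(next_working_start(friday_evening))
```

The silence duration is then `next_working_start(now) - now`, and `notify_after_wait` decides whether to notify once that time is reached.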

Of course, low-priority alerts lead to alert fatigue and must be eliminated, but this is still a popular case.

raphael-batte commented 2 years ago

@freeseacher We plan to add 'working hours' time for users in our upcoming scheduling tools. @Matvey-Kuk Probably, we should add additional notification options in user settings as well.

Matvey-Kuk commented 2 years ago

Low-priority alert groups could be paused in the escalation chain using this step:

Screenshot 2022-06-29 at 15 41 30

@freeseacher will this work for you, or are you looking for something else?

freeseacher commented 2 years ago

Sounds great! But that is only a time of day. How can I skip until Monday morning?

raphael-batte commented 2 years ago

Something like this in user settings? image

freeseacher commented 2 years ago

:thinking: Those settings sound very opinionated to me. What is low-priority? How is it defined? For me, that is not a personal setting but an option in silence image

or part of alert processing before escalation.

As for me, I see them through the prism of another product, like this:

image

raphael-batte commented 2 years ago

@freeseacher how do these settings work if there are people in different timezones with different working hours in one schedule?

freeseacher commented 2 years ago

Great question! I believe that a notification should be delivered as fast as possible, so it should be delivered first to someone who is on call.

Matvey-Kuk commented 2 years ago

@raphael-batte we have this concept on the Escalation Chain level (with time), why do you think it should go to the Notification Level?

raphael-batte commented 2 years ago

@Matvey-Kuk I am not suggesting removing the silence from the OnCall escalation chain level. But if users need local-time silence, it should be at the profile level.

Example:

  1. We have an escalation chain with a silence step; the server time is in the Tel Aviv time zone.
  2. At 9:00 a.m. on Monday we restart the chain, but at that time the engineer on call is in New York and is outside of his working hours.

If he does not have a personal silence option in user settings, he will receive alerts.
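The mismatch in this example can be checked with a few lines of Python. This is only a sketch: the fixed UTC offsets (summer time for both cities), the date, the 9-18 window, and the helper `in_local_working_hours` are assumptions made for illustration, not OnCall settings.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed summer offsets, assumed for illustration (no DST handling).
TEL_AVIV = timezone(timedelta(hours=3))   # IDT
NEW_YORK = timezone(timedelta(hours=-4))  # EDT


def in_local_working_hours(moment, user_tz, start=time(9), end=time(18)):
    """True if `moment` is Mon-Fri between `start` and `end` in the user's zone."""
    local = moment.astimezone(user_tz)
    return local.weekday() < 5 and start <= local.time() < end


# The chain restarts at 9:00 a.m. on a Monday, server (Tel Aviv) time.
restart = datetime(2022, 7, 4, 9, 0, tzinfo=TEL_AVIV)

print(restart.astimezone(NEW_YORK))               # local time for the New York engineer
print(in_local_working_hours(restart, TEL_AVIV))  # inside the server-side window
print(in_local_working_hours(restart, NEW_YORK))  # outside the engineer's window
```

With these offsets the restart lands at 2:00 a.m. in New York, which is why a chain-level silence alone does not protect that engineer.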

Matvey-Kuk commented 2 years ago

Time-based silence in the user space will mean "we escalate to a user" -> "user reacts in a few hours".

It's a pattern we generally want to avoid project-wise. We don't want alerts to "stick" to users for multiple hours. The goal of OnCall is to be an effective alert distributor between team members. What if the user goes on vacation after the timeout? How do we indicate the reason for the long response delay to other users?

I believe in this particular case the alert should stay on the escalation chain, or be routed immediately to another person if the team is distributed across time zones.

So I think we should think either about improving our Calendar or about adding a more adjustable escalation step.
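A rough sketch of that alternative, where the alert either goes straight to a team member who is currently in working hours or stays on the chain. The `Member` type, the fixed UTC offsets, the 9-18 window, and the `route` function are all hypothetical and do not reflect OnCall's data model.

```python
from datetime import datetime, time, timedelta, timezone
from typing import List, NamedTuple, Optional


class Member(NamedTuple):
    name: str
    utc_offset_hours: int  # fixed offset, assumed for illustration (no DST)


def is_working(member: Member, now_utc: datetime) -> bool:
    """True if it is Mon-Fri, 9:00-18:00 in the member's local time."""
    local = now_utc.astimezone(timezone(timedelta(hours=member.utc_offset_hours)))
    return local.weekday() < 5 and time(9) <= local.time() < time(18)


def route(members: List[Member], now_utc: datetime) -> Optional[str]:
    """Return the first member in working hours, or None to keep the alert on the chain."""
    for member in members:
        if is_working(member, now_utc):
            return member.name
    return None


team = [Member("new-york", -4), Member("tel-aviv", 3)]
monday_morning = datetime(2022, 7, 4, 6, 0, tzinfo=timezone.utc)
print(route(team, monday_morning))
```

Returning `None` here stands for "the alert waits on the escalation chain", so the delay stays visible at the chain level instead of being attached to one user.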

raphael-batte commented 2 years ago

Yes, this is the other side of this situation. Agree with the need to rethink/improve the transition mechanic here.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had activity in the last 120 days.

joeyorlando commented 5 months ago

@freeseacher I think this should now be achievable with the custom silence durations: Screenshot 2024-06-14 at 13 14 41 Screenshot 2024-06-14 at 13 14 44

I'll go ahead and close this out but feel free to open a new feature request if this does not suit what you need!