grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Introduce a ticketing system #1332

Open · dannykopping opened this issue 1 year ago

dannykopping commented 1 year ago

We use OnCall to run Grafana Cloud Logs. Our infrastructure comprises many k8s clusters and Loki installations. We try to keep all of our alerts actionable, and these alerts largely fall into 3 categories right now:

We have to keep critical pages, but the other two could be merged into a single category if we had a way to track the alerts that have fired; this is where a ticketing system could help.

In my mind, this is how I imagine it working:

  1. an alert fires that is not urgent enough to warrant paging an engineer
  2. an entry gets added to a backlog of alerts to address
     2.1. if the alert self-resolves, mark the entry as auto-resolved
     2.2. if further alerts fire with the same name (or the same grouping), reparent them under the original entry
  3. the on-caller can address the entries in their own time and mark them as done / unactionable / etc.
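The workflow above can be sketched as a small in-memory model. This is a hypothetical illustration, not part of OnCall: the `Backlog` and `Entry` names, the status values, and the use of a single grouping key (for example the alert name) to decide reparenting are all assumptions, and resolution is tracked per grouping rather than per individual alert.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical entry states, mirroring steps 2.1 and 3 above.
    OPEN = "open"
    AUTO_RESOLVED = "auto-resolved"
    DONE = "done"
    UNACTIONABLE = "unactionable"


@dataclass
class Entry:
    """A hypothetical backlog entry; repeat alerts are reparented under it."""
    group_key: str
    alerts: list[str] = field(default_factory=list)
    status: Status = Status.OPEN


class Backlog:
    """Hypothetical non-paging alert backlog (an illustration, not an OnCall API)."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []       # full history, including closed entries
        self._open: dict[str, Entry] = {}    # the currently open entry per grouping key

    def on_alert_fired(self, alert_id: str, group_key: str) -> Entry:
        # Step 2.2: an open entry with the same grouping absorbs the new alert.
        entry = self._open.get(group_key)
        if entry is None:
            # Step 2: otherwise add a new entry to the backlog.
            entry = Entry(group_key)
            self.entries.append(entry)
            self._open[group_key] = entry
        entry.alerts.append(alert_id)
        return entry

    def on_alert_resolved(self, group_key: str) -> None:
        # Step 2.1: if the alert self-resolves, mark its open entry auto-resolved.
        entry = self._open.pop(group_key, None)
        if entry is not None:
            entry.status = Status.AUTO_RESOLVED

    def triage(self, group_key: str, status: Status) -> None:
        # Step 3: the on-caller closes the open entry in their own time.
        # Raises KeyError if there is no open entry for this grouping.
        entry = self._open.pop(group_key)
        entry.status = status

    def open_entries(self) -> list[Entry]:
        # The on-caller's to-do list: every entry still awaiting attention.
        return list(self._open.values())
```

Keying open entries by grouping means a repeat alert is reparented only while the original entry is still open; once an entry is closed, the next alert with the same key starts a fresh entry, so the history of earlier entries is preserved.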

There's obviously quite a lot to this, so I'll stop there, but I'm keen to get the conversation started. I think OnCall is the natural place for this (as opposed to GitHub Issues, although that could be an alternative target?), and it would provide the following benefits:

pmohan6 commented 1 year ago

We would love to see this feature as well! A ticketing system being a part of Grafana Oncall makes a lot of sense.