grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Shared integrations between teams #981

Open jonathan-mothership opened 1 year ago

jonathan-mothership commented 1 year ago

Hi there, our team is loving Grafana OnCall and is looking to expand its use to other teams within the company, spreading alerting like it was glitter. We've hit a snag with integration configuration for teams that use our Prometheus Alertmanager setup. We have a shared operational model in which a platform team provides tooling consumed by the rest of the engineering organization. Alerts sent through the Alertmanager (AM) integration carry a team label, which we use to route them to the appropriate team. Unfortunately, OnCall doesn't seem to let multiple teams share the same integration, so right now we think we're forced to put all teams under General. Ideally, we'd use the label in an alert from a shared integration to route it to a specific escalation chain, regardless of which team the integration belongs to. Is this possible?
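A minimal sketch of the routing being asked for, written as plain Python rather than OnCall code. It assumes the shared integration receives Alertmanager's webhook JSON (with `commonLabels` and `alerts[].labels`); the `team` values, escalation-chain names, the `TEAM_CHAINS` mapping, and the `select_chain` helper are all hypothetical.

```python
# Hypothetical model of a shared integration that picks an escalation chain
# from the "team" label of an Alertmanager webhook payload.
# Chain names and the mapping are illustrative, not OnCall identifiers.
from typing import Any, Dict, Optional

TEAM_CHAINS: Dict[str, str] = {
    "payments": "payments-primary",
    "search": "search-primary",
}
DEFAULT_CHAIN = "general-default"


def team_of(payload: Dict[str, Any]) -> Optional[str]:
    """Return the team label, preferring commonLabels (shared by every alert
    in the group) and falling back to the first alert's own labels."""
    team = payload.get("commonLabels", {}).get("team")
    if team is None and payload.get("alerts"):
        team = payload["alerts"][0].get("labels", {}).get("team")
    return team


def select_chain(payload: Dict[str, Any]) -> str:
    """Map the team label to an escalation chain, else use the default."""
    team = team_of(payload)
    if team is None:
        return DEFAULT_CHAIN
    return TEAM_CHAINS.get(team, DEFAULT_CHAIN)


example = {
    "status": "firing",
    "commonLabels": {"alertname": "HighLatency", "team": "payments"},
    "alerts": [{"labels": {"alertname": "HighLatency", "team": "payments"}}],
}
print(select_chain(example))  # payments-primary
```

Falling back to a default chain keeps unlabeled or unknown-team alerts from being dropped; in OnCall terms that role would presumably be played by the integration's default route.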

Matvey-Kuk commented 1 year ago

@jonathan-mothership thank you for the kind words about OnCall!

I've noticed other teams dealing with this limitation by duplicating the receivers section, so that all alerts go to each team. That may work for you as a workaround.
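One reading of this workaround, sketched in Python that builds the Alertmanager config as a dict: a single receiver with one `webhook_configs` entry per team, so Alertmanager posts every alert to every team's integration. The team names, receiver name, and URLs are placeholders, and each team's integration would still need its own route that filters on its `team` label.

```python
# Sketch of the "duplicated receivers" workaround: Alertmanager fans every
# alert out to each team's own OnCall integration.
# Team names, the receiver name, and the URLs are placeholders.
import json
from typing import Any, Dict, List

TEAM_INTEGRATION_URLS: Dict[str, str] = {
    "payments": "https://oncall.example.com/payments-integration",
    "search": "https://oncall.example.com/search-integration",
}


def build_alertmanager_config(team_urls: Dict[str, str]) -> Dict[str, Any]:
    """Return route and receivers sections with one webhook per team URL."""
    webhook_configs: List[Dict[str, Any]] = [
        {"url": url, "send_resolved": True}
        for _, url in sorted(team_urls.items())
    ]
    return {
        "route": {
            "receiver": "oncall-all-teams",
            "group_by": ["alertname", "team"],
        },
        "receivers": [
            {"name": "oncall-all-teams", "webhook_configs": webhook_configs},
        ],
    }


config = build_alertmanager_config(TEAM_INTEGRATION_URLS)
print(json.dumps(config, indent=2))
```

The output would still need to be written to `alertmanager.yml` with a YAML library, and the team-to-URL mapping has to live in the Alertmanager config itself.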

Btw, I agree we'll need shared integrations at some point.

jonathan-mothership commented 1 year ago

Hey @Matvey-Kuk, thanks for the tip. That's specifically what we're hoping to avoid: it would require our Alertmanager instances to be aware of teams, which means our infrastructure tooling would have to be too.

Matvey-Kuk commented 1 year ago

@jonathan-mothership another hacky option is to re-route alert groups from one integration to another using webhooks 😬
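A rough sketch of the extra hop, assuming the relayed body keeps Alertmanager's shape (`commonLabels.team`). Whether OnCall's webhooks forward the payload that way is an assumption, not something confirmed here. `build_forward_request` and the team-to-URL mapping are hypothetical; the function only builds the request, so it can be inspected before anything is sent.

```python
# Hypothetical relay for the webhook hop: read the team label from an
# Alertmanager-shaped payload and prepare a POST to that team's integration.
# Nothing is sent here; the caller decides whether to call urlopen().
import json
import urllib.request
from typing import Any, Dict, Optional


def build_forward_request(
    payload: Dict[str, Any], team_urls: Dict[str, str]
) -> Optional[urllib.request.Request]:
    """Return a POST request for the team in commonLabels.team,
    or None when the label is missing or has no known URL."""
    team = payload.get("commonLabels", {}).get("team")
    url = team_urls.get(team) if team else None
    if url is None:
        return None
    body = json.dumps(payload).encode("utf-8")  # forward the body unchanged
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending would then be `urllib.request.urlopen(req, timeout=10)` once `req` is not `None`.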

jonathan-mothership commented 1 year ago

I'd considered the extra hop, but it was a curiosity I didn't poke at. I had wondered whether the payloads would come across unchanged, but I didn't test it. Would there be any functional or cosmetic differences in the alert information passed on to things like Slack or paging systems?