grafana / oncall

Developer-friendly incident response with brilliant Slack integration
GNU Affero General Public License v3.0

Grafana integration heartbeats cause false-positive alarms #420

Open PhantomPhreak opened 2 years ago

PhantomPhreak commented 2 years ago

The current heartbeat configuration for OnCall is as follows:
1) Create a dashboard in Grafana, add a metric, and create an alert that constantly stays in the "Firing" state.
2) Create a contact point in Grafana with the specific /heartbeat webhook for each integration in OnCall.
3) Create an alert rule that routes the metric from step 1 to the contact point created in step 2.

As I understand it, heartbeats are designed to check the alert source -> OnCall connectivity. This approach provides a common way to send heartbeats for the different OnCall integrations (Grafana, webhook, etc.).

A few problems here:
a) With Grafana, it's mandatory to use a datasource that supports alerting to generate fake data (a constant line, for example) that triggers the alert. Thus, if the datasource becomes unavailable, OnCall will trigger a false-positive alert about Grafana -> OnCall connectivity. I tried Grafana's built-in Random Walk datasource to exclude the external datasource from the test chain, but it doesn't support alerting.
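To illustrate why a datasource outage looks the same as a connectivity failure, here is a minimal sketch of a timeout-based heartbeat check. It is not OnCall's actual implementation; the class name `HeartbeatMonitor` and the 120-second timeout are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical receiver-side check: alarm when heartbeats stop arriving."""

    timeout_s: float
    last_seen: Optional[float] = None  # timestamp of the latest heartbeat

    def beat(self, now: float) -> None:
        # Record the arrival time of a heartbeat.
        self.last_seen = now

    def is_expired(self, now: float) -> bool:
        # No heartbeat received yet: nothing to compare against.
        if self.last_seen is None:
            return False
        # The monitor only sees silence. It cannot tell whether the
        # datasource, the alert rule, or the network link is the cause.
        return now - self.last_seen > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=120)
monitor.beat(now=0)
print(monitor.is_expired(now=60))   # still within the timeout
print(monitor.is_expired(now=121))  # silence exceeded the timeout
```

Whatever breaks upstream of the webhook call (datasource, alert evaluation, or the link itself) produces the same silence, which is why keeping the datasource in the chain can raise a false positive.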

b) For each OnCall Grafana integration (which is represented as a contact point in Grafana), it's necessary to do additional configuration to create an alert and an alert rule, and there will always be an alert in the Firing state. That is expected for a normal Grafana -> OnCall integration, but it can be interpreted as something bad (red usually means BAD).

The easiest solution here, IMO, is the following: for each OnCall integration, the contact point in Grafana should have an additional setting in Optional Webhook settings: heartbeats. When configured, Grafana will periodically call the /heartbeat endpoint for the configured webhook, testing the connectivity.
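A rough sketch of what such a periodic sender could look like, independent of any datasource or alert rule. This is not an existing Grafana feature; the function names, the placeholder URL, and the assumption that the /heartbeat endpoint accepts a plain GET request are all hypothetical.

```python
import time
import urllib.request
from typing import Callable, List


def send_heartbeat(url: str, timeout: float = 5.0) -> bool:
    """Send one GET request to the heartbeat URL; True on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts all derive from OSError.
        return False


def run_heartbeats(
    url: str,
    interval_s: float,
    count: int,
    send: Callable[[str], bool] = send_heartbeat,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Send `count` heartbeats, waiting `interval_s` seconds between them."""
    results = []
    for i in range(count):
        results.append(send(url))
        if i < count - 1:
            sleep(interval_s)
    return results


if __name__ == "__main__":
    # Placeholder URL: replace with the integration's real heartbeat webhook.
    heartbeat_url = "https://oncall.example.invalid/heartbeat"
    print(run_heartbeats(heartbeat_url, interval_s=60, count=3))
```

Because the sender only depends on the contact point's URL, a datasource outage would no longer stop the heartbeats, and no permanently firing alert is needed.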

But this requires changes in Grafana, not in OnCall, as I understand it. Anyway, this is just an idea for how to simplify the configuration and make heartbeats more robust.

PS. OnCall is awesome, many thanks for making this great product! Thank you :)

Matvey-Kuk commented 2 years ago

@PhantomPhreak thank you for such great feedback! Polishing the connection between Grafana and OnCall is in our focus, and we'll take your proposal into account at the next design discussion. It sounds reasonable to me.

Thank you!

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had activity in the last 120 days.

github-actions[bot] commented 1 day ago

This issue has been automatically marked as stale because it has not had activity in the last 120 days.