Open PhantomPhreak opened 2 years ago
@PhantomPhreak thank you for such great feedback! Polishing the connection between Grafana and OnCall is one of our priorities, and we'll take your proposal into account at the next design discussion. It sounds reasonable to me.
Thank you!
This issue has been automatically marked as stale because it has not had activity in the last 120 days.
The current heartbeat configuration for OnCall is as follows:

1) Create a dashboard in Grafana, add a metric, and create an alert that constantly stays in the "Firing" state.
2) Create a contact point in Grafana with a specific `/heartbeat` webhook for each integration in OnCall.
3) Create an alert rule that routes the metric from step 1 to the contact point created in step 2.

As I understand it, heartbeats are designed to check the `alert source` -> `oncall` connectivity. This approach provides a common way of sending heartbeats for the different OnCall integrations (Grafana, webhook, etc.).

A few problems here:

a) With Grafana, it's mandatory to use a datasource that supports alerting to generate fake data (a constant line, for example) that triggers the alert. Thus, if the datasource becomes unavailable, OnCall will trigger a false positive alert about the `grafana` -> `oncall` connectivity. I've tried to use Grafana's embedded `Random Walk` datasource to exclude the external datasource from the test chain, but it doesn't support alerting.

b) For each OnCall Grafana integration (which is actually represented as a contact point in Grafana), it's necessary to configure an additional alert and alert rule, and there will always be an alert in the `Firing` state. That is expected for the normal Grafana -> OnCall integration, but it can be interpreted as something bad (red usually means BAD).

The easiest solution here, IMO, is the following: for each OnCall integration, the contact point in Grafana should have an additional setting in `Optional Webhook settings`: heartbeats. When configured, Grafana will periodically call the `/heartbeat` endpoint of the configured webhook, testing the connectivity.

But this requires changes in Grafana, not in OnCall, as I understand it. Anyway, this is just an idea for how to simplify the configuration and make heartbeats more robust.
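To illustrate the idea, here is a rough Python sketch of what such a built-in periodic check could do. It is not Grafana or OnCall code: `heartbeat_url`, `send_heartbeat`, and `run_heartbeats` are hypothetical names, and the assumption that the heartbeat endpoint is simply the integration webhook URL plus `/heartbeat` is taken only from the description in this issue.

```python
import time
import urllib.request
from typing import Callable, Iterable, Set


def heartbeat_url(integration_url: str) -> str:
    """Build the heartbeat URL for an integration webhook.

    Assumption: the endpoint is the webhook URL with "/heartbeat" appended.
    """
    return integration_url.rstrip("/") + "/heartbeat"


def send_heartbeat(url: str, timeout: float = 5.0) -> bool:
    """Send one GET request; return True only on a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses, so connection
        # failures and non-2xx error responses both end up here.
        return False


def run_heartbeats(
    urls: Iterable[str],
    send: Callable[[str], bool] = send_heartbeat,
    interval: float = 60.0,
    rounds: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Ping every URL once per round; return the URLs that failed in the last round."""
    targets = list(urls)
    failed: Set[str] = set()
    for i in range(rounds):
        failed = {u for u in targets if not send(u)}
        if i < rounds - 1:
            sleep(interval)  # wait before the next round
    return failed
```

The point of the sketch is that no datasource and no permanently firing alert are involved: the only things being tested are the network path and the endpoint itself.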
PS. OnCall is awesome, many thanks for making this great product! Thank you :)