healthchecks / healthchecks

Open-source cron job and background task monitoring service, written in Python & Django
https://healthchecks.io
BSD 3-Clause "New" or "Revised" License

Gitlab Alerts #860

Open mtesch-um opened 1 year ago

mtesch-um commented 1 year ago

[Draft: I'm still not sure what the requirements should be here, but I wanted a place to gather and share notes. Maybe this just turns into documentation on how to set this up, to save others from working through it or ending up with a suboptimal setup (or me from ending up with one!).]

It would be nice to have an integration with GitLab Alerts. It can (sort of) be done manually right now with webhooks, but it's not obvious how to do it, and the result may not have full feature support(?)

The Alert webhook interface documentation: https://docs.gitlab.com/ee/operations/incident_management/integrations.html#http-endpoints

To set up a GitLab webhook integration in https://healthchecks.io/integrations/<uuid>/edit/

One thing that appears to be missing (I haven't figured it out yet, anyway) is a per-failure "fingerprint", which I think would allow each healthcheck failure to map 1:1 to an Alert and Incident in GitLab.

cuu508 commented 1 year ago

PagerDuty webhook payloads have an incident_key field, which I think is similar to the fingerprint: it is used for grouping notifications about the "same thing" together. In the PagerDuty integration we use the check's code as the incident key.
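
For illustration, here is a rough sketch of what the same approach could look like for GitLab: a webhook body that uses the check's code as the fingerprint. This is not the actual integration code. The function name and the sample UUID are made up, and the field names (title, start_time, end_time, severity, monitoring_tool, fingerprint) are my reading of the GitLab HTTP endpoint docs linked above, so they should be verified there.

```python
import json


def gitlab_alert_payload(check_code, check_name, status, now_iso):
    """Build a GitLab alert body keyed on the check's code.

    Hypothetical helper: field names follow my understanding of the
    GitLab alert HTTP endpoint and are an assumption, not a tested setup.
    """
    payload = {
        "title": f"{check_name} is {status.upper()}",
        "monitoring_tool": "Healthchecks",
        # Same check code on every event, so "down" and "up" share a fingerprint.
        "fingerprint": check_code,
    }
    if status == "down":
        payload["start_time"] = now_iso
        payload["severity"] = "critical"
    else:
        # Assumption: a matching fingerprint plus end_time marks a recovery.
        payload["end_time"] = now_iso
    return payload


# Made-up check code, for illustration only.
code = "00000000-0000-4000-8000-000000000000"
down = gitlab_alert_payload(code, "nightly-backup", "down", "2024-01-01T03:00:00Z")
up = gitlab_alert_payload(code, "nightly-backup", "up", "2024-01-01T04:00:00Z")
print(json.dumps(down, indent=2))
```

With this scheme every failure of the same check carries the same fingerprint, which is the grouping behaviour described for incident_key.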

mtesch-um commented 1 year ago

👍 I'll let it run as-is for a few days and see how it works with the Alert/Incident management built into GitLab.

I suspect we might want separate Alerts for separate failures, even for the same code. The per-failure identifier could perhaps be the last good rid before the failure, some event identifier for the webhook-down event, or the cron schedule time that triggered the last failure (even if it's an UP event)?
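
A minimal sketch of that idea, assuming some per-failure value (a rid, an event ID, or the scheduled time) is available to both the DOWN and the UP notification. Whether Healthchecks exposes such a value to webhooks is exactly the open question here, so the function and its arguments are hypothetical.

```python
import hashlib


def per_failure_fingerprint(check_code, failure_id):
    """Derive a fingerprint that is unique per failure, not per check.

    Hypothetical helper: failure_id stands in for whatever identifies one
    failure (last good rid, event ID, or scheduled time).
    """
    raw = f"{check_code}:{failure_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


# Made-up values, for illustration only.
code = "00000000-0000-4000-8000-000000000000"
first = per_failure_fingerprint(code, "2024-01-01T03:00:00Z")
second = per_failure_fingerprint(code, "2024-01-02T03:00:00Z")
print(first != second)
```

The UP event would have to carry the same failure identifier as its DOWN event, otherwise the fingerprints would not match and the recovery could not be tied back to the Alert it belongs to.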