python-discord / infra

Infrastructure for Python Discord
https://docs.pydis.wtf
MIT License

Netcup Prometheus alerting for unreachable Alertmanagers #219

Open jchristgit opened 5 months ago

jchristgit commented 5 months ago

We need to configure our Alertmanager to send alerts to Discord, so that the monitoring setup on lovelace can notify us when something is wrong.

jb3 commented 5 months ago

As discussed in the dev-ops channel, I think we can reach a configuration here that uses our existing high-availability (HA) Alertmanager setup.

We can set up token access so that Prometheus on the Ansible machines can push alerts through to the HA Alertmanager in Kubernetes.
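
To make the token idea concrete, here is a rough sketch of what an authenticated alert push looks like on the wire. In practice Prometheus would do this itself through its `alerting` configuration, so this is only an illustration. It assumes that whatever sits in front of the Alertmanager accepts a bearer token. The function name `build_alert_request`, the host, the token value and the label values are all hypothetical; only the `/api/v2/alerts` path and the payload shape come from Alertmanager's v2 API.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert_request(
    base_url: str, token: str, alertname: str, summary: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to Alertmanager's v2 alerts endpoint."""
    # Alertmanager's v2 API accepts a JSON list of alerts; each needs a label set.
    alerts = [
        {
            "labels": {"alertname": alertname, "severity": "warning"},
            "annotations": {"summary": summary},
            "startsAt": datetime.now(timezone.utc).isoformat(),
        }
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: a proxy or ingress in front of Alertmanager checks this token.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Hypothetical values, for illustration only; nothing is sent here.
request = build_alert_request(
    "https://alertmanager.example.invalid/",
    "example-token",
    "ExampleAlert",
    "Example alert pushed from an Ansible-managed host",
)
```

Passing `request` to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can be read and run without a live Alertmanager.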

Some notes:

jchristgit commented 5 months ago

To clarify, following a discussion on Discord: this issue is about adding a "dead man's switch" alert that routes to Discord when the Netcup Prometheus instance cannot reach the Alertmanager in Kubernetes. To cover this case we want to: