Open jchristgit opened 5 months ago
@jb3 Do you have an idea for how best to do this? Right now I'm not even sure how to deploy alerts to Prometheus in Kubernetes in the first place. I think for the documentation I will make a separate issue though.
Noting to self, we can set the config map prefs to always query the apiserver for the latest changes, hence nullifying the propagation delay of changes.
I lied: this is a kubelet option, so we cannot set it per configmap. We will have to do some smart in-pod detection at Prometheus that the reload has gone through.
There is, however, always a timestamp in the mounted directory. We just need to keep checking this timestamp (probably with a recurring kubectl exec).
/prometheus $ ls -la /opt/pydis/prometheus/alerts.d/
total 12
drwxrwsrwx 3 root 2000 4096 Apr 30 19:24 .
drwxr-xr-x 3 root root 4096 Apr 26 21:41 ..
drwxr-sr-x 2 root 2000 4096 Apr 30 19:24 ..2024_04_30_19_24_46.1524242850
lrwxrwxrwx 1 root 2000 32 Apr 30 19:24 ..data -> ..2024_04_30_19_24_46.1524242850
lrwxrwxrwx 1 root 2000 24 Apr 26 21:39 alertmanager.yaml -> ..data/alertmanager.yaml
lrwxrwxrwx 1 root 2000 24 Apr 26 21:39 certificates.yaml -> ..data/certificates.yaml
lrwxrwxrwx 1 root 2000 19 Apr 26 21:39 coredns.yaml -> ..data/coredns.yaml
lrwxrwxrwx 1 root 2000 15 Apr 26 21:39 cpu.yaml -> ..data/cpu.yaml
lrwxrwxrwx 1 root 2000 18 Apr 26 21:39 django.yaml -> ..data/django.yaml
lrwxrwxrwx 1 root 2000 16 Apr 26 21:39 etcd.yaml -> ..data/etcd.yaml
lrwxrwxrwx 1 root 2000 16 Apr 26 21:39 jobs.yaml -> ..data/jobs.yaml
lrwxrwxrwx 1 root 2000 18 Apr 26 21:39 memory.yaml -> ..data/memory.yaml
lrwxrwxrwx 1 root 2000 17 Apr 26 21:39 nginx.yaml -> ..data/nginx.yaml
lrwxrwxrwx 1 root 2000 17 Apr 26 21:39 nodes.yaml -> ..data/nodes.yaml
lrwxrwxrwx 1 root 2000 16 Apr 26 21:39 pods.yaml -> ..data/pods.yaml
lrwxrwxrwx 1 root 2000 20 Apr 26 21:39 postgres.yaml -> ..data/postgres.yaml
lrwxrwxrwx 1 root 2000 22 Apr 26 21:39 prometheus.yaml -> ..data/prometheus.yaml
lrwxrwxrwx 1 root 2000 17 Apr 26 21:39 redis.yaml -> ..data/redis.yaml
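A minimal sketch of the timestamp check, assuming the layout in the listing above: the `..data` symlink is repointed at a new timestamped directory whenever the configmap is updated. The helper names are hypothetical, and the directory name format is only what this listing shows, not a documented contract.

```python
import os
from datetime import datetime


def current_revision(mount_dir):
    """Return the directory name that the `..data` symlink points at."""
    return os.readlink(os.path.join(mount_dir, "..data"))


def revision_timestamp(revision):
    """Parse the timestamp out of a name like `..2024_04_30_19_24_46.1524242850`.

    Assumes the format seen in the listing: two leading dots, a
    YYYY_MM_DD_HH_MM_SS stamp, then a dot and a numeric suffix.
    """
    stamp = revision.lstrip(".").split(".", 1)[0]
    return datetime.strptime(stamp, "%Y_%m_%d_%H_%M_%S")
```

Comparing `current_revision("/opt/pydis/prometheus/alerts.d")` before and after applying the configmap would show whether the kubelet has synced the new contents into the pod.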
Another related issue for a potential future feature: kubernetes/kubernetes#22368 (open for 7 years though, yikes!)
Can't we check the git diff when CI runs and, if we find configmap files (identified by some rule/logic), apply them?
This is a good idea. But from my understanding, the issue was that we don't really know when Kubernetes has rolled out the configmaps.
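The diff-based detection proposed above could be sketched roughly as follows. The `alerts.d/` prefix and the function names are hypothetical stand-ins for whatever rule identifies configmap files in the repository; this only finds the changed files and does not address knowing when the rollout has finished.

```python
import subprocess

# Hypothetical rule: alert configmap sources live under this repo-relative prefix.
ALERTS_PREFIX = "alerts.d/"


def changed_paths(base, head="HEAD"):
    """List the paths that differ between two commits, via `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def changed_alert_files(paths, prefix=ALERTS_PREFIX):
    """Keep only the YAML files under the alerts prefix."""
    return [
        p for p in paths
        if p.startswith(prefix) and p.endswith((".yaml", ".yml"))
    ]
```

In a CI job, a non-empty result from `changed_alert_files(changed_paths(base_sha))` would be the trigger for applying the configmap.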
We could simply sleep for 10 seconds and then apply it. If eventual consistency isn't consistent in 10 seconds, then I guess I'm done.
Unfortunately, the settling of configmap updates cannot be guaranteed on live pods within that window; from memory, it's a scheduled job on the kubelet.
The Kubernetes solution is just to have a sidecar container running something like inotify (or whatever the modern equivalents are). Upon detecting a change, it can call out to Prometheus via the HTTP management API or (I think) send a signal to the process; I can't remember if sidecars share the same process namespace.
I'll investigate this one later today.
This is a sound idea. inotifywait in a bash script should be sufficient.
I do think that containers in the same pod share the same process namespace. If not, maybe we can configure it; failing that, we can use the HTTP management API, but we need to make sure this is locked down externally.
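As a rough illustration of the sidecar idea, here is a polling variant (swapped in for inotify to keep the example dependency-free) that watches the `..data` symlink and calls the Prometheus reload endpoint over HTTP. It assumes Prometheus is reachable from the sidecar at `localhost:9090` and was started with `--web.enable-lifecycle`, which is what enables `POST /-/reload`; the function names and the poll interval are hypothetical.

```python
import os
import time
import urllib.request


def post_reload(url="http://localhost:9090/-/reload"):
    """Ask Prometheus to reload its configuration (needs --web.enable-lifecycle)."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def poll_once(mount_dir, last_seen, reload=post_reload):
    """Trigger a reload if the `..data` symlink target changed.

    Returns the target that was observed. If `reload` raises, the exception
    propagates and the caller keeps its old `last_seen`, so the next poll
    tries again.
    """
    current = os.readlink(os.path.join(mount_dir, "..data"))
    if current != last_seen:
        reload()
    return current


def watch(mount_dir, interval=5.0):
    """Poll forever; intended as the sidecar container's entrypoint."""
    last_seen = os.readlink(os.path.join(mount_dir, "..data"))
    while True:
        time.sleep(interval)
        try:
            last_seen = poll_once(mount_dir, last_seen)
        except OSError as error:  # covers urllib's URLError and readlink failures
            print(f"reload attempt failed, will retry: {error}")
```

Going over HTTP avoids depending on whether the containers share a process namespace, at the cost of having to keep the lifecycle endpoint from being reachable externally.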
However, with automated reloads like this we should ensure we have an alert in case of config reload failures. We do not have this yet, do we?
We should be able to add an alert for this, yes; I'll include it when I PR this feature in. prometheus_config_last_reload_successful should handle it.
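Alongside an alert, a deployment step could check the same metric directly after triggering a reload. The sketch below parses the text returned by the Prometheus `/metrics` endpoint and reports whether `prometheus_config_last_reload_successful` is 1; the function name is hypothetical, and it assumes the metric appears without labels.

```python
def reload_succeeded(metrics_text):
    """Return True if prometheus_config_last_reload_successful is 1.

    `metrics_text` is the plain-text exposition output of /metrics.
    Raises ValueError if the metric is not present at all.
    """
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "prometheus_config_last_reload_successful":
            return float(parts[1]) == 1.0
    raise ValueError("prometheus_config_last_reload_successful not found")
```

A CI job could fail the deployment when this returns `False`, so a broken rules file is noticed at rollout time rather than only through the alert.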
Right now, changes to our Prometheus alerts need to be deployed manually.
We should incorporate a deployment for this into GitHub Actions on the main branch such that any changes are automatically rolled out without requiring knowledge of the local setup.