open-fresh / bomb-squad

Automatic detection and silencing of high cardinality series in Prometheus.
Apache License 2.0
139 stars 12 forks source link

Make Bomb Squad handle Prometheus config reloads itself #18

Open cboggs opened 6 years ago

cboggs commented 6 years ago

Bomb Squad's initial implementation just uses https://github.com/jimmidyson/configmap-reload sidecar to detect config updates on disk and prompt a Prometheus reload automatically. This is great for Kubernetes users, but no help for anyone else. Given the more focused needs of Bomb Squad, similar logic can be pulled in to the binary and thus work on many more platforms / config schemes.

Suggestion is just to mimic the behavior of the configmap-reload project, pulling in fsnotify and handling things appropriately.

Bonus points for using SIGHUP instead of the mgmt web API, too! (yay security)

(Adding to beta milestone as it removes a dependency, even for K8s-based deployments. More general non-K8s support is likely to be a post-beta set of features).

pcarranza commented 6 years ago

SIGHUP is not hard.

something like https://golang.org/pkg/os/#FindProcess + https://golang.org/pkg/os/#Process.Signal

I think that bomb-squad should only notify when it changed a file, thus fsnotify is not really needed.

cboggs commented 6 years ago

I've tried that approach, but the trouble comes with the fact that configMap updates from the pod's perspective are eventually consistent, so I can't really know when the updates that bomb squad has made have hit the file unless I build a bunch of "check for this rule I expect, and when it shows up, reload". fsnotify would just be easier, I think.

But otherwise, yes, SIGHUP is easy, where it's usable. I believe K8s still doesn't use shared PID namespaces in pods, so it can't work there yet.

pcarranza commented 6 years ago

Oh, right.... in k8s the configmap is modified by other stuff.

We could do it lazy then, when it must introduce a change it pulls the configmap (or the file if it's outside of k8s), applies the changes and then writes the changes down.

In a k8s where the configmap is handled by the platform, I would rather keep the abstraction at that level, and lazy should be good enough.

pcarranza commented 6 years ago

Aaarg... scratch that, we need to watch the file for changes because a push from an external client can wipe the silence rules because we have 2 things updating the same file, and no locking or feedback mechanism.

You're right, we need to use fsnotify