jacksontj / promxy

An aggregating proxy to enable HA prometheus
MIT License

A little more elaboration around alerts? #607

Closed petemounce closed 10 months ago

petemounce commented 11 months ago

The readme covers alerting rules, but I have a few questions remaining.

Context: I'm transitioning from one monolithic observability instance (no fancy kubernetes clustering things) that runs prometheus, alertmanager, grafana, promlens to a setup that runs prometheis separately from the (alertmanager/grafana/promlens/promxy) instances.

I'm doing this for all the reasons eloquently expressed in MOTIVATION.md.

(I really liked that doc; I feel the shared suffering of a similar journey and I appreciate the problem-focused story being told)

So, that works great. Thanks very much! Now, my next thing is alerts.

How do I use alerting/recording rules in promxy? Promxy is simply an aggregating proxy in front of your prometheus infrastructure. As such, you can use promxy to create alerting/recording rules which will execute across your entire prometheus infrastructure. For example, if you wanted to know that the global error rate was <10% this would be impossible on the individual prometheus hosts (without federation, or re-scraping) but trivial in promxy. Note: recording rules in regular prometheus write to their local tsdb. Promxy has no local tsdb, so if you wish to use recording rules (or see the metrics from alerting rules) a remote_write endpoint must be defined in the promxy config (which is where it will send those metrics).
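Here is a minimal Python sketch of the readme's global error-rate example. It assumes three hypothetical Prometheus shards that each scrape a disjoint set of services and expose error and request totals as plain numbers. The shard names and figures are illustrative and do not come from promxy. Each shard can only compute its own ratio. The global ratio needs sums taken across all shards, which is what a rule evaluated through promxy sees.

```python
# Hypothetical per-shard totals over some window: (errors, requests).
# Each Prometheus shard only holds the series it scraped itself.
SHARDS = {
    "prom-us-east": (30, 1000),
    "prom-us-west": (5, 500),
    "prom-eu": (150, 500),
}


def local_error_rates(shards):
    """What an alerting rule sees when evaluated on each shard separately."""
    return {name: errors / requests for name, (errors, requests) in shards.items()}


def global_error_rate(shards):
    """What a rule evaluated through promxy sees: sum(errors) / sum(requests)."""
    total_errors = sum(errors for errors, _ in shards.values())
    total_requests = sum(requests for _, requests in shards.values())
    return total_errors / total_requests


if __name__ == "__main__":
    for name, rate in local_error_rates(SHARDS).items():
        print(f"{name}: {rate:.1%}")
    print(f"global: {global_error_rate(SHARDS):.2%}")
```

In this made-up data, one shard alone sits at 30%, and a naive average of the per-shard ratios is about 11.3%, so both would cross a 10% threshold. The correctly weighted global ratio (185 / 2000 = 9.25%) does not. That global ratio can only be computed where all shards' data is visible at once.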

I get from this prose that when I drop alerting rules into the promxy, those will be evaluated across global data.

  1. Do I still need to also host alerting rules into individual prometheis' filesystems? ... and have those send their triggers to alertmanagers?
  2. I'm setting up grafana datasources for alertmanager, so that I can have the really-quite-nice-imo grafana unified alerting UX.
    • since grafana has no service-discovery for datasources, am I now limited inside grafana to just the alerting rules that promxy has? ... or is there some clever way I can have the local-to-prometheis alerting rules too? (I'm not sure yet whether I want to - all my alerts are currently global, so it's a moot point - but I'm the curious sort. It also occurred to me that if the alerts are promxy-only then that's another part of the HA mix I need to consider over and above the HA-pairs-sharded-by-AZ prometheis)
jacksontj commented 11 months ago

Do I still need to also host alerting rules into individual prometheis' filesystems?

You can, but you don't have to. As you surmised, promxy will run the query against the full dataset, so any alert you would put on prometheus directly you could put on promxy. If you are running a larger system, it may be beneficial (from a cost/performance perspective) to put some alerts on the prometheus nodes, but that can make the system more complicated.
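The toy Python model below shows why any alert you would put on prometheus directly you could also put on promxy. It assumes shards that hold disjoint series. The shard names and labels are hypothetical, and a simple union stands in for promxy's actual merge logic, which is more involved. A per-series condition such as `up == 0` fires for the same label sets in two cases: when it is evaluated on each shard and the results are combined, and when it is evaluated once over the merged view. What differs is where the work happens. The global evaluation has to pull every matching series through the proxy, which is the cost/performance trade-off mentioned above.

```python
# Hypothetical `up` samples per shard, keyed by a frozen label set.
SHARDS = {
    "prom-a": {
        frozenset({("job", "api"), ("instance", "a1")}): 1,
        frozenset({("job", "api"), ("instance", "a2")}): 0,
    },
    "prom-b": {
        frozenset({("job", "db"), ("instance", "b1")}): 0,
        frozenset({("job", "db"), ("instance", "b2")}): 1,
    },
}


def firing(series):
    """A per-series rule like `up == 0`: return the label sets that match."""
    return {labels for labels, value in series.items() if value == 0}


def merged(shards):
    """Simplified stand-in for promxy's view: the union of every shard's series."""
    out = {}
    for series in shards.values():
        out.update(series)
    return out


# Evaluate on each shard and combine, versus evaluate once on the merged view.
local = set().union(*(firing(series) for series in SHARDS.values()))
global_ = firing(merged(SHARDS))
```

Aggregating rules (like the global error rate) are where the two placements stop being equivalent. Per-series rules like this one can live on either side.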

... and have those send their triggers to alertmanagers?

If you keep some alerting rules on prometheus, still configure it to send to alertmanager; if not, there is no need.

am I now limited inside grafana to just the alerting rules that promxy has?

I haven't used the alertmanager datasource (it looks interesting), but since it talks to alertmanager (not prometheus), it doesn't matter where your alert rules are evaluated: they can all send to the same alertmanager cluster.
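The following is a toy Python model of that point. It assumes Alertmanager-like behaviour, where an alert is identified by its label set. The sender names, alert names, and labels are hypothetical. Alerts posted by a Prometheus node, its HA replica, and promxy all land in one store. A single view over that store, such as a Grafana alertmanager datasource, then lists all of them, whichever evaluator produced them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAlertmanager:
    """Hypothetical stand-in for one Alertmanager cluster: alerts keyed by label set."""

    alerts: dict = field(default_factory=dict)

    def post(self, sender, labels):
        key = frozenset(labels.items())
        # Posting an identical label set again reuses the same entry and only
        # records the extra sender, so duplicates collapse into one alert.
        self.alerts.setdefault(key, set()).add(sender)

    def view(self):
        """Alert names a single dashboard over this store would list."""
        return sorted(dict(key)["alertname"] for key in self.alerts)


am = ToyAlertmanager()
am.post("prometheus-az1", {"alertname": "InstanceDown", "instance": "a2"})
am.post("prometheus-az1-replica", {"alertname": "InstanceDown", "instance": "a2"})
am.post("promxy", {"alertname": "GlobalErrorRateHigh"})
print(am.view())
```

The datasource only needs to point at the alertmanager cluster, not at each rule evaluator. That is why the lack of datasource service discovery in grafana doesn't limit which alerts you can see.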

jacksontj commented 10 months ago

Since there hasn't been a response in a couple of weeks, I'm going to go ahead and close this issue out (assuming it's resolved). If you have any further questions, feel free to re-open or create a new issue!