jlevesy / prometheus-elector

Leader election for prometheus

prometheus-elector

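prometheus-elector runs a leader election between multiple Prometheus instances running in a Kubernetes cluster. It translates into three features, each covered in its own section below: election-aware configuration, an election-aware reverse proxy, and monitoring of the local Prometheus instance.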

The goal of prometheus-elector is to provide an easy-to-use, low-maintenance, active-passive setup for Prometheus on Kubernetes.

What prometheus-elector is not:

Use Case: Active Passive Prometheus Agent Setup

Prometheus (in agent mode) can be used to push metrics to a remote storage backend like Mimir. While your storage backend might be highly available, you probably want the same property on the agent side.

One approach to this problem is to have multiple agents push the same set of metrics to the storage backend. This requires running some form of metrics deduplication on the storage backend side to ensure correctness.

Using prometheus-elector, we can instead make sure that only one Prometheus instance has remote_write enabled at any point in time, and guarantee a reasonable delay (seconds) for another instance to take over when the leading instance becomes unavailable. This brings the following advantages:


You can find the necessary configuration for this use case in the example directory.

Running an Example of this Setup

You need ko, kubectl, k3d, docker and helm installed. You also need to make sure that prometheus-elector-registry.localhost resolves to 127.0.0.1 by adding an entry in your /etc/hosts.

You can then run make run_agent_example.

This command:

Use Case: Active Passive Prometheus

One issue with running multiple Prometheus instances in parallel is that their datasets slightly diverge, which makes load balancing requests across multiple instances difficult from a metrics consumer's perspective. You'll need a metrics-aware reverse proxy like promxy that aggregates the two sources to achieve this properly.

prometheus-elector takes a different approach and embeds a reverse proxy that forwards all received requests to the currently leading instance. While this solution doesn't provide load balancing, it returns consistent data at minimal cost, regardless of which replica initially receives the request.


You can find the necessary configuration for this use case in the example directory.

Running an Example of this Setup

You need ko, kubectl, k3d, docker and helm installed. You also need to make sure that prometheus-elector-registry.localhost resolves to 127.0.0.1 by adding an entry in your /etc/hosts.

Then you can run make run_proxy_example.

This command:

From there you can port forward to one of the Prometheus pods (kubectl port-forward service/prometheus-elector-dev-leader 9095:80) and start hitting the API on local port 9095.

How it Works

prometheus-elector is implemented as a sidecar container that rewrites the Prometheus configuration and injects remote_write rules into it when the replica is elected leader. The setup is very similar to the usual configmap-reloader sidecar in a Kubernetes deployment.

The prometheus-elector container then runs a Kubernetes leader election and an API server.

Election Aware Configuration

prometheus-elector accepts a configuration composed of two major sections, follower and leader.

Both sections follow the same model as the Prometheus configuration.

When a replica is elected leader, prometheus-elector generates a new configuration file that carries the follower configuration merged with the override values provided under the leader section, then tells Prometheus to reload its configuration using its lifecycle management API. If the replica is a follower, only the follower section is written out, without the leader overrides.
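The reload step can be sketched as a retried HTTP call. This is a minimal illustration, not the project's implementation: the defaults mirror the -notify-* flags listed in the configuration reference, the function names are hypothetical, and the URL in the usage note assumes Prometheus listens on localhost:9090 with its lifecycle API enabled (--web.enable-lifecycle), where a POST to /-/reload triggers a configuration reload.

```python
import time
import urllib.error
import urllib.request


def _http_send(url, method, timeout):
    """Send one bodiless HTTP request and return the response status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth inspecting.
        return err.code


def notify_reload(url, method="POST", max_attempts=5, retry_delay=10.0,
                  timeout=2.0, send=_http_send, sleep=time.sleep):
    """Ask Prometheus to reload its configuration, retrying on failure.

    Returns the number of attempts used, or raises RuntimeError when
    every attempt failed. `send` and `sleep` are injectable for testing.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            status = send(url, method, timeout)
            if 200 <= status < 300:
                return attempt
            last_error = f"unexpected status {status}"
        except OSError as err:  # URLError and socket timeouts derive from OSError
            last_error = str(err)
        if attempt < max_attempts:
            sleep(retry_delay)
    raise RuntimeError(
        f"reload failed after {max_attempts} attempts: {last_error}")
```

A call such as notify_reload("http://localhost:9090/-/reload") would then be issued right after the new configuration file has been written. Whether the real -notify-retry-max-attempts value counts attempts or retries is not stated here, so the sketch treats it as a total attempt count.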

Here's an example that enables a remote_write target only when leader.

# configuration applied when the instance is only follower.
follower:
  scrape_configs:
  - job_name: 'some job'
    scrape_interval: 5s
    static_configs:
    - targets: ['localhost:8080']

# overrides to the follower configuration applied when the instance is leader.
leader:
  remote_write:
    - url: http://remote.write.com

Election Aware Proxy

prometheus-elector can expose a reverse proxy that forwards all the received calls to the leading instance.

As it is implemented, it relies on a few assumptions:

Monitoring the Local Prometheus

prometheus-elector also continuously monitors its local Prometheus instance and adjusts its participation in the leader election to minimize downtime:
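The healthcheck flags in the configuration reference (-healthcheck-failure-threshold and -healthcheck-success-threshold, both defaulting to 3) suggest that health is decided on runs of consecutive probe results. The sketch below illustrates that idea only; the HealthTracker class, its initial healthy state, and the way counters reset are assumptions rather than the project's code.

```python
class HealthTracker:
    """Track health from consecutive probe results, with hysteresis."""

    def __init__(self, failure_threshold=3, success_threshold=3, healthy=True):
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.healthy = healthy  # assumption: start out healthy
        self._failures = 0
        self._successes = 0

    def observe(self, probe_ok):
        """Record one probe result and return the resulting health state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0  # a success breaks any failure streak
            if not self.healthy and self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0  # a failure breaks any success streak
            if self.healthy and self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy
```

Requiring several consecutive results in each direction keeps a single slow or failed probe from causing the replica to give up or regain leadership, at the cost of a detection delay of roughly the threshold multiplied by the probe period.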

Installing Prometheus Elector

You can find a Helm chart in this repository, as well as values for the HA agent example.

API Reference

If the leader proxy is enabled, all HTTP calls received on port 9095 are forwarded to the leader instance, on port 9090 by default.
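As a rough illustration of how a forwarding target could be derived, the sketch below builds an upstream URL from the current leader's pod name. It relies on assumptions not confirmed here: that the leader is identified by its pod name, that remote replicas are reached through the stable DNS names Kubernetes gives StatefulSet pods behind a headless service, and that the local instance is reached on localhost. The function name and the pod, service, and namespace names in the usage note are hypothetical.

```python
def proxy_target(local_pod, leader_pod, service_name, namespace,
                 local_port=9090, remote_port=9090,
                 cluster_domain="cluster.local"):
    """Return the base URL that requests should be forwarded to."""
    if leader_pod == local_pod:
        # This replica is the leader: serve from the local Prometheus.
        return f"http://localhost:{local_port}"
    # StatefulSet pods behind a headless service get stable DNS names:
    # <pod>.<service>.<namespace>.svc.<cluster domain>
    host = f"{leader_pod}.{service_name}.{namespace}.svc.{cluster_domain}"
    return f"http://{host}:{remote_port}"
```

For example, proxy_target("prometheus-1", "prometheus-0", "prometheus-headless", "monitoring") yields a URL pointing at prometheus-0 through the headless service, while passing the same pod name for both arguments yields the local address.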

prometheus-elector also exposes a few endpoints:

Configuration Reference

  -api-listen-address string
        HTTP listen address for the API. (default ":9095")
  -api-proxy-enabled
        Turn on leader proxy on the API
  -api-proxy-prometheus-local-port uint
        Listening port of the local prometheus instance (default 9090)
  -api-proxy-prometheus-remote-port uint
        Listening port of any remote prometheus instance (default 9090)
  -api-proxy-prometheus-service-name string
        Name of the statefulset headless service
  -api-shutdown-grace-delay duration
        Grace delay to apply when shutting down the API server (default 15s)
  -config string
        Path of the prometheus-elector configuration
  -healthcheck-failure-threshold int
        Number of consecutive failures to consider Prometheus unhealthy (default 3)
  -healthcheck-http-url string
        URL to the Prometheus health endpoint
  -healthcheck-period duration
        Healthcheck period (default 5s)
  -healthcheck-success-threshold int
        Number of consecutive successes to consider Prometheus healthy (default 3)
  -healthcheck-timeout duration
        HTTP timeout for healthchecks (default 2s)
  -init
        Only init the prometheus config file
  -kubeconfig string
        Path to a kubeconfig. Only required if out-of-cluster.
  -lease-duration duration
        Duration of a lease; clients wait the full duration of a lease before trying to take it over (default 15s)
  -lease-name string
        Name of lease resource
  -lease-namespace string
        Name of lease resource namespace
  -lease-renew-deadline duration
        Maximum duration spent trying to renew the lease (default 10s)
  -lease-retry-period duration
        Delay between two attempts of taking/renewing the lease (default 2s)
  -notify-http-method string
        HTTP method to use when sending the reload config request (default "POST")
  -notify-http-url string
        URL to the reload configuration endpoint
  -notify-retry-delay duration
        Delay between two notify retries. (default 10s)
  -notify-retry-max-attempts int
        How many retries for configuration update (default 5)
  -notify-timeout duration
        HTTP timeout for notify retries. (default 2s)
  -output string
        Path to write the Prometheus configuration
  -readiness-http-url string
        URL to the Prometheus ready endpoint
  -readiness-poll-period duration
        Poll period for the Prometheus readiness check (default 5s)
  -readiness-timeout duration
        HTTP timeout for readiness calls (default 2s)
  -runtime-metrics
        Export go runtime metrics
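The three -lease-* flags interact, so it can help to sanity-check a combination before deploying it. The sketch below assumes the election is built on the Kubernetes client-go leaderelection package, which the flag names and defaults resemble but which is not confirmed here. That package requires the lease duration to exceed the renew deadline, and the renew deadline to exceed the retry period multiplied by a jitter factor of 1.2. The takeover estimate is a rough upper bound, not a guarantee, and both function names are hypothetical.

```python
JITTER_FACTOR = 1.2  # jitter factor used by client-go's leaderelection package


def validate_lease_timings(lease_duration=15.0, renew_deadline=10.0,
                           retry_period=2.0):
    """Return a list of problems with the given timings (in seconds)."""
    errors = []
    if lease_duration <= renew_deadline:
        errors.append("lease duration must be greater than renew deadline")
    if renew_deadline <= retry_period * JITTER_FACTOR:
        errors.append("renew deadline must be greater than retry period * 1.2")
    return errors


def estimated_takeover_delay(lease_duration, retry_period):
    """Rough worst case, in seconds, before a follower acquires a stale lease.

    A follower waits for the lease to expire, then acquires it on its next
    (jittered) retry.
    """
    return lease_duration + retry_period * JITTER_FACTOR


if __name__ == "__main__":
    print(validate_lease_timings())
    print(estimated_takeover_delay(15.0, 2.0))
```

Shorter lease durations reduce the takeover delay but leave less room for a slow API server before the current leader loses its lease.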