prometheus-elector
brings the ability to run a leader election between multiple Prometheus instances running in a Kubernetes cluster. In practice, this enables the two features described below: a highly available Prometheus agent setup in which only the leader pushes metrics via remote_write, and a leader-aware proxy that always serves data from the leading instance.
The goal with prometheus-elector is to provide an easy-to-use, low-maintenance, active-passive setup for Prometheus on Kubernetes.
What prometheus-elector is not:
Prometheus (in agent mode) can be used to push metrics to a remote storage backend like Mimir. While your storage backend might be highly available, you probably want this property on the agent side as well.
One approach to this problem is to have multiple agents push the same set of metrics to the storage backend. This requires running some sort of metrics deduplication on the storage backend side to ensure correctness.
Using prometheus-elector, we can instead make sure that only one Prometheus instance has remote_write enabled at any point in time, and guarantee a reasonable delay (seconds) for another instance to take over when the leading instance becomes unavailable. This avoids deduplicating metrics on the storage backend side while keeping the agent tier highly available.
You can find the necessary configuration for this use case in the example directory.
You need ko, kubectl, k3d, docker and helm installed. You also need to make sure that prometheus-elector-registry.localhost resolves to 127.0.0.1 by adding an entry in your /etc/hosts.
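For example, the corresponding /etc/hosts entry could look like this:

# map the local registry hostname used by the examples to the loopback address
127.0.0.1 prometheus-elector-registry.localhost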
You can then run make run_agent_example.
This command:
- Installs a remote storage backend (in the storage namespace), configured to receive metrics using the remote_write API.
- Installs a Prometheus setup running prometheus-elector and prometheus in agent mode. Only one of them will push metrics at any point in time.

One issue with running multiple Prometheus instances in parallel is that their datasets slightly diverge, which makes load balancing requests across multiple instances difficult from a metrics consumer perspective. You'll need a metrics-aware reverse proxy like promxy that aggregates the two sources to achieve this properly.
prometheus-elector takes a different approach and embeds a reverse proxy that forwards all received requests to the currently leading instance. While this solution doesn't provide load balancing, it allows, at minimal cost, getting consistent data regardless of which replica initially receives the request.
You can find the necessary configuration for this use case in the example directory.
You need ko, kubectl, k3d, docker and helm installed. You also need to make sure that prometheus-elector-registry.localhost resolves to 127.0.0.1 by adding an entry in your /etc/hosts.
Then you can run make run_proxy_example.
This command:
- Installs a Prometheus statefulset running prometheus-elector and prometheus.

From there you can port forward to one of the Prometheus pods (k port-forward service/prometheus-elector-dev-leader 9095:80) and start hitting the API through port 9095.
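Once the port forward is running, you can, for example, query the leading Prometheus instance through the elector proxy and inspect the election state (the query below is illustrative):

# Query the leading Prometheus instance through the leader proxy.
curl 'http://localhost:9095/api/v1/query?query=up'

# Inspect the current state of the election reported by prometheus-elector.
curl http://localhost:9095/_elector/leader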
It is implemented using a sidecar container that rewrites the Prometheus configuration and injects the remote_write rules when elected leader. The setup is very similar to the usual configmap-reloader sidecar found in Kubernetes deployments.
The prometheus-elector container then runs a Kubernetes leader election and an API server.
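As a rough illustration, the wiring between the two containers could look like the sketch below; image names, paths and flag values are illustrative, refer to the Helm chart in this repository for the actual manifests.

# Sketch of the Prometheus pod: prometheus-elector runs as a sidecar next to Prometheus.
containers:
  - name: prometheus-elector
    image: prometheus-elector:latest                      # illustrative image reference
    args:
      - -lease-name=prometheus-elector                    # Lease object used for the election
      - -lease-namespace=monitoring                       # illustrative namespace
      - -config=/etc/elector/elector.yaml                 # elector config (follower/leader sections)
      - -output=/etc/prometheus/prometheus.yaml           # generated Prometheus configuration
      - -notify-http-url=http://127.0.0.1:9090/-/reload   # Prometheus lifecycle reload endpoint
      - -readiness-http-url=http://127.0.0.1:9090/-/ready
      - -healthcheck-http-url=http://127.0.0.1:9090/-/healthy
    volumeMounts:
      - { name: elector-config, mountPath: /etc/elector }
      - { name: prometheus-config, mountPath: /etc/prometheus }
  - name: prometheus
    image: prom/prometheus
    args:
      - --config.file=/etc/prometheus/prometheus.yaml
      - --web.enable-lifecycle                            # allow config reloads over HTTP
    volumeMounts:
      - { name: prometheus-config, mountPath: /etc/prometheus }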
prometheus-elector accepts a configuration composed of two major sections:
- The follower section indicates the Prometheus configuration to apply in follower mode. This configuration is always applied.
- The leader section indicates the changes to apply to the follower configuration when the instance is elected leader. Please note that those changes get "appended" to the follower configuration.

Both sections follow the same model as the Prometheus configuration.
When a replica is elected leader, prometheus-elector generates a new configuration file that carries the follower configuration merged with the override values provided under the leader section, then tells Prometheus to reload its configuration using its lifecycle management API. If the replica is a follower, only the follower section is generated, without the leader overrides.
Here's an example that enables a remote_write target only when the instance is leader.
# Configuration applied when the instance is only a follower.
follower:
  scrape_configs:
    - job_name: 'some job'
      scrape_interval: 5s
      static_configs:
        - targets: ['localhost:8080']

# Overrides to the follower configuration, applied when the instance is leader.
leader:
  remote_write:
    - url: http://remote.write.com
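With this configuration, the file generated for the elected leader would look roughly like the following (the follower configuration with the leader overrides appended); followers get the same file without the remote_write block:

# Generated Prometheus configuration on the leading replica (illustrative output).
scrape_configs:
  - job_name: 'some job'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8080']
remote_write:
  - url: http://remote.write.com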
prometheus-elector can expose a reverse proxy that forwards all received calls to the leading instance.
As it is implemented, it relies on a few assumptions:
- The member_id of the replica is the pod name.
- The <pod_name>.<service_name> domain name is resolvable via DNS. This is a property of statefulsets in Kubernetes, but it requires the cluster to have DNS support enabled.

prometheus-elector also continuously monitors its local Prometheus instance and optimizes its participation in the leader election to minimize downtime.
You can find a Helm chart in this repository, as well as values for the HA agent example.
If the leader proxy is enabled, all HTTP calls received on port 9095 are forwarded to the leader instance on port 9090 by default.
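The DNS requirement above is typically covered by the statefulset's headless Service, whose name is what -api-proxy-prometheus-service-name expects. A minimal sketch, with illustrative names and labels, could look like this:

# Headless Service giving each Prometheus pod a stable <pod_name>.<service_name> DNS name.
apiVersion: v1
kind: Service
metadata:
  name: prometheus-elector-headless   # illustrative, passed via -api-proxy-prometheus-service-name
spec:
  clusterIP: None                     # headless: DNS resolves directly to the pods
  selector:
    app: prometheus-elector           # illustrative label selector
  ports:
    - name: prometheus
      port: 9090                      # Prometheus itself, target of the proxied calls
    - name: elector
      port: 9095                      # prometheus-elector API and leader proxy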
prometheus-elector also exposes a few endpoints:
- /_elector/healthz: healthcheck endpoint.
- /_elector/leader: returns information about the state of the election.
- /_elector/metrics: Prometheus metrics endpoint.

The command line flags are the following:

-api-listen-address string
HTTP listen address for the API. (default ":9095")
-api-proxy-enabled
Turn on leader proxy on the API
-api-proxy-prometheus-local-port uint
Listening port of the local prometheus instance (default 9090)
-api-proxy-prometheus-remote-port uint
Listening port of any remote prometheus instance (default 9090)
-api-proxy-prometheus-service-name string
Name of the statefulset headless service
-api-shutdown-grace-delay duration
Grace delay to apply when shutting down the API server (default 15s)
-config string
Path of the prometheus-elector configuration
-healthcheck-failure-threshold int
Amount of consecutive failures to consider Prometheus unhealthy (default 3)
-healthcheck-http-url string
URL to the Prometheus health endpoint
-healthcheck-period duration
Healthcheck period (default 5s)
-healthcheck-success-threshold int
Amount of consecutive successes to consider Prometheus healthy (default 3)
-healthcheck-timeout duration
HTTP timeout for healthchecks (default 2s)
-init
Only init the prometheus config file
-kubeconfig string
Path to a kubeconfig. Only required if out-of-cluster.
-lease-duration duration
Duration of a lease; clients wait the full duration of a lease before trying to take it over (default 15s)
-lease-name string
Name of lease resource
-lease-namespace string
Name of lease resource namespace
-lease-renew-deadline duration
Maximum duration spent trying to renew the lease (default 10s)
-lease-retry-period duration
Delay between two attempts of taking/renewing the lease (default 2s)
-notify-http-method string
HTTP method to use when sending the reload config request (default "POST")
-notify-http-url string
URL to the reload configuration endpoint
-notify-retry-delay duration
Delay between two notify retries. (default 10s)
-notify-retry-max-attempts int
How many retries for configuration update (default 5)
-notify-timeout duration
HTTP timeout for notify retries. (default 2s)
-output string
Path to write the Prometheus configuration
-readiness-http-url string
URL to the Prometheus ready endpoint
-readiness-poll-period duration
Poll period for the Prometheus readiness check (default 5s)
-readiness-timeout duration
HTTP timeout for readiness calls (default 2s)
-runtime-metrics
Export go runtime metrics