prometheus-community / postgres_exporter

A PostgreSQL metric exporter for Prometheus
Apache License 2.0
2.72k stars 723 forks source link

respond with non-200 (path "/") when errors were returned from database. #615

Open mike2194 opened 2 years ago

mike2194 commented 2 years ago

Proposal

Respond to clients with non-200 at root path ("/", or "/healthz") when errors are returned from database due to login.

Background

Postgres_exporter doesn't really have a healthcheck path to be used in liveness,readiness checks; simply the fact that the webserver is running is good enough for readiness, but not liveness.

Use Case

For users incorporating Vault for database authentication, credentials may be revoked at any time. When this happens, or if credentials are renewed, the postgres_exporter currently has no way of indicating this to clients.

sysadmind commented 2 years ago

Typically prometheus exporters export a metric to determine if scrapes work or fail. In this case it is pg_up which will be 1 for success or 0 for failure.

From your description it sounds like you are running the postgres_exporter using credentials created in postgres by vault and these credentials may have a limited lifetime. I'm a bit confused about what you are trying to achieve with the healthcheck though. Are you trying to have kubernetes restart the pod when the login fails? It sounds like there may already some custom work to inject the credentials into the exporter pod from vault. If there is, I would suggest adding a health check through those modifications. Kubernetes health checks can execute scripts which can load the metrics and determine if the pg_up metric has the desired value.

In terms of implementing a health check in the exporter itself, I would be hesitant as there are many scenarios to account for and it could be hard for users to configure the health check to work in the way they would expect. For example, the exporter supports querying many postgres instances and there may be an issue with only one connection.

mike2194 commented 2 years ago

the issue is that DATA_SOURCE_PASS_FILE (or USER_FILE) content is not read on change, leaving the old credentials active and resulting in failed queries (permission denied). vault injects secrets by mounting a volume. i assume the connection to the database is left open, however, preventing postgres_exporter from realizing creds are (now) invalid. I believe this can be worked around, if postgres_exporter handled an interrupt signal.

please correct me if i'm wrong about any above assumptions related to how exporter (and by extension golang) handles credential info.

thus, i'm suggesting 2 possible solutions:

  1. postgres_exporter should handle connection details changes, either by restarting connection to postgres, or the pod killing itself
  2. healthcheck endpoint could be added to indicate error during external query processing.
smark88 commented 2 years ago

Having an optional liveness check would be ideal, could be a flag in the helm chart. As we are hitting this same issue.

drewwells commented 10 months ago

Can we get an update on this? Exporter only does one check for valid creds at the start of operation, then just logs errors without taking any preventative action like crashing or reporting status on a liveness port as described above.

Rotating credentials is an critical feature that should not be overlooked for edge cases like "connecting to many databases". I should be able to connect to one database and have an external way to be notified that the one connection is unavailable.

https://github.com/prometheus-community/postgres_exporter/blob/68c176b8833b7580bf847cecf60f8e0ad5923f9a/cmd/postgres_exporter/main.go#L89-L93

~On the topic, env variables are in a limited way to provide credentials. If we could read from a volume mount with hot reload, then rotating credentials is possible without crashing. I will take crashing the pod to remedy this issue though.~

These env variables direct access to files. This may be useful if they continuously re-read the contents.

DATA_SOURCE_URI_FILE
DATA_SOURCE_USER_FILE
DATA_SOURCE_PASS_FILE