Open mike2194 opened 2 years ago
Typically prometheus exporters export a metric to determine if scrapes work or fail. In this case it is pg_up
which will be 1 for success or 0 for failure.
From your description it sounds like you are running the postgres_exporter using credentials created in postgres by vault and these credentials may have a limited lifetime. I'm a bit confused about what you are trying to achieve with the healthcheck though. Are you trying to have kubernetes restart the pod when the login fails? It sounds like there may already some custom work to inject the credentials into the exporter pod from vault. If there is, I would suggest adding a health check through those modifications. Kubernetes health checks can execute scripts which can load the metrics and determine if the pg_up
metric has the desired value.
In terms of implementing a health check in the exporter itself, I would be hesitant as there are many scenarios to account for and it could be hard for users to configure the health check to work in the way they would expect. For example, the exporter supports querying many postgres instances and there may be an issue with only one connection.
the issue is that DATA_SOURCE_PASS_FILE (or USER_FILE) content is not read on change, leaving the old credentials active and resulting in failed queries (permission denied). vault injects secrets by mounting a volume. i assume the connection to the database is left open, however, preventing postgres_exporter from realizing creds are (now) invalid. I believe this can be worked around, if postgres_exporter handled an interrupt signal.
please correct me if i'm wrong about any above assumptions related to how exporter (and by extension golang) handles credential info.
thus, i'm suggesting 2 possible solutions:
Having an optional liveness check would be ideal, could be a flag in the helm chart. As we are hitting this same issue.
Can we get an update on this? Exporter only does one check for valid creds at the start of operation, then just logs errors without taking any preventative action like crashing or reporting status on a liveness port as described above.
Rotating credentials is an critical feature that should not be overlooked for edge cases like "connecting to many databases". I should be able to connect to one database and have an external way to be notified that the one connection is unavailable.
~On the topic, env variables are in a limited way to provide credentials. If we could read from a volume mount with hot reload, then rotating credentials is possible without crashing. I will take crashing the pod to remedy this issue though.~
These env variables direct access to files. This may be useful if they continuously re-read the contents.
DATA_SOURCE_URI_FILE
DATA_SOURCE_USER_FILE
DATA_SOURCE_PASS_FILE
Proposal
Respond to clients with non-200 at root path ("/", or "/healthz") when errors are returned from database due to login.
Background
Postgres_exporter doesn't really have a healthcheck path to be used in liveness,readiness checks; simply the fact that the webserver is running is good enough for readiness, but not liveness.
Use Case
For users incorporating Vault for database authentication, credentials may be revoked at any time. When this happens, or if credentials are renewed, the postgres_exporter currently has no way of indicating this to clients.