zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Expose operator metrics (prometheus) #1189

Open · mseiwald opened this issue 3 years ago

mseiwald commented 3 years ago

It would be great to have a Prometheus metrics endpoint in postgres-operator, so that monitoring can be added at the operator level (not for the databases themselves). One example is being alerted when the operator fails to sync a database for whatever reason. Currently I don't see a way to be notified about these events other than parsing the operator's logs.

FxKu commented 3 years ago

I think there's a tool called postgres-exporter (maybe this one) which some use together with the operator by defining a sidecar.

mseiwald commented 3 years ago

@FxKu From what I understand, postgres-exporter is for database (PostgreSQL) metrics. I was talking about operator metrics (e.g. failed reconciliations).

Jan-M commented 3 years ago

I think most of the relevant data is actually exposed on the target objects and in the manifest status. Combine this with watching for errors.

We will look into the log reporting and the log levels used, so that errors are really logged as errors. That is not easy to decide, given our periodic resyncs and transient problems.

We also do not currently use, or plan to use, Prometheus to monitor the operator, so that would need to come as a contribution.

Yannig commented 3 years ago

@mseiwald I have created a little PR with a Prometheus endpoint to check PG cluster sync status.

https://github.com/zalando/postgres-operator/pull/1529

owenthereal commented 1 year ago

@FxKu, could you give some insight into how we could get operator metrics (e.g. failed reconciliations)? My use case: I want metrics on whether clusters were provisioned successfully and what state they are in.