spreaker / prometheus-pgbouncer-exporter

Prometheus exporter for PgBouncer
MIT License

Anomalies in the report of the metrics #12

Closed mazzy89 closed 5 years ago

mazzy89 commented 5 years ago

I was running pgbouncer as pods in my k8s cluster, with 3 replicas, and the reported metrics were very strange.

[image: graph of total query time, before and after scaling]

For instance, the graph above shows the total query time before and after scaling. The values look odd with 3 replicas: the reported average is 16-17s, which is a huge number and definitely not what we see in reality. When scaled back to 1 replica, all the values start to make sense.

Maybe it is not an issue of the exporter. But I'm not sure where to start to dig. Any thoughts?

mazzy89 commented 5 years ago

I just read the YAML and one option says:

# Extra labels to add to all metrics exported for this pgbouncer
# instance. Required if you have configured multiple pgbouncers,
# in order to export an unique set of metrics.
extra_labels:
  pool_id: 1

~should I maybe configure this when pgbouncer replicas are > 1?~

This is valid when running multiple pgbouncer instances pointing to different database instances.

pracucci commented 5 years ago

Could you share the query graphed in Grafana, please?

mazzy89 commented 5 years ago

sort_desc( sum by (database) ( rate(pgbouncer_stats_queries_duration_microseconds{database=~"$db"}[1m]) ) )

pracucci commented 5 years ago

A couple of additional questions, please:

  1. What do you want to graph exactly?
  2. Can you also graph the number of running pgbouncer instances with count(pgbouncer_up) and re-share the graph, to see the correlation between the number of pgbouncer instances and how the metric varies, please?
mazzy89 commented 5 years ago
  1. I'm using this dashboard https://grafana.com/dashboards/9760
mazzy89 commented 5 years ago
  2. The fact is that even though I'm running three instances, I see just one in the graph

[image: graph showing a single pgbouncer exporter instance]

The graph shows the instances of the pgbouncer exporter.

mazzy89 commented 5 years ago

[image: the metric before and after scaling from 1 to 3 pods]

This shows how the metric changed when scaling from 1 to 3 pods.

pracucci commented 5 years ago

I think I've got it. I believe your Prometheus scrape job is not configured correctly. My guess is that Prometheus is scraping metrics from the 3 pgbouncer exporter instances without applying (in the relabelling) any differentiating label (e.g. the instance IP), so you don't get a distinct time series for each pgbouncer instance.
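For reference, here is a minimal sketch of what such a scrape job could look like with Kubernetes pod service discovery; the job name and the `app: pgbouncer` pod label used in the `keep` rule are assumptions, not taken from your setup:

```yaml
scrape_configs:
  - job_name: pgbouncer
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only the pgbouncer exporter pods (hypothetical "app" label).
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: pgbouncer
        action: keep
      # Copy the pod name onto every scraped series, so each pgbouncer
      # instance ends up with its own distinct time series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```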

mazzy89 commented 5 years ago

A small note here: I don't run three pgbouncer exporter instances. I run 1 pgbouncer exporter instance and three separate instances of pgbouncer.

These are the labels on pgbouncer_up:

pgbouncer_up{endpoint="metrics",instance="100.96.4.126:9100",job="prometheus-pgbouncer-exporter",namespace="monitoring",pod="prometheus-pgbouncer-exporter-df6fcd4b7-ks88j",service="prometheus-pgbouncer-exporter"}
pracucci commented 5 years ago

Not a small note :) Then make sure to add a distinctive label to each pgbouncer. If in doubt, please share your current exporter config.

mazzy89 commented 5 years ago

Thanks a lot, Marco, for the support. I'll play with that and tune the config of the Prometheus scraper. That must be the reason for sure. I'll keep you updated.

pracucci commented 5 years ago

Let me just share one last hint. What you need is a distinct set of metrics (a unique label set) for each pgbouncer instance.

Usually you would run 1 exporter for each pgbouncer instance (suggested), and the Prometheus service discovery + your job relabelling will apply unique labels (e.g. the instance IP).

If you really want to run 1 exporter for all pgbouncer instances (not suggested), then you need 1 entry in the exporter config for every single pgbouncer instance, and you need to add a distinctive label to each of them (via extra_labels); see the sketch below.

The reason why the exporter supports multiple instances in the config is to cover the case where you have multiple pgbouncer instances running on the same host. If you run multiple pgbouncer instances on different hosts, then I suggest you run the exporter on each host, and each exporter will export metrics from the pgbouncer running on that host.
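As an illustration, here is a sketch of a multi-pgbouncer exporter config along those lines; the DSNs, hostnames, and pool_id values are placeholders, and the field names are based on the exporter's example config:

```yaml
exporter_host: 0.0.0.0
exporter_port: 9127

pgbouncers:
  # One entry per pgbouncer instance, each with its own distinctive label.
  - dsn: postgresql://pgbouncer:password@pgbouncer-1:6432/pgbouncer
    extra_labels:
      pool_id: 1
  - dsn: postgresql://pgbouncer:password@pgbouncer-2:6432/pgbouncer
    extra_labels:
      pool_id: 2
```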

mazzy89 commented 5 years ago

Thanks for your hint; this clarifies the scenario a bit. My initial assumption (I haven't looked at the source code, though) was that an exporter could retrieve metrics from pgbouncer regardless of the number of instances. Let me explain myself: in my setup, pgbouncer runs in k8s behind a Kubernetes Service object, which proxies traffic to all instances (pods) of pgbouncer. So the exporter connects to pgbouncer through an endpoint like sql-database.namespace.svc.cluster.local, which load-balances across all pgbouncer instances.

mazzy89 commented 5 years ago

The quick thing which would, imho, elegantly resolve this issue is to run the exporter as a sidecar container of each pgbouncer instance and pass the container name as an extra_label.

pracucci commented 5 years ago

You should definitely run it as a sidecar container, but there's no need to use extra_labels for your setup. Just make sure the K8S service discovery job has a relabelling rule to set the pod IP, or any other label that is unique to the pod, and you're done.
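A rough sketch of that sidecar layout, assuming the scrape job relabelling shown earlier; the images, ports, and names here are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pgbouncer
  template:
    metadata:
      labels:
        app: pgbouncer
    spec:
      containers:
        # The pgbouncer instance itself.
        - name: pgbouncer
          image: pgbouncer:latest                        # placeholder image
          ports:
            - containerPort: 6432
        # The exporter sidecar talks to the pgbouncer in the same pod over
        # localhost; Prometheus then scrapes each pod (and thus each
        # pgbouncer) separately, giving a unique label set per pod.
        - name: pgbouncer-exporter
          image: spreaker/prometheus-pgbouncer-exporter  # placeholder image
          ports:
            - containerPort: 9127
```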

mazzy89 commented 5 years ago

Solved. I've deployed each pgbouncer server as a pod along with the exporter as a sidecar container, so scaling and metrics collection are easier. Now the metrics are collected and reported correctly and they make sense. Thank you, Marco.