newrelic / nri-postgresql

New Relic Infrastructure Postgresql Integration
MIT License
25 stars 31 forks source link

Misleading attribute names in PgBouncer metrics #83

Open SpiegelMadis opened 2 years ago

SpiegelMadis commented 2 years ago

I can not imagine a backwards compatible fix for this, but based on pgBouncer documentation maxwait field in pool stats contains values in seconds, however in nri-postgresql the field name is called maxwaitInMilliseconds.

https://github.com/newrelic/nri-postgresql/blob/1abf72c74923bfc8b98a35590c939821e327176c/src/metrics/pgbouncer_definitions.go#L50

From pgBouncer documentation:

maxwait: How long the first (oldest) client in the queue has waited, in seconds. If this starts increasing, then the current pool of servers does not handle requests quickly enough. The reason may be either an overloaded server or just too small of a pool_size setting.

As maxwait_us attribute is not used at the moment by New Relic, there is no way of monitoring sub-second wait time values for pgBouncer, although there is definitely a value in monitoring the trends.

In addition, there are other attribute names that are incorrectly named, as everything else pgBouncer related pointing to milliseconds actually contains microseconds, for example:

pgbouncer.stats.avgQueryDurationInMilliseconds From pgBouncer documentation:

avg_query_time: Average query duration, in microseconds.

These issues however can be worked around at the moment by dividing the values by 1000. If there is no good way of fixing this, at least it should be pointed out in documentation.

PS: While at it, it would be useful if avg_wait_time stats would be sent as a metric, it is useful for monitoring in order to detect problems with connection pooling issues.

gsanchezgavier commented 2 years ago

Hi @SpiegelMadis thanks for bringing this up. I labeled as bug and we will work on it as soon it gets prioritized. @mangulonr we could add the note to docs until the change the metric name is corrected WDYT?