prometheus-community / postgres_exporter

A PostgreSQL metric exporter for Prometheus
Apache License 2.0
2.81k stars 740 forks source link

pg_replication_lag is invalid for postgres newer than 9.3 #385

Open akamensky opened 4 years ago

akamensky commented 4 years ago

Releases after postgresql 9.3 had a change in logic where it would have no WAL updates while master host is idle, which would result in linear lag values increase when master is idle and use old query: SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) as lag

Appropriate query should be checking if we are on the latest know WAL position and if not then use this query, thus correct query is something like below: SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN make_interval(0,0,0,0,0,0,0.0) ELSE (now() - pg_last_xact_replay_timestamp()) END AS replication_lag;

foozmeat commented 4 years ago

When I tried to use the recommended query I got a parsing error on the 00:00:00 result. This query worked better for me:

SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (epoch from (now() - pg_last_xact_replay_timestamp())) END AS lag;

akamensky commented 4 years ago

FWIW, not sure about earlier versions but 10/11/12 you can get replication lag from primary node without any math in it. It is directly available there in microseconds value.

geekq commented 3 years ago

@akamensky You mean select * from pg_stat_replication; on the primary instance?

I have postgresql 12.5 here and implemented it as

pg_replication:
  query: "select extract(epoch from max(coalesce(replay_lag, make_interval(0,0,0,0,0,0,0.0)))) as lag_seconds from pg_stat_replication"
  master: true
  metrics:
    - lag_seconds:
        usage: "GAUGE"
        description: "Replication lag behind primary in seconds, as detected after a roundtrip on the primary"

Explanation

From postgres documentation https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-REPLICATION-VIEW

The pg_stat_replication view will contain one row per WAL sender process, showing statistics about replication to that sender's connected standby server. Only directly connected standbys are listed; no information is available about downstream standby servers.

The lag times reported in the pg_stat_replication view are measurements of the time taken for recent WAL to be written, flushed and replayed and for the sender to know about it. These times represent the commit delay that was (or would have been) introduced by each synchronous commit level, if the remote server was configured as a synchronous standby. For an asynchronous standby, the replay_lag column approximates the delay before recent transactions became visible to queries.

If the standby server has entirely caught up with the sending server and there is no more WAL activity, the most recently measured lag times will continue to be displayed for a short time and then show NULL.

  1. we check for replay_lag - interval of time for recent WAL to be written, flushed and replayed and for the sender to know about it
  2. if it is NULL, the sender has entirely caught up, use coalesce to set value to 0.0 seconds
  3. if there are multiple standbys, aggregate with max to get a single value
  4. if there are no connected standby servers, it returns NULL, interpreted by prometheus as NaN