lesovsky / zabbix-extensions

Zabbix additional monitoring modules
BSD 3-Clause "New" or "Revised" License
316 stars 230 forks source link

postgresql template pgsql.streaming.lag.seconds #60

Open oleg9301 opened 5 years ago

oleg9301 commented 5 years ago

After postgresql v10 we don`t need to connect to standby server for calculating pgsql.streaming.lag.seconds

in postgresql.conf UserParameter=pgsql.streaming.lag.seconds[*],psql -qAtX $1 -c "select coalesce(extract(epoch from replay_lag), 0) from pg_stat_replication where client_addr = '$2'"

in zabbix template <key>pgsql.streaming.lag.seconds[{$PG_CONNINFO},{#HOTSTANDBY}]</key>

What`s profitable: if you use any HA software for postgresql(patroni), you can't say which server will be standby, and you need a template, where no difference for monitoring master or slave.

Next problem is pgsql.wal.write on slave postgresql(ERROR: recovery is in progress for function pg_current_wal_lsn), I rewrite it to next: UserParameter=pgsql.wal.write[*],if [ "$(psql -qAtX $1 -c 'select pg_is_in_recovery()')" = "f" ]; then psql -qAtX $1 -c "select pg_wal_lsn_diff(pg_current_wal_lsn(),'0/00000000')"; else echo "0"; fi

stephankn commented 5 years ago

I have no replication setup, so I can not try this. Simplifying the setup is in general a good idea, but we have to be compatibe with older versions of PostgreSQL as well. So this line probably needs to be executed conditional. See other lines on how to match against postgres version.

I am not 100% sure what exactly your second problem is. But with the two calls not being atomic, you certainly have a race condition here, as this state can change before you execute the second query. I suggest to put it in a single query, maybe using a CASE. What is the expected result in case pg_is_in_recovery() returns true? Should it be 0?

oleg9301 commented 5 years ago

The second problem is next replica error(after change state master -> slave):

psql -qAtX $1 -c "select pg_wal_lsn_diff(pg_current_wal_lsn(),'0/00000000')"
ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.
EamonZhang commented 4 years ago
select case when pg_is_in_recovery() then 0 else pg_wal_lsn_diff(pg_current_wal_lsn(),'0/00000000') end;
stephankn commented 4 years ago

@oleg9301 the query suggested by @EamonZhang looks fine for me. Can you please confirm that it fixes your issue?