lesovsky / pgscv

pgSCV is a multi-purpose monitoring agent and metrics exporter
BSD 3-Clause "New" or "Revised" License

Auto restart (reload) exporter service in case of fail #50

Closed. glushakov closed this issue 1 year ago

glushakov commented 1 year ago

Hi. If Postgres becomes unavailable and then returns to normal, pgscv does not try to reconnect.

[05:23:29 postgres@server1:]:~$ sudo systemctl status pgscv
● pgscv.service - pgSCV is the Weaponry platform agent for PostgreSQL ecosystem
   Loaded: loaded (/etc/systemd/system/pgscv.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-03-07 15:28:22 MSK; 1h 55min ago
 Main PID: 1222929 (pgscv)
    Tasks: 8 (limit: 23652)
   Memory: 22.0M
   CGroup: /system.slice/pgscv.service
           └─1222929 /usr/bin/pgscv --config-file=/etc/pgscv/pgscv.yaml

Mar 07 15:28:22 server1 systemd[1]: Started pgSCV is the Weaponry platform agent for PostgreSQL ecosystem.
Mar 07 15:28:22 server1  pgscv[1222929]: {"level":"info","service":"pgscv","time":"2023-03-07T15:28:22+03:00","message":"read configuration from /etc/pgscv/pgscv.yaml"}
Mar 07 15:28:22 server1  pgscv[1222929]: {"level":"info","service":"pgscv","time":"2023-03-07T15:28:22+03:00","message":"no-track disabled, for details check the documentation about 'no_track_mode' option."}
Mar 07 15:28:22 server1  pgscv[1222929]: {"level":"info","service":"pgscv","time":"2023-03-07T15:28:22+03:00","message":"registered new service [system:0]"}
Mar 07 15:28:22 server1  pgscv[1222929]: {"level":"warn","service":"pgscv","time":"2023-03-07T15:28:22+03:00","message":"postgres://pgscv:password@server1/postgres?target_session_attrs=read-write&connect_timeout=2: failed to connect to `server...
Mar 07 15:28:22 server1  pgscv[1222929]: {"level":"info","service":"pgscv","time":"2023-03-07T15:28:22+03:00","message":"listen on http://0.0.0.0:9900"}

After restarting the service everything worked.

Config:

listen_address: 0.0.0.0:9900
services:
  "server1":
    service_type: "postgres"
    conninfo: "postgres://pgscv:password@server1/postgres?target_session_attrs=read-write&connect_timeout=2"

glushakov commented 1 year ago

steps to reproduce:

  1. stop postgresql
  2. stop pgscv
  3. start pgscv
  4. start postgresql

I found this issue after an OS reboot: pgscv started earlier than Postgres.

lesovsky commented 1 year ago

Unfortunately, this is a design flaw on my side, and it cannot be fixed easily.

At startup, pgscv reads "services" from the YAML config and performs the initial setup for each service. This includes a one-time read of some Postgres settings (e.g. block_size, wal_segment_size, data_directory, server_version, etc.), all of which are required later when collecting metrics. Hence, when pgscv cannot connect to Postgres at startup, it skips adding the service to its store and doesn't collect metrics later, even if Postgres is started afterwards.

As a workaround, I'd suggest always starting pgSCV after Postgres has been started. With systemd, you can use the After directive in the [Unit] section.
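
A minimal sketch of such an ordering, written as a systemd drop-in for pgscv.service. The drop-in file name and the PostgreSQL unit name (postgresql.service) are assumptions: unit names differ between distributions and PostgreSQL versions, so check the real one with `systemctl list-units 'postgresql*'` first.

```ini
# /etc/systemd/system/pgscv.service.d/after-postgres.conf
# (hypothetical drop-in file name; any *.conf in this directory is read)

[Unit]
# Assumption: the local PostgreSQL unit is named postgresql.service.
# After= only orders startup: pgscv is started once this unit has started.
After=postgresql.service
# Wants= additionally pulls the PostgreSQL unit in when pgscv is started.
Wants=postgresql.service
```

Run `sudo systemctl daemon-reload` after creating the file. Note that After= is only an ordering constraint. Whether Postgres already accepts connections when pgscv starts depends on how the PostgreSQL unit signals readiness to systemd. It also does not help if Postgres is down when pgscv is restarted later; in that case pgscv still has to be restarted once Postgres is back.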

glushakov commented 1 year ago

ok. thx!