gordonjb opened 3 years ago
The monitor crashed over the weekend.
I can run it manually as the postgres user with /usr/pgsql-12/bin/pg_autoctl run --pgdata ~/monitor, but starting the monitor service fails every time and leaves behind an orphaned semaphore that I had to clean up. I edited the service to run pg_autoctl with -vvv and got the following:
Sep 20 17:21:40 hostname-1 systemd[1]: Started pg_auto_failover monitor process.
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:166 SetConfigFilePath: "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.cfg"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:194 SetStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.state"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:209 SetKeeperStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.init"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:237 SetNodesFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/nodes.json"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:263 SetPidFilePath: "/tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pid"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:166 SetConfigFilePath: "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.cfg"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:194 SetStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.state"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:209 SetKeeperStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.init"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:237 SetNodesFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/nodes.json"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:263 SetPidFilePath: "/tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pid"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG config.c:287 Probing configuration file "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.cfg"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 pg_autoctl.role = monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG config.c:320 ProbeConfigurationFileRole: monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:166 SetConfigFilePath: "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.cfg"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:194 SetStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.state"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:209 SetKeeperStateFilePath: "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.init"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE config.c:263 SetPidFilePath: "/tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pid"
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG monitor_config.c:256 Reading configuration from /var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.cfg
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 pg_autoctl.role = monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 pg_autoctl.hostname = hostname-1.fqdn
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.pgdata = /var/lib/pgsql/monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.pg_ctl = /usr/pgsql-12/bin/pg_ctl
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.username = autoctl_node
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.dbname = pg_auto_failover
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.port = 5433
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.listen_addresses = *
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 postgresql.auth_method = trust
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 ssl.sslmode = prefer
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE ini_file.c:131 ssl.active = 0
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG pgsetup.c:122 pg_setup_init: /usr/pgsql-12/bin/pg_ctl version 12.4
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE pgsetup.c:454 Failed to open file "/var/lib/pgsql/monitor/postmaster.pid": No such file or directory
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE pgsetup.c:454 Failed to open file "/var/lib/pgsql/monitor/postmaster.pid": No such file or directory
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG pgctl.c:212 /usr/pgsql-12/bin/pg_controldata /var/lib/pgsql/monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 DEBUG pgsetup.c:392 Found PostgreSQL system 6882784943118605993 at "/var/lib/pgsql/monitor", version 1201, catalog version 201909212
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE primary_standby.c:161 local_postgres_set_status_path: /var/lib/pgsql/monitor
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE primary_standby.c:175 local_postgres_set_status_path: /tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pg
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE primary_standby.c:198 local_postgres_unlink_status_file: /tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pg
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE supervisor.c:452 supervisor_init
Sep 20 17:21:40 hostname-1 pg_autoctl[5302]: 17:21:40 5302 TRACE signals.c:37 set_signal_handlers
Sep 20 17:21:40 hostname-1 systemd[1]: pgautofailover-monitor.service: main process exited, code=dumped, status=11/SEGV
Sep 20 17:21:40 hostname-1 systemd[1]: Unit pgautofailover-monitor.service entered failed state.
Sep 20 17:21:40 hostname-1 systemd[1]: pgautofailover-monitor.service failed.
EDIT: The monitor ran fine when started manually. After comparing the logs of the manual run and the service run, I removed the pid file /tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pid, and the service then started as expected. My original issue still stands, however.
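Before deleting a pid file like this by hand, it can help to confirm that the process it names is really gone. The sketch below is a minimal illustration, not part of pg_autoctl: the helper name pid_file_is_stale is hypothetical, it assumes (unverified) that the first line of pg_autoctl.pid holds the supervisor PID, and it relies on Linux's /proc to test whether that process still exists.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the PID on the first line of
# the given pid file no longer belongs to a running process.
# Assumption: the first line of pg_autoctl.pid is the pg_autoctl PID.
pid_file_is_stale() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1          # no file: nothing to clean up
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # not a numeric PID: leave it alone
    esac
    # /proc/<pid> exists for every live process on Linux regardless of owner,
    # unlike `kill -0`, which fails with EPERM for other users' processes.
    [ ! -d "/proc/$pid" ]
}

# Path taken from the SetPidFilePath line in the log above.
PIDFILE=/tmp/pg_autoctl/var/lib/pgsql/monitor/pg_autoctl.pid
if pid_file_is_stale "$PIDFILE"; then
    echo "removing stale $PIDFILE"
    rm -f "$PIDFILE"
fi
```

This is meant to be run with the service stopped; it deliberately does nothing when the recorded PID is still alive. It does not touch the orphaned semaphore, which still has to be cleaned up separately.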
Hi, I mainly want to a) check that I've followed the upgrade procedure correctly and b) figure out how to get the cluster up again.
I followed the upgrade instructions in the docs, and now my primary node will not start. After restarting the monitor, I checked the status as suggested:
The steps I ran were:
1. yum remove pg-auto-failover14_12.x86_64 on hostname-1 and hostname-2
2. yum install pg-auto-failover16_12-1.6.2-1.el7.x86_64.rpm on hostname-1 and hostname-2
3. systemctl restart pgautofailover-monitor.service on hostname-1 (on this development database, the monitor is also hosted on hostname-1)

As I understand the instructions, this should be enough to upgrade pg_auto_failover. However, as the state above shows, node_1 seems to be stuck. Below are the logs from node_1, starting from when the upgrade steps began:
This repeats until (I assume) the 1.6 rpm had finished installing:
This also repeats a few times, until the monitor service is restarted:
Since then, node_1 has been stuck here. Restarting the pgautofailover service results in the same loop:
I have full logs from the monitor, node_1 and node_2 if required.
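One way to check whether the monitor side of the upgrade took effect is to compare the default and installed versions of the pgautofailover extension in the monitor's database, using PostgreSQL's pg_available_extensions view. This is a minimal sketch under stated assumptions: the helper names extension_needs_update and check_monitor_extension are hypothetical, the port (5433) and database (pg_auto_failover) come from the monitor's pg_autoctl.cfg in the log above, and it is meant to be run as the postgres user on hostname-1.

```shell
#!/bin/sh
# Hypothetical helper: takes one "default_version|installed_version" line,
# as printed by `psql -At` for the query below, and succeeds (exit 0) when
# the two differ, i.e. the packaged extension files are newer than the
# extension version actually installed in the database. An empty installed
# version (extension not created) also counts as a difference.
extension_needs_update() {
    line=$1
    default_version=${line%%|*}    # text before the first '|'
    installed_version=${line#*|}   # text after the first '|'
    [ "$default_version" != "$installed_version" ]
}

# Hypothetical wrapper: queries the monitor and reports the result.
check_monitor_extension() {
    versions=$(psql -p 5433 -d pg_auto_failover -At -c \
        "SELECT default_version, installed_version FROM pg_available_extensions WHERE name = 'pgautofailover'") || return 2
    # No row means the extension files are not visible to this server at all.
    [ -n "$versions" ] || { echo "pgautofailover extension not available"; return 2; }
    if extension_needs_update "$versions"; then
        echo "pgautofailover extension not updated: $versions"
    else
        echo "pgautofailover extension up to date: $versions"
    fi
}
```

Calling check_monitor_extension after the monitor restart would show whether the extension is still at the 1.4 version; it only reports, and does not attempt any update itself.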