Perform “pg_autoctl drop node --destroy” on the secondary server.
following test log, pg_autoctl drop node fails. And the FSM of the secondary server is promoted from secondary to single.
Is this an expected result? Or is it a bug?
### secondary server log
$ pg_autoctl drop node --destroy
17:57:43 66464 INFO Removing node with name "test2" in formation "test" from the monitor
-17:57:43 66464 INFO An instance of pg_autoctl is running with PID 11199, waiting for it to stop.
-17:58:13 66464 INFO Sending signal SIGQUIT to pg_autoctl process 11199
-17:58:13 66464 INFO An instance of pg_autoctl is running with PID 11199, waiting for it to stop.
17:58:14 66464 INFO The pg_autoctl instance with pid 11199 has now terminated.
17:58:14 66464 WARN PG_REGRESS_SOCK_DIR is set to "~/run/postgresql", and our setup is using "pgsql-002"
-17:58:14 66464 FATAL Failed to ensure that the local node with id 5229 in formation "test" and group 0 has been removed from the monitor
#### pg_autoctl log
+17:57:43 11204 INFO Monitor assigned new state "single"
+17:57:43 11204 INFO FSM transition from "secondary" to "single": Primary was forcibly removed
-17:57:44 11204 INFO Promoting postgres
17:57:44 11204 INFO Waiting for postgres to promote
17:57:45 11204 INFO Cleaning-up Postgres replication settings
17:57:45 11204 INFO Disabling synchronous replication
17:57:45 11204 INFO Reloading Postgres configuration and HBA rules
17:57:45 11204 INFO Transition complete: current state is now "single"
17:58:13 11199 INFO pg_autoctl received signal SIGQUIT, terminating
17:58:13 11203 INFO Postgres controller service received signal SIGQUIT, terminating
17:58:13 11203 INFO Stopping pg_autoctl postgres service
17:58:13 11203 INFO /bin/pg_ctl --pgdata /data/db/pgcluster --wait stop --mode fast
17:58:13 11199 INFO Waiting for subprocesses to terminate.
+17:58:14 11199 INFO Stop pg_autoctl
pg_autoctl show state
Name | Node | Host:Port | TLI: LSN | Connection | Reported State | Assigned State
----------+-------+------------------------+----------------+--------------+---------------------+--------------------
test1 | 5228 | pgsql-001:5432 | 1: 0/5002270 | read-write | single | single
test2 | 5229 | pgsql-002:5432 | 2: 0/6001D60 | read-write | single | single
### monitor server log
-- vi pg_autoctl.log
17:57:43 157088 INFO Setting goal state of node 5229 "test2" (pgsql-002:5432) from formation "test" and group 0 to "dropped" to implement node removal.
17:57:43 157088 INFO New state for node 5229 "test2" (pgsql-002:5432): secondary ➜ dropped
17:57:43 157088 INFO Setting number_sync_standbys to 0 for formation "test" now that we have 0 standby nodes set with replication-quorum.
17:57:43 157088 INFO Setting goal state of node 5228 "test1" (pgsql-001:5432) to single as there is no other node.
17:57:43 157088 INFO New state for node 5228 "test1" (pgsql-001:5432): primary ➜ single
17:57:43 157088 INFO Setting goal state of node 5229 "test2" (pgsql-002:5432) to single as there is no other node.
17:57:43 157088 INFO New state for node 5229 "test2" (pgsql-002:5432): secondary ➜ single
17:57:44 157088 INFO New state is reported by node 5228 "test1" (pgsql-001:5432): "single"
17:57:44 157088 INFO New state for node 5228 "test1" (pgsql-001:5432): single ➜ single
17:57:45 157088 INFO New state is reported by node 5229 "test2" (pgsql-002:5432): "single"
17:57:45 157088 INFO New state for node 5229 "test2" (pgsql-002:5432): single ➜ single
-- vi postgresql-2023-12-24_000000.log
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002(58480):pg_auto_failover-[unknown] LOG: Setting goal state of node 5229 "test2" (pgsql-002:5432) from formation "test" and group 0 to "dropped" to implement node removal.
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002(58480):pg_auto_failover-[unknown] STATEMENT: SELECT nodeid, groupid, pgautofailover.remove_node(nodeid::int, $3) FROM pgautofailover.node WHERE formationid = $1 and nodename = $2
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002(58480):pg_auto_failover-[unknown] LOG: Setting number_sync_standbys to 0 for formation "test" now that we have 0 standby nodes set with replication-quorum.
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002(58480):pg_auto_failover-[unknown] STATEMENT: SELECT nodeid, groupid, pgautofailover.remove_node(nodeid::int, $3) FROM pgautofailover.node WHERE formationid = $1 and nodename = $2
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002(58480):pg_auto_failover-[unknown] LOG: Setting goal state of node 5228 "test1" (pgsql-001:5432) to single as there is no other node.
[2023-12-24 17:57:43 KST]-54992 -autoctl_node@pgsql-002-(58480):pg_auto_failover-[unknown] STATEMENT: SELECT nodeid, groupid, pgautofailover.remove_node(nodeid::int, $3) FROM pgautofailover.node WHERE formationid = $1 and nodename = $2
[2023-12-24 17:57:43 KST]-54988 -autoctl_node@pgsql-002(58478):pg_auto_failover-[unknown] LOG: Setting goal state of node 5229 "test2" (pgsql-002:5432) to single as there is no other node.
[2023-12-24 17:57:43 KST]-54988 -autoctl_node@pgsql-002(58478):pg_auto_failover-[unknown] STATEMENT: SELECT * FROM pgautofailover.node_active($1, $2, $3, $4::pgautofailover.replication_state, $5, $6, $7, $8)
[2023-12-24 17:57:44 KST]-56408 -autoctl_node@pgsql-001(51602):pg_auto_failover-[unknown] LOG: New state is reported by node 5228 "test1" (pgsql-001:5432): "single"
[2023-12-24 17:57:44 KST]-56408 -autoctl_node@pgsql-001(51602):pg_auto_failover-[unknown] STATEMENT: SELECT * FROM pgautofailover.node_active($1, $2, $3, $4::pgautofailover.replication_state, $5, $6, $7, $8)
[2023-12-24 17:57:45 KST]-57217 -autoctl_node@pgsql-002(58490):pg_auto_failover-[unknown] LOG: New state is reported by node 5229 "test2" (pgsql-002:5432): "single"
[2023-12-24 17:57:45 KST]-57217 -autoctl_node@pgsql-002(58490):pg_auto_failover-[unknown] STATEMENT: SELECT * FROM pgautofailover.node_active($1, $2, $3, $4::pgautofailover.replication_state, $5, $6, $7, $8)
pg_auotctl drop node --destroy --force
Lastly, if I run pg_autoctl drop node --destroy --force, will this issue be resolved?
#### pg_autoctl drop node --destroy --force (secondary)
#### secondary DB log
[2023-12-26 12:11:45 KST]-72668 LOG: received fast shutdown request
[2023-12-26 12:11:45 KST]-72668 LOG: aborting any active transactions
[2023-12-26 12:11:45 KST]-72674 FATAL: terminating walreceiver process due to administrator command
[2023-12-26 12:11:45 KST]-72671 LOG: shutting down
[2023-12-26 12:11:45 KST]-72671 LOG: restartpoint starting: shutdown immediate
[2023-12-26 12:11:45 KST]-72671 LOG: restartpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.008 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16384 kB, estimate=16384 kB
[2023-12-26 12:11:45 KST]-72671 LOG: recovery restart point at 0/19000060
[2023-12-26 12:11:45 KST]-72668 LOG: database system is shut down
#### pg_autoctl.log(secondary)
+12:11:45 72656 INFO pg_autoctl received signal SIGQUIT, terminating
+12:11:45 72659 INFO Postgres controller service received signal SIGQUIT, terminating
+12:11:45 72659 INFO Stopping pg_autoctl postgres service
12:11:45 72659 INFO pg_ctl --pgdata $PGDATA --wait stop --mode fast
12:11:45 72656 INFO Stop pg_autoctl
#### monitor server log(secondary)
+12:11:45 35223 INFO Removing node 15 "dev-pgf200003" (dev-pgf200003:6432) from formation "sangmudb" and group 0
+12:11:46 35223 INFO Setting goal state of node 10 "dev-pgf200002" (dev-pgf200002:6432) to single as there is no other node.
12:11:46 35223 INFO New state for node 10 "dev-pgf200002" (dev-pgf200002:6432): primary ➜ single
12:11:46 35223 INFO New state is reported by node 10 "dev-pgf200002" (dev-pgf200002:6432): "single"
12:11:46 35223 INFO New state for node 10 "dev-pgf200002" (dev-pgf200002:6432): single ➜ single
version
test
pg_auotctl drop node --destroy --force