hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability

Not able to register standby node #911

Closed rpdba closed 2 years ago

rpdba commented 2 years ago

Hi, I have registered the primary node successfully, but registering the secondary node fails with the error below:

[postgres@pgdb002 ~]$ /opt/postgres/app/bin/pg_autoctl create postgres --pgdata /postgres/replica/data --auth trust --ssl-self-signed --username repadmin --dbname repdb --hostname pgdb002 --pgctl /opt/postgres/app/bin/pg_ctl --monitor 'postgres://autoctl_node@pgtl001:5432/pg_auto_failover?sslmode=require'
10:03:37 2214381 INFO Using default --ssl-mode "require"
10:03:37 2214381 INFO Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
10:03:37 2214381 WARN Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
10:03:37 2214381 WARN See https://www.postgresql.org/docs/current/libpq-ssl.html for details
10:03:37 2214381 INFO Started pg_autoctl postgres service with pid 2214383
10:03:37 2214383 INFO /opt/postgres/app/bin/pg_autoctl do service postgres --pgdata /postgres/replica/data -v
10:03:37 2214381 INFO Started pg_autoctl node-init service with pid 2214384
10:03:37 2214384 INFO Registering Postgres system 7103992692940479706 running on port 5433 with pid 2213942 found at "/postgres/replica/data"
10:03:37 2214384 INFO A postgres directory already exists at "/postgres/replica/data", registering as a single node
10:03:37 2214384 ERROR Monitor ERROR: node pgdb002:5433 can not be registered in state single, it should be in state wait_standby
10:03:37 2214384 ERROR SQL query: SELECT * FROM pgautofailover.register_node($1, $2, $3, $4, $5, $6, $7, $8, $9::pgautofailover.replication_state, $10, $11, $12, $13)
10:03:37 2214384 ERROR SQL params: 'default', 'pgdb002', '5433', 'repdb', '', '7103992692940479706', '-1', '-1', 'single', 'standalone', '50', 'true', 'default'
10:03:37 2214384 ERROR Failed to register node pgdb002:5433 in group -1 of formation "default" with initial state "single", see previous lines for details
10:03:37 2214384 ERROR Failed to register the existing local Postgres node "pgdb002:5433" running at "/postgres/replica/data"to the pg_auto_failover monitor at postgres://autoctl_node@pgtl001:5432/pg_auto_failover?sslmode=require, see above for details
10:03:37 2214384 INFO Successfully registered as "unknown" to the monitor.
10:03:37 2214384 FATAL pg_autoctl does not know how to reach state "unknown" from "init"
10:03:37 2214381 ERROR pg_autoctl service node-init exited with exit status 12
10:03:37 2214381 INFO Restarting service node-init

Output of "pg_autoctl show state":

[postgres@pgdb002 ~]$ pg_autoctl show state
  Name |  Node |          Host:Port |        TLI: LSN | Connection | Reported State | Assigned State
-------+-------+--------------------+-----------------+------------+----------------+---------------
node_1 |     1 | pgdb001.hh.se:5433 | 1: 108/6D5E3C48 | read-write |         single |         single

pg_autoctl version:

pg_autoctl version 1.6.4
pg_autoctl extension version 1.6
compiled with PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
compatible with Postgres 10, 11, 12, 13, and 14

DimCitus commented 2 years ago

Hi @rpdba ; the main problem seems to be:

10:03:37 2214384 INFO A postgres directory already exists at "/postgres/replica/data", registering as a single node

This seems to be related to the pg_autoctl code (wrongly?) determining that the local Postgres instance is running and NOT in recovery, so should be considered another primary node. Can you run the command again with DEBUG level output, using -vv (or --verbose --verbose)?
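One way to see what role the local instance reports is to ask Postgres directly whether it is in recovery. The following is only a minimal sketch: it assumes psql is on the PATH and reuses port 5433 and database repdb from the log above, and interpret_recovery is a hypothetical helper name, not part of pg_autoctl.

```shell
#!/bin/sh
# Hypothetical helper: map the output of pg_is_in_recovery() to the role it implies.
# With psql -At, a boolean result is printed as 't' or 'f'.
interpret_recovery() {
    case "$1" in
        t) echo "standby (in recovery)" ;;
        f) echo "primary (not in recovery)" ;;
        *) echo "unknown" ;;
    esac
}

# Usage against the node from the report (port and dbname are assumptions
# taken from the log; adjust them for your setup):
#   interpret_recovery "$(psql -p 5433 -d repdb -Atc 'SELECT pg_is_in_recovery()')"
```

If the instance answers "primary (not in recovery)", that would be consistent with pg_autoctl trying to register it in state single.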

rpdba commented 2 years ago

Hi @DimCitus: Before trying with -vv, I stopped the Postgres service on the standby and ran the same command to add the standby to pg_autoctl, and it worked. This isn't mentioned in the documentation. I guess that even when adding an existing standby data directory, the Postgres service on the standby should be stopped before running pg_autoctl create postgres ...

Thanks for your quick support :)
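The workaround described above can be sketched as two steps. This is an illustration under assumptions, not a documented procedure: the thread does not say how the standby was stopped, so pg_ctl stop with fast mode is assumed here, and standby_registration_steps is a hypothetical function name. The script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Paths taken from the original report.
PGBIN=/opt/postgres/app/bin
PGDATA=/postgres/replica/data

# Hypothetical helper: print the two-step workaround instead of executing it.
standby_registration_steps() {
    # 1. Stop the running standby (assumed method: pg_ctl stop in fast mode),
    #    so pg_autoctl does not find a running instance in the data directory.
    echo "$PGBIN/pg_ctl stop -D $PGDATA -m fast"
    # 2. Re-run the same registration command from the report.
    echo "$PGBIN/pg_autoctl create postgres --pgdata $PGDATA --auth trust --ssl-self-signed --username repadmin --dbname repdb --hostname pgdb002 --pgctl $PGBIN/pg_ctl --monitor 'postgres://autoctl_node@pgtl001:5432/pg_auto_failover?sslmode=require'"
}

standby_registration_steps
```

Printing rather than executing keeps the sketch safe to run on any machine; copy the two lines into a shell on the standby once they look right for your environment.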