Closed salman-bokhari closed 1 year ago
Authentication method was changed to scram-sha-256 in pg_hba.conf on all PostgreSQL servers and servers were restarted using pg_ctl. pgpool was stopped before updating pg_hba.conf. When we tried to connect to primary PostgreSQL via psql on another node inside the subnet, we were not able to connect.
This error message was shown on node inside subnet outside cluster
xflow@xflow:~$ psql -h 172.30.237.11
Password for user xflow:
psql: error: connection to server at "172.30.237.11", port 5432 failed: FATAL: password authentication failed for user "xflow"
Following logs were shown on Primary PostgreSQL node
postgres@xflow:~/stormatics/pg_cirrus/data$ 2023-07-12 11:36:43.980 UTC [6534] LOG: checkpoint starting: time
2023-07-12 11:36:44.008 UTC [6534] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.003 s, total=0.029 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2023-07-12 11:38:07.353 UTC [6712] FATAL: password authentication failed for user "xflow"
2023-07-12 11:38:07.353 UTC [6712] DETAIL: Role "xflow" does not exist.
Connection matched pg_hba.conf line 96: "host all all 172.30.0.0/16 scram-sha-256"
When pg_hba.conf was updated on standby nodes and server was restarted using psql following error logs were seen on Standby Nodes.
postgres@xflow:~$ 2023-07-12 11:33:30.681 UTC [6457] FATAL: could not connect to the primary server: connection to server at "172.30.237.11", port 5432 failed: fe_sendauth: no password supplied
2023-07-12 11:33:30.691 UTC [6458] FATAL: could not connect to the primary server: connection to server at "172.30.237.11", port 5432 failed: fe_sendauth: no password supplied
2023-07-12 11:33:30.691 UTC [6456] LOG: waiting for WAL to become available at 0/40000F0
When pgpool was started again it started doing failover which was also not successful
Replication works without exporting PGPASSWORD in setup-standby.yml playbook by adding <primary_ip>:<port>:*:repuser:{{ REPUSER_PASSWORD }}
to .pgpass file and -w switch to pg_basebackup command.
Client machine was not able to connect with pgpool node, this gets fixed by adding postgres:{{ PFILE_PASSWORD }}
to pool_passwd file in /etc/pgpool2
.
Replication after failover was not working as REPUSER_PASSWORD is not passed to follow_primary.sh file. This could be fixed using passfile=''/var/lib/pgsql/.pgpass''
in follow_primary.sh file so that the standby node should start replicating from new primary node.
As of now this issue has been resolved. The .pgpass file contains the following string for all nodes i.e. primary, standby1 and standby2.
<IP>:{{ PG_PORT }}:*:repuser:{{ REPUSER_PASSWORD }}
This will be reviewed and merge conflicts will be resolved before before being closed.
Since pgpool documentation was using passfile and we are using PGPASSFILE we need to test our logic on PostgreSQL versions 13 and 14.
Testing has been performed on PostgreSQL 13 and 14, all features work as expected. After review and resolving merge conflicts the Pull Request #80 will be merged.
This issue has been fixed PR #80 has been merged.
This is needed so that user must provide a password in order to connect with database even within subnet