stormatics / pg_cirrus

Set up 3-Node highly available PostgreSQL cluster
https://stormatics.tech/
GNU General Public License v3.0
56 stars 5 forks source link

Update authentication method to scram-sha-256 instead of trust in pg_hba.conf file #77

Closed salman-bokhari closed 1 year ago

salman-bokhari commented 1 year ago

This is needed so that user must provide a password in order to connect with database even within subnet

salman-bokhari commented 1 year ago

Authentication method was changed to scram-sha-256 in pg_hba.conf on all PostgreSQL servers and servers were restarted using pg_ctl. pgpool was stopped before updating pg_hba.conf. When we tried to connect to primary PostgreSQL via psql on another node inside the subnet, we were not able to connect.

This error message was shown on node inside subnet outside cluster

xflow@xflow:~$ psql -h 172.30.237.11
Password for user xflow:
psql: error: connection to server at "172.30.237.11", port 5432 failed: FATAL:  password authentication failed for user "xflow"

Following logs were shown on Primary PostgreSQL node

postgres@xflow:~/stormatics/pg_cirrus/data$ 2023-07-12 11:36:43.980 UTC [6534] LOG:  checkpoint starting: time
2023-07-12 11:36:44.008 UTC [6534] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.003 s, total=0.029 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2023-07-12 11:38:07.353 UTC [6712] FATAL:  password authentication failed for user "xflow"
2023-07-12 11:38:07.353 UTC [6712] DETAIL:  Role "xflow" does not exist.
        Connection matched pg_hba.conf line 96: "host    all             all         172.30.0.0/16       scram-sha-256"

When pg_hba.conf was updated on standby nodes and server was restarted using psql following error logs were seen on Standby Nodes.

postgres@xflow:~$ 2023-07-12 11:33:30.681 UTC [6457] FATAL:  could not connect to the primary server: connection to server at "172.30.237.11", port 5432 failed: fe_sendauth: no password supplied
2023-07-12 11:33:30.691 UTC [6458] FATAL:  could not connect to the primary server: connection to server at "172.30.237.11", port 5432 failed: fe_sendauth: no password supplied
2023-07-12 11:33:30.691 UTC [6456] LOG:  waiting for WAL to become available at 0/40000F0

When pgpool was started again it started doing failover which was also not successful

salman-bokhari commented 1 year ago

Replication works without exporting PGPASSWORD in setup-standby.yml playbook by adding <primary_ip>:<port>:*:repuser:{{ REPUSER_PASSWORD }} to .pgpass file and -w switch to pg_basebackup command.

Client machine was not able to connect with pgpool node, this gets fixed by adding postgres:{{ PFILE_PASSWORD }} to pool_passwd file in /etc/pgpool2.

Replication after failover was not working as REPUSER_PASSWORD is not passed to follow_primary.sh file. This could be fixed using passfile=''/var/lib/pgsql/.pgpass'' in follow_primary.sh file so that the standby node should start replicating from new primary node.

salman-bokhari commented 1 year ago

As of now this issue has been resolved. The .pgpass file contains the following string for all nodes i.e. primary, standby1 and standby2.

<IP>:{{ PG_PORT }}:*:repuser:{{ REPUSER_PASSWORD }}

This will be reviewed and merge conflicts will be resolved before before being closed.

salman-bokhari commented 1 year ago

Since pgpool documentation was using passfile and we are using PGPASSFILE we need to test our logic on PostgreSQL versions 13 and 14.

salman-bokhari commented 1 year ago

Testing has been performed on PostgreSQL 13 and 14, all features work as expected. After review and resolving merge conflicts the Pull Request #80 will be merged.

salman-bokhari commented 1 year ago

This issue has been fixed PR #80 has been merged.