hapostgres / pg_auto_failover

Postgres extension and service for automated failover and high-availability
Other
1.07k stars 113 forks source link

pg_autocl create monitor honors PGUSER (and PGHOST ) variables , but pg_autoctl create postgres doesn't #907

Closed Tiago-Anastacio closed 2 years ago

Tiago-Anastacio commented 2 years ago

Hi Dimitri,

we have several instances in the same server. So for security reasons we have one linux user for each database cluster. But, for the sake of simplicity, database user is postgres (set during initdb). So, yes we have a map=... option in the pg_hba.conf file and properly set pg_ident.conf

In order to handle this we use initdb first for the monitor node and all postgres nodes (primaries) then we raise pg_autoctl create monitor|postgres

The issue is:

First connexion to postgres raised when using pg_autoctl create monitor honors $PG.. variables particularly PGUSER and PGHOST but pg_autoctl create postgres doesn't. For the PGHOST variable (e.g .the socket directory it is not an issue because pg_autoctl create postgres grab this information from postgresql.conf file) But for PGUSER it is a big issue, because first connexion is made with linux user's name.

Example linux user is pgappli1 Statement is :

 /usr/pgsql-13/bin/pg_autoctl create postgres \
 --pgctl /usr/pgsql-13/bin/pg_ctl \
 --pgdata   /pgappli1/connexion/pg_data \
 --pgport 54320 \
 --formation appli1 \
 --username postgres \
 --dbname postgres \
 --hostname=`hostname -f` \
 --skip-pg-hba \ 
 --ssl-mode=verify-full \
 --ssl-ca-file=/pgappli1/connexion/.tls/root.crt \
 --server-key=/pgappli1/connexion/.tls/server.key \
 --server-cert=/pgappli1/connexion/.tls/server.crt  \
 --monitor postgres://autoctl_node@suvm-sles15-bac13.unix.sits.credit-agricole.fr:54320/pg_auto_failover?sslmode=verify-full&sslrootcert=/pgmonito/connexion/.tls/root.crt

Extract log:

11:36:53 8643 INFO  Initialising postgres as a primary
11:36:53 8643 WARN  Skipping HBA edits (per --skip-pg-hba) for rule: hostssl "postgres" "postgres" suvm-sles15-bac11.unix.sits.credit-agricole.fr trust
11:36:53 8643 WARN  Failed to connect to "postgres://@suvm-sles15-bac11:54320/template1?", retrying until the server is ready
11:36:56 8643 ERROR Connection to database failed: FATAL:  role "pgappli1" does not exist
11:36:56 8643 ERROR Failed to connect to "postgres://@suvm-sles15-bac11:54320/template1?" after 19 attempts in 2246 ms, pg_autoctl stops retrying now
env | grep PG
PGUSER=postgres
PGDATABASE=postgres
PGHOST=/pgappli1/connexion/.sockets

=> while in pg_autoctl create monitor I don't have this issue because $PGUSER is honored...

Could you please fix it ?

Thank you

DimCitus commented 2 years ago

I believe the bug is to be found at https://github.com/citusdata/pg_auto_failover/blob/efcfb73e4a2be4a315a74cf2e1b583cafc85fe0e/src/bin/pg_autoctl/keeper_pg_init.c#L947

What we do is set the username to an empty string in the connection string, so that we bypass the --username argument at this stage in the initialisation process, because we didn't create the target user yet. We want to connect to a known existing database (template1) with a known existing user (empty string usually does it).

In your case, you're using the postgres user as the initdb user for the system, so that's the role we should be using, but the current OS user is not the initdb user, which is unusual. We could make it so that if PGUSER exists and is not the same as --username we would connect to template1 with that user, rather than using an empty string, but I am not sure if that would allow any other case than yours to just work.

For security reasons, usually, each Postgres instance running on the same OS would get assigned its own initdb and superuser role, and the OS user would then be added to the postgres group so that debian permissions for the shared socket directory /var/run/postgresql would be okay. Would you consider doing that, and then running pg_autoctl with the initdb user, as intended? Otherwise I'm pretty sure we will continue to find problematic cases.

Tiago-Anastacio commented 2 years ago

OK, I'll use initdb with os user, then set --usename to postgresql so I'll have both superusers created. You can close the ticket. FYI on rhel community distribution, postgresql-server RPM set /var/run/postgresql to 755 postgres:postgres so that group users can't open socket and start PostgreSQL.

DimCitus commented 2 years ago

FYI on rhel community distribution, postgresql-server RPM set /var/run/postgresql to 755 postgres:postgres so that group users can't open socket and start PostgreSQL.

That sounds like a bug to report to Devrim and the community... consider changing that to the same as debian package, which reads drwxrwsr-x?