zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.22k stars 968 forks source link

Logical backup job fails if pooler is enabled for the cluster - pg_dump: error: connection to database "template1" #1411

Open zagr0 opened 3 years ago

zagr0 commented 3 years ago

Please, answer some short questions which should help us to understand your problem / question better?

When the cluster manifest enables poolers cronjobs for logical backups fail with error pg_dump: error: connection to database "template1" failed: ERROR: unexpected response from login query

job full logs:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12220    0 12220    0     0   917k      0 --:--:-- --:--:-- --:--:--  994k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20702    0 20702    0     0  1555k      0 --:--:-- --:--:-- --:--:-- 1555k
+ dump
+ /usr/lib/postgresql/13/bin/pg_dumpall
+ compress
+ pigz
+ upload
+ case $LOGICAL_BACKUP_PROVIDER in
++ estimate_size
++ /usr/lib/postgresql/13/bin/psql -tqAc 'select sum(pg_database_size(datname)::numeric) from pg_database;'
+ aws_upload 6710488
+ declare -r EXPECTED_SIZE=6710488
++ date +%s
+ PATH_TO_BACKUP=s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz
+ args=()
+ [[ ! -z 6710488 ]]
+ args+=("--expected-size=$EXPECTED_SIZE")
+ [[ ! -z '' ]]
+ [[ ! -z us-east-1 ]]
+ args+=("--region=$LOGICAL_BACKUP_S3_REGION")
+ [[ ! -z AES256 ]]
+ args+=("--sse=$LOGICAL_BACKUP_S3_SSE")
+ aws s3 cp - s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz --expected-size=6710488 --region=us-east-1 --sse=AES256
pg_dump: error: connection to database "template1" failed: ERROR:  unexpected response from login query
pg_dumpall: error: pg_dump failed on database "template1", exiting

cluster manifest:

apiVersion: "acid.zalan.do/v1"
kind: "postgresql"
metadata:
  name: "test-db"
  namespace: "postgres-test"
  labels:
    team: test
spec:
  teamId: "test"
  postgresql:
    version: "13"
    parameters:
      max_connections: "100"
  numberOfInstances: 2
  enableConnectionPooler: true
  enableReplicaConnectionPooler: true
  enableLogicalBackup: true
  volume:
    size: "5Gi"
  users:
    admin:
    - superuser
    - createdb
    test-user: []
  databases:
    test: test-user
  resources:
    requests:
      cpu: 100m
      memory: 500Mi
    limits:
      cpu: 1000m
      memory: 2000Mi
  sidecars:
  - name: postgres-exporter
    image: wrouesnel/postgres_exporter
    env:
    - name: DATA_SOURCE_USER
      valueFrom:
        secretKeyRef:
          name: postgres.test-db.credentials.postgresql.acid.zalan.do
          key: username
    - name: DATA_SOURCE_PASS
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres.test-db.credentials.postgresql.acid.zalan.do
    - name: "DATA_SOURCE_NAME"
      value: "postgresql://$(DATA_SOURCE_USER):$(DATA_SOURCE_PASS)@$(POD_NAME):5432/postgres"
    ports:
    - containerPort: 9187
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
zagr0 commented 3 years ago

juts for information, that until the root cause is found, we can apply manually on template1 database to workaround the issue:

CREATE SCHEMA pooler;
REVOKE ALL ON SCHEMA pooler FROM public, pooler;
GRANT USAGE ON SCHEMA pooler TO pooler;
CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text)
RETURNS record AS $$
BEGIN
    SELECT usename, passwd FROM pg_catalog.pg_shadow
    WHERE usename = i_username INTO uname, phash;
    RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler;
GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;
ntcong commented 3 years ago

It seems that the pooler schema was only created on cluster's creation. The other db I added to the cluster's config does not have pooler schema In this example I added sso to "preparedDatabases" when creating the cluster, then added salebot and apply the new config

sso=# \dn
         List of schemas
      Name       |     Owner
-----------------+----------------
 data            | sso_data_owner
 metric_helpers  | postgres
 pooler          | postgres
 public          | postgres
 user_management | postgres
(5 rows)

sso=# \connect salebot
psql (13.3 (Ubuntu 13.3-1.pgdg20.04+1), server 13.2 (Ubuntu 13.2-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "salebot" as user "postgres". 
salebot=# \dn
           List of schemas
      Name       |       Owner
-----------------+--------------------
 data            | salebot_data_owner
 metric_helpers  | postgres
 public          | postgres
 user_management | postgres
(4 rows)

salebot=#

Apparently installLookupFunction (https://github.com/zalando/postgres-operator/blob/eeb59c5bfd127020fa69bb29c58827cec905aa00/pkg/cluster/database.go#L482) does not get called on cluster update

Routhinator commented 11 months ago

Workaround has fixed it for me, however it's concerning that this issue has gone unaddressed for 2 years. This should be prioritized or at least documented as a caveat when enabling poolers.

MikeVL commented 10 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d
fmendez89 commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

MikeVL commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

No. After restart postgresql pod works fine some time

fmendez89 commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

No. After restart postgresql pod works fine some time

I have disabled replication pooler for now to see if that works, because the logical backup tries to connect to the pooler-repl. I have executed a manual job and worked fine, but ill keep an eye on it to see how it works on the next days.

UPDATE: With the replica pooler disabled the logical backup works fine in our infrastructure.

dhirajbadu commented 2 months ago

I am also getting this issue. Any update on this?

CREATE SCHEMA pooler; REVOKE ALL ON SCHEMA pooler FROM public, pooler; GRANT USAGE ON SCHEMA pooler TO pooler; CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text) RETURNS record AS $$ BEGIN SELECT usename, passwd FROM pg_catalog.pg_shadow WHERE usename = i_username INTO uname, phash; RETURN; END; $$ LANGUAGE plpgsql SECURITY DEFINER; REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler; GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;

This solution does not works for me. I am using Postgressql 12.