Logical backup job fails if pooler is enabled for the cluster - pg_dump: error: connection to database "template1"

zagr0 commented 3 years ago

Please, answer some short questions which should help us to understand your problem / question better?

Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.6.1
**Where do you run it - cloud or metal? Kubernetes with kops on AWS
Are you running Postgres Operator in production? we are going to
Type of issue? Bug report

When the cluster manifest enables poolers cronjobs for logical backups fail with error pg_dump: error: connection to database "template1" failed: ERROR: unexpected response from login query

job full logs:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12220    0 12220    0     0   917k      0 --:--:-- --:--:-- --:--:--  994k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20702    0 20702    0     0  1555k      0 --:--:-- --:--:-- --:--:-- 1555k
+ dump
+ /usr/lib/postgresql/13/bin/pg_dumpall
+ compress
+ pigz
+ upload
+ case $LOGICAL_BACKUP_PROVIDER in
++ estimate_size
++ /usr/lib/postgresql/13/bin/psql -tqAc 'select sum(pg_database_size(datname)::numeric) from pg_database;'
+ aws_upload 6710488
+ declare -r EXPECTED_SIZE=6710488
++ date +%s
+ PATH_TO_BACKUP=s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz
+ args=()
+ [[ ! -z 6710488 ]]
+ args+=("--expected-size=$EXPECTED_SIZE")
+ [[ ! -z '' ]]
+ [[ ! -z us-east-1 ]]
+ args+=("--region=$LOGICAL_BACKUP_S3_REGION")
+ [[ ! -z AES256 ]]
+ args+=("--sse=$LOGICAL_BACKUP_S3_SSE")
+ aws s3 cp - s3://postgres-operator-backups-bucket/spilo/test-db/aae491ea-8a1b-4d1f-b5bd-755466eae34c/logical_backups/1616113817.sql.gz --expected-size=6710488 --region=us-east-1 --sse=AES256
pg_dump: error: connection to database "template1" failed: ERROR:  unexpected response from login query
pg_dumpall: error: pg_dump failed on database "template1", exiting

cluster manifest:

apiVersion: "acid.zalan.do/v1"
kind: "postgresql"
metadata:
  name: "test-db"
  namespace: "postgres-test"
  labels:
    team: test
spec:
  teamId: "test"
  postgresql:
    version: "13"
    parameters:
      max_connections: "100"
  numberOfInstances: 2
  enableConnectionPooler: true
  enableReplicaConnectionPooler: true
  enableLogicalBackup: true
  volume:
    size: "5Gi"
  users:
    admin:
    - superuser
    - createdb
    test-user: []
  databases:
    test: test-user
  resources:
    requests:
      cpu: 100m
      memory: 500Mi
    limits:
      cpu: 1000m
      memory: 2000Mi
  sidecars:
  - name: postgres-exporter
    image: wrouesnel/postgres_exporter
    env:
    - name: DATA_SOURCE_USER
      valueFrom:
        secretKeyRef:
          name: postgres.test-db.credentials.postgresql.acid.zalan.do
          key: username
    - name: DATA_SOURCE_PASS
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres.test-db.credentials.postgresql.acid.zalan.do
    - name: "DATA_SOURCE_NAME"
      value: "postgresql://$(DATA_SOURCE_USER):$(DATA_SOURCE_PASS)@$(POD_NAME):5432/postgres"
    ports:
    - containerPort: 9187
      protocol: TCP
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi

zagr0 commented 3 years ago

juts for information, that until the root cause is found, we can apply manually on template1 database to workaround the issue:

CREATE SCHEMA pooler;
REVOKE ALL ON SCHEMA pooler FROM public, pooler;
GRANT USAGE ON SCHEMA pooler TO pooler;
CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text)
RETURNS record AS $$
BEGIN
    SELECT usename, passwd FROM pg_catalog.pg_shadow
    WHERE usename = i_username INTO uname, phash;
    RETURN;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler;
GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;

ntcong commented 3 years ago

It seems that the pooler schema was only created on cluster's creation. The other db I added to the cluster's config does not have pooler schema In this example I added sso to "preparedDatabases" when creating the cluster, then added salebot and apply the new config

sso=# \dn
         List of schemas
      Name       |     Owner
-----------------+----------------
 data            | sso_data_owner
 metric_helpers  | postgres
 pooler          | postgres
 public          | postgres
 user_management | postgres
(5 rows)

sso=# \connect salebot
psql (13.3 (Ubuntu 13.3-1.pgdg20.04+1), server 13.2 (Ubuntu 13.2-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "salebot" as user "postgres". 
salebot=# \dn
           List of schemas
      Name       |       Owner
-----------------+--------------------
 data            | salebot_data_owner
 metric_helpers  | postgres
 public          | postgres
 user_management | postgres
(4 rows)

salebot=#

Apparently installLookupFunction (https://github.com/zalando/postgres-operator/blob/eeb59c5bfd127020fa69bb29c58827cec905aa00/pkg/cluster/database.go#L482) does not get called on cluster update

Routhinator commented 11 months ago

Workaround has fixed it for me, however it's concerning that this issue has gone unaddressed for 2 years. This should be prioritized or at least documented as a caveat when enabling poolers.

MikeVL commented 10 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

fmendez89 commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

MikeVL commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

No. After restart postgresql pod works fine some time

fmendez89 commented 6 months ago

Sometimes i get some errors:

pg_dump: error: connection to server at "192.168.100.110", port 5432 failed: FATAL:  bouncer config error
pg_dumpall: error: pg_dump failed on database "template1", exiting

kube pods

NAME                                        READY   STATUS      RESTARTS         AGE
postgres-backup-postgresql-28301790-x29ns   0/1     Completed   0                3d7h
postgres-backup-postgresql-28303230-9gj2k   0/1     Completed   0                2d7h
postgres-backup-postgresql-28306110-9xjl2   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-gggmj   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-gv7cl   0/1     Error       0                7h3m
postgres-backup-postgresql-28306110-tb4j6   0/1     Error       0                7h8m
postgres-backup-postgresql-28306110-w2j2c   0/1     Error       0                7h5m
postgres-backup-postgresql-28306110-x6j6p   0/1     Error       0                7h7m
postgres-backup-postgresql-28306110-z7xmv   0/1     Error       0                7h8m
postgresql-0                                1/1     Running     0                80d
postgresql-1                                1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-jkhdn          1/1     Running     0                80d
postgresql-pooler-5b9c9749f6-q42k8          1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-z4bnr     1/1     Running     0                80d
postgresql-pooler-repl-784f699c76-zhmbj     1/1     Running     0                80d

Hi @MikeVL I'm experiencing the same issue, did you resolve it?

No. After restart postgresql pod works fine some time

I have disabled replication pooler for now to see if that works, because the logical backup tries to connect to the pooler-repl. I have executed a manual job and worked fine, but ill keep an eye on it to see how it works on the next days.

UPDATE: With the replica pooler disabled the logical backup works fine in our infrastructure.

dhirajbadu commented 2 months ago

I am also getting this issue. Any update on this?

CREATE SCHEMA pooler; REVOKE ALL ON SCHEMA pooler FROM public, pooler; GRANT USAGE ON SCHEMA pooler TO pooler; CREATE OR REPLACE FUNCTION pooler.user_lookup(IN i_username text, OUT uname text, OUT phash text) RETURNS record AS $$ BEGIN SELECT usename, passwd FROM pg_catalog.pg_shadow WHERE usename = i_username INTO uname, phash; RETURN; END; $$ LANGUAGE plpgsql SECURITY DEFINER; REVOKE ALL ON FUNCTION pooler.user_lookup(text) FROM public, pooler; GRANT EXECUTE ON FUNCTION pooler.user_lookup(text) TO pooler;

This solution does not works for me. I am using Postgressql 12.

zalando / postgres-operator

Logical backup job fails if pooler is enabled for the cluster - pg_dump: error: connection to database "template1" #1411