Closed grapemix closed 1 year ago
You posted a lot of information, yet the Postgres Operator logs are missing.
Also the Postgres logs would be worth checking for error messages related to user creation and authentication.
Users are stored in K8s secrets; can you list the secrets labeled (by default) with application=spilo?
It is odd to see the "standby" user error in the pod logs.
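For reference, a sketch of how to list and decode those secrets (the cluster name and namespace are taken from this thread; adjust as needed):

```shell
# list the operator-managed credential secrets (labeled application=spilo by default)
kubectl get secrets -n test -l application=spilo

# decode the postgres user's password from its secret
# (secret name format: <user>.<cluster>.credentials.postgresql.acid.zalan.do)
kubectl get secret postgres.test-testdb-cluster.credentials.postgresql.acid.zalan.do \
  -n test -o 'jsonpath={.data.password}' | base64 -d
```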
Thanks for replying and pointing out the debugging direction. I didn't know there were logs in the pg operator pod. In short, I fixed a misconfiguration of the Postgres Operator, but now I get the password authentication failure listed below.
The first problem I experienced was a mismatch between configKubernetes.cluster_domain and the cluster domain configured in kube-dns.
time="2020-08-30T22:16:40Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"test-testdb-cluster\", UID:\"e438c66b-dabc-4ecb-ab95-a23839cc23ab\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3939103\", FieldPath:\"\"}): type: 'Normal' reason: 'StatefulSet' Pods are ready"
time="2020-08-30T22:16:41Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:16:56Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:17:10Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:17:25Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:17:41Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:17:56Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:18:10Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:18:26Z" level=error msg="could not connect to PostgreSQL database: dial tcp: lookup test-testdb-cluster.test.svc.MY_DOMAIN on 172.16.0.10:53: no such host" cluster-name=test/test-testdb-cluster pkg=cluster worker=0
time="2020-08-30T22:18:26Z" level=error msg="could not create cluster: could not create users: could not init db connection: could not init db connection: still failing after 8 retries" cluster-name=test/test-testdb-cluster pkg=controller worker=0
I found a similar issue to this one; after syncing up the cluster domain name, the problem went away.
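For reference, the domain the operator uses for service DNS names comes from configKubernetes.cluster_domain. A sketch of the relevant Helm values (assuming the chart's configKubernetes section; the value must match what kube-dns actually serves):

```yaml
configKubernetes:
  # must match the cluster domain configured in kube-dns/CoreDNS
  cluster_domain: cluster.local
```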
Unfortunately, I then got another problem: password authentication failure.
time="2020-08-31T01:16:31Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'Create' Started creation of new cluster resources"
time="2020-08-31T01:16:31Z" level=info msg="endpoint \"test/acid-minimal-cluster\" has been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:31Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'Endpoints' Endpoint \"test/acid-minimal-cluster\" has been successfully created"
time="2020-08-31T01:16:32Z" level=info msg="master service \"test/acid-minimal-cluster\" has been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:32Z" level=debug msg="No load balancer created for the replica service" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:32Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'Services' The service \"test/acid-minimal-cluster\" for role master has been successfully created"
time="2020-08-31T01:16:32Z" level=info msg="replica service \"test/acid-minimal-cluster-repl\" has been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:32Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'Services' The service \"test/acid-minimal-cluster-repl\" for role replica has been successfully created"
time="2020-08-31T01:16:32Z" level=debug msg="team API is disabled, returning empty list of members for team \"acid\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:32Z" level=info msg="users have been initialized" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:32Z" level=debug msg="created new secret \"test/postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do\", uid: \"872a575c-80b2-488e-a160-9f847b19bf51\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=debug msg="created new secret \"test/standby.acid-minimal-cluster.credentials.postgresql.acid.zalan.do\", uid: \"34029784-96e5-4e33-afb3-ed05882142ab\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=debug msg="created new secret \"test/foo-user.acid-minimal-cluster.credentials.postgresql.acid.zalan.do\", uid: \"3e36273b-913f-4c1c-bf1e-0d998359a808\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=debug msg="created new secret \"test/zalando.acid-minimal-cluster.credentials.postgresql.acid.zalan.do\", uid: \"35dbcfd7-ad4f-4326-b3e2-4b2e02eca9f5\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=info msg="secrets have been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'Secrets' The secrets have been successfully created"
time="2020-08-31T01:16:33Z" level=info msg="pod disruption budget \"test/postgres-acid-minimal-cluster-pdb\" has been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=debug msg="Generating Spilo container, environment variables: [{SCOPE acid-minimal-cluster nil} {PGROOT /home/postgres/pgdata/pgroot nil} {POD_IP &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:status.podIP,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {POD_NAMESPACE &EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:metadata.namespace,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}} {PGUSER_SUPERUSER postgres nil} {KUBERNETES_SCOPE_LABEL cluster-name nil} {KUBERNETES_ROLE_LABEL spilo-role nil} {PGPASSWORD_SUPERUSER &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do,},Key:password,Optional:nil,},}} {PGUSER_STANDBY standby nil} {PGPASSWORD_STANDBY &EnvVarSource{FieldRef:nil,ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:&SecretKeySelector{LocalObjectReference:LocalObjectReference{Name:standby.acid-minimal-cluster.credentials.postgresql.acid.zalan.do,},Key:password,Optional:nil,},}} {PAM_OAUTH2 nil} {HUMAN_ROLE zalandos nil} {KUBERNETES_LABELS {\"application\":\"spilo\"} nil} {SPILO_CONFIGURATION {\"postgresql\":{\"bin_dir\":\"/usr/lib/postgresql/12/bin\"},\"bootstrap\":{\"initdb\":[{\"auth-host\":\"md5\"},{\"auth-local\":\"trust\"}],\"users\":{\"zalandos\":{\"password\":\"\",\"options\":[\"CREATEDB\",\"NOLOGIN\"]}},\"dcs\":{}}} nil} {DCS_ENABLE_KUBERNETES_API true nil}]" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=debug msg="created new statefulset \"test/acid-minimal-cluster\", uid: \"e8cafed9-6f93-4963-b03e-4118731bac58\"" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=info msg="statefulset \"test/acid-minimal-cluster\" has been successfully created" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=info msg="waiting for the cluster being ready" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:33Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'StatefulSet' Statefulset \"test/acid-minimal-cluster\" has been successfully created"
time="2020-08-31T01:16:36Z" level=debug msg="Waiting for 1 pods to become ready" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:45Z" level=info msg="pods are ready" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:45Z" level=info msg="Create roles" cluster-name=test/acid-minimal-cluster pkg=cluster worker=1
time="2020-08-31T01:16:45Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981847\", FieldPath:\"\"}): type: 'Normal' reason: 'StatefulSet' Pods are ready"
time="2020-08-31T01:16:45Z" level=error msg="could not create cluster: could not create users: could not init db connection: could not init db connection: pq: password authentication failed for user \"postgres\"" cluster-name=test/acid-minimal-cluster pkg=controller worker=1
time="2020-08-31T01:16:45Z" level=info msg="Event(v1.ObjectReference{Kind:\"postgresql\", Namespace:\"test\", Name:\"acid-minimal-cluster\", UID:\"698d1fc8-d34c-49c0-8f69-ffe37bb98890\", APIVersion:\"acid.zalan.do/v1\", ResourceVersion:\"3981959\", FieldPath:\"\"}): type: 'Warning' reason: 'Create' could not create cluster: could not create users: could not init db connection: could not init db connection: pq: password authentication failed for user \"postgres\""
I have tried my own setup as well as the setup I grabbed from the docs:
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 1
  users:
    # database owner
    zalando:
    - superuser
    - createdb
    # role for application foo
    foo_user: # or 'foo_user: []'
  # databases: name->owner
  databases:
    foo: zalando
  postgresql:
    version: "12"
Any ideas? Thanks.
Well, we have the same problem, and I probably figured out why, at least for our case.
We use Longhorn for storage provisioning and want to keep the data whatever it takes and reuse volumes, so we can shut down the whole cluster and set it up again. The users get created the first time in our database, and the operator provides a secret postgres.project-postgres.credentials.postgresql.acid.zalan.do that holds the password matching the entry in pg_shadow. When we restart the whole cluster / namespace, the "old" password of postgres (and of course of standby and all other roles) is still persisted in the volume, but a new password is generated and provided as a new secret. What's missing here (I think) is that the user creation should alter the user's password when the user already exists. Does this make sense? https://github.com/zalando/postgres-operator/blob/692c721854e4667102778e8e69e3dd12e47984b5/pkg/util/users/users.go#L51 should handle this (it should also be called with EventSync at cluster startup).
Can you give me some advice, @Jan-M?
So yeah, going into the operator logs, we also see this:
time="2021-02-09T04:08:30Z" level=debug msg="syncing roles" cluster-name=artiscluster-dev/artis-postgres pkg=cluster worker=0
time="2021-02-09T04:08:30Z" level=warning msg="error while syncing cluster state: could not sync roles: could not init db connection: could not init db connection: pq: password authentication failed for user \"postgres\"" cluster-name=artiscluster-dev/artis-postgres pkg=cluster worker=0
time="2021-02-09T04:08:30Z" level=error msg="could not sync cluster: could not sync roles: could not init db connection: could not init db connection: pq: password authentication failed for user \"postgres\"" cluster-name=artiscluster-dev/artis-postgres pkg=controller worker=0
After restarting, the generated postgres user password obviously doesn't match the one saved in the database, so the operator cannot change anything else.
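In other words, what's missing on sync is the SQL equivalent of resetting the role's password to the value in the newly generated secret (the password below is a placeholder, not a real value):

```sql
ALTER ROLE postgres WITH PASSWORD 'value-from-the-new-k8s-secret';
```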
Ran into this issue as well. Recreated the postgresql cluster with an existing volume. The new cluster started up, but the Postgres Operator is failing with the following error log: pq: password authentication failed for user "postgres"
Recovery procedure:
- get the new postgres user password:
export NAME=my-db
export NAMESPACE=my-ns
kubectl get secrets -o yaml postgres.${NAME}.credentials.postgresql.acid.zalan.do -n ${NAMESPACE} \
  | grep password \
  | head -n 1 \
  | awk '{ print $2 }' \
  | base64 --decode
- exec into the database pod and change the password manually:
kubectl exec -it ${NAME}-0 -- /bin/bash
$ psql postgres postgres
$ \password
(type in the password from step 1)
- restart the postgres-operator:
kubectl delete pod -n postgres-operator postgres-operator-xxx-yyy
kubectl logs -n postgres-operator svc/postgres-operator -f
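To verify the fix afterwards, one can try logging in from inside the pod with the decoded password (a sketch; the psql invocation and -h localhost are assumptions, chosen because host connections use md5 auth per the Spilo configuration shown earlier):

```shell
# extract the password from the secret, same pipeline as step 1
PGPASSWORD=$(kubectl get secrets -o yaml postgres.${NAME}.credentials.postgresql.acid.zalan.do -n ${NAMESPACE} \
  | grep password | head -n 1 | awk '{ print $2 }' | base64 --decode)

# attempt a password-authenticated login inside the pod
kubectl exec -it ${NAME}-0 -- env PGPASSWORD="$PGPASSWORD" psql -U postgres -h localhost -c 'SELECT 1'
```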
Or the other way around, take the password the pod actually uses and put that into the secret:
env | grep PASS
echo -n "grepped-password" | base64
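A sketch of that direction (the cluster and secret names are taken from earlier in the thread, and the kubectl patch step is my assumption, not part of the original comment): read the password the pod was started with from its environment, base64-encode it, and patch it back into the operator-managed secret so the secret and the database agree.

```shell
# read the password the running pod actually uses (env var names from the Spilo env dump above)
kubectl exec -it acid-minimal-cluster-0 -- env | grep PASS

# base64-encode the grepped password (secrets store values base64-encoded)
ENCODED=$(echo -n 'grepped-password' | base64)

# write it back into the operator-managed secret
kubectl patch secret postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do \
  --type merge -p "{\"data\":{\"password\":\"$ENCODED\"}}"
```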
Any updates on the issue not requiring manual intervention?
Any update on this?
This seems to be the same issue as https://github.com/zalando/postgres-operator/issues/524, but with a different cause.
The manual workaround by @Starefossen is good to know, but it's not a solution for my use case: I have this running in a CI environment, so it needs to work without intervention.
I have met the same issue, any update on this? @Jan-M @FxKu postgres-operator.tar.gz
> Ran into this issue as well. Recreated the postgresql cluster with an existing volume. The new cluster started up but the Postgres Operator is failing with pq: password authentication failed for user "postgres" [...] (quoting @Starefossen's recovery procedure from above)
I have followed this guide, but replication between the master and the replicas is disconnected, as follows:
2022-03-02 14:57:00 UTC [60]: [4-1] 621f85bc.3c 0 HINT: Future log output will appear in directory "../pg_log".
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - accepting connections
2022-03-02 14:57:00,244 INFO: establishing a new patroni connection to the postgres cluster
2022-03-02 14:57:00,327 WARNING: Request failed to acid-my-postgres-cluster-1: GET http://10.244.2.118:8008/patroni (HTTPConnectionPool(host='10.244.2.118', port=8008): Max retries exceeded with url: /patroni (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4ac431ab00>: Failed to establish a new connection: [Errno 111] Connection refused',)))
2022-03-02 14:57:00,327 WARNING: Request failed to acid-my-postgres-cluster-2: GET http://10.244.1.117:8008/patroni (HTTPConnectionPool(host='10.244.1.117', port=8008): Max retries exceeded with url: /patroni (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4ac431afd0>: Failed to establish a new connection: [Errno 111] Connection refused',)))
2022-03-02 14:57:00,470 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2022-03-02 14:57:00,482 ERROR: Can not fetch local timeline and lsn from replication connection
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/__init__.py", line 821, in get_replica_timeline
    with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/__init__.py", line 816, in get_replication_connection_cursor
    with get_connection_cursor(**conn_kwargs) as cur:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
    conn = psycopg2.connect(**kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: password authentication failed for user "standby"
connection to server at "localhost" (127.0.0.1), port 5432 failed: FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "standby", no encryption

2022-03-02 14:57:00,534 INFO: promoted self to leader by acquiring session lock
server promoting
2022-03-02 14:57:00,541 INFO: cleared rewind state after becoming the leader
root@acid-my-postgres-cluster-1:/home/postgres# patronictl list
+ Cluster: acid-my-postgres-cluster (7070449311109967941) ------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+----------------------------+--------------+---------+---------+----+-----------+
| acid-my-postgres-cluster-0 | 10.244.0.61 | Leader | running | 4 | |
| acid-my-postgres-cluster-1 | 10.244.2.120 | Replica | running | | 49 |
| acid-my-postgres-cluster-2 | 10.244.1.118 | Replica | running | | 17 |
+----------------------------+--------------+---------+---------+----+-----------+
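Note the second FATAL line in the traceback above: besides the password mismatch, there is no pg_hba.conf entry for a replication connection from 127.0.0.1 as "standby". For reference, the kind of entry that would allow it looks like this (illustrative only; in Spilo the file is generated by Patroni, so it should be fixed via Patroni/operator configuration rather than edited by hand):

```
# TYPE  DATABASE     USER     ADDRESS        METHOD
host    replication  standby  127.0.0.1/32   md5
```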
I have got the same issue for the standby user. Replication doesn't work with the password that is stored in the Secret.
psycopg2.OperationalError: connection to server at "10.220.14.88", port 5432 failed: FATAL: password authentication failed for user "standby"
connection to server at "10.220.14.88", port 5432 failed: FATAL: password authentication failed for user "standby"
It only starts working when I change the password through psql on the master node:
export PGPASSWORD=$(kubectl get secret postgres.acid-minimal-cluster.credentials.standby.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d)
...
ALTER USER standby WITH PASSWORD '$PGPASSWORD';
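One caveat with the snippet above: inside psql, '$PGPASSWORD' is a literal string, so the statement has to be built in the shell for the variable to expand. A sketch (the secret name follows the operator's <user>.<cluster>.credentials... format seen earlier in the thread, and the exec invocation is my assumption):

```shell
# decode the standby password from its operator-managed secret
PGPASSWORD_STANDBY=$(kubectl get secret standby.acid-minimal-cluster.credentials.postgresql.acid.zalan.do \
  -o 'jsonpath={.data.password}' | base64 -d)

# double quotes let the shell expand the variable before psql sees the statement
kubectl exec -it acid-minimal-cluster-0 -- \
  psql -U postgres -c "ALTER USER standby WITH PASSWORD '${PGPASSWORD_STANDBY}'"
```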
When facing connection issues with the standby, you need to copy the secrets of the source cluster.
> When facing connection issues with the standby, you need to copy the secrets of the source cluster.
For example, I want to use variant b; is it possible to specify this in the operator settings, and if so, in which section?
My problem has been resolved by upgrading the package. Thanks everyone for your input.
PS: I should have closed this ticket sooner. My bad.
I don't know much about GitHub, so I'm not sure if I opened this issue, but I had the exact same problem with syncing after creation.
This resolved my issue: https://github.com/zalando/postgres-operator/issues/1115#issuecomment-786204248
Is this still the way to do it, manually? Or is there a better way?
Thank you
Hi,
I have been stuck on this DB connection problem for a while. I have noticed a few issues with symptoms similar to mine, but none of their solutions works for me. Can someone please give me some directions?
I have successfully installed the postgres-operator through Helm, then created my cluster and got it running. But I cannot connect to the DB even from within the Kubernetes cluster, so I exec into the DB pod and connect to the DB with
kubectl exec -it pod/test-testdb-cluster-0 -- bash
and psql -U postgres. However, I don't see the role and DB I specified in my yaml (testdbname and testdbowner).
Log from the cluster:
My yaml:
Helm3 installation cmd:
I have tried 1.4.0, 1.5.0 and the master branch/tag. I have also tried installing the Helm chart without values-crd.yaml, but none of them works. Any ideas?
Thanks a lot.