zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

`SyncFailed` while initializing cluster: `could not create cluster: could not sync prepared databases: could not execute create database schema` #1585

Closed iyesin closed 3 years ago

iyesin commented 3 years ago

Situation

After running the installation snippet with the attached postgres-cluster spec, I expected a fully functional PostgreSQL cluster. Unfortunately, I got the opposite: the PostgreSQL cluster is stuck in SyncFailed status. The detailed output of kubectl describe postgresqls huh-test didn't reveal anything meaningful to me. The last messages were:

  Normal   StatefulSet  45m   postgres-operator  Statefulset "app/huh-test" has been successfully created
  Normal   StatefulSet  44m   postgres-operator  Pods are ready
  Warning  Create       44m   postgres-operator  could not create cluster: could not sync prepared databases: could not execute create database schema: pq: permission denied for database tdb
  Warning  Sync         15m   postgres-operator  could not sync cluster: could not sync prepared database: could not execute create database schema: pq: permission denied for database tdb

See full output in postres-operator.state.log.

While going through the postgres-operator logs I found the same error. To rule out any kind of authentication error, I left only "all from everywhere to everything" rules in my pg_hba.conf. That does not seem to help in this case. You can see this in my config.

The question

Is something wrong with my configuration, so that this behavior is expected (in which case I would like to know what exactly I'm doing wrong), or is this a bug (in which case a workaround would be highly appreciated)?

Installation snippet

#!/usr/bin/env bash

set -e
git clone https://github.com/zalando/postgres-operator.git || true
cd postgres-operator
git clean -x -d -f
git checkout .
git fetch
git checkout v1.6.3
grep --dereference-recursive --files-with-matches --null --fixed-strings 'cluster.local' ./charts \
  | xargs -0 -L1 -- sed -i.orig -e 's/cluster\.local/my-cluster/g'
helm --namespace app install --atomic --replace postgres-operator ./charts/postgres-operator
helm --namespace app install --atomic --replace postgres-operator-ui ./charts/postgres-operator-ui
kubectl --namespace app create -f create-test-cluster.yaml

PostgreSQL cluster spec

create-test-cluster.yaml.gz

Detailed logs from all pods

postres-operator.log huh-test-0.log huh-test-1.log huh-test-2.log

FxKu commented 3 years ago

Can you check the Postgres log in the container to see why it's failing to create the database schema? It should be executed by the postgres user, so it's strange that it fails.

iyesin commented 3 years ago

@FxKu all logs are attached. I wasn't able to find an answer there. That's the reason I created this ticket.

FxKu commented 3 years ago

You attached operator and patroni (pod) logs. I mean the Postgres (database server) logs inside the container.
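A minimal sketch of how those server logs could be listed from outside the pod. The log directory below is an assumption about the Spilo image layout (it is not stated in this thread), and `pg_log_ls` is a hypothetical helper name; check both against your own container.

```shell
# Assumed location of the Postgres server logs inside a Spilo container.
# Override PG_LOG_DIR if your image stores them elsewhere.
PG_LOG_DIR="${PG_LOG_DIR:-/home/postgres/pgdata/pgroot/pg_log}"

# Hypothetical helper: list the Postgres log files of one pod.
#   $1 = namespace, $2 = pod name
pg_log_ls() {
  local namespace="$1" pod="$2"
  kubectl --namespace "$namespace" exec "$pod" -- ls -l "$PG_LOG_DIR"
}

# Usage (names taken from this issue):
#   pg_log_ls app huh-test-0
```

Once the file names are known, `kubectl cp` or `kubectl exec ... -- cat` can be used to fetch an individual log.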

iyesin commented 3 years ago

@FxKu sorry, I didn't know that the pg instance inside the spilo container writes its logs to the pgdata dir. huh-test-0.tar.gz huh-test-1.tar.gz huh-test-2.tar.gz

FxKu commented 3 years ago

The permission error happens because tdb_owner is not the actual owner of the database and hence cannot create schemas there. In your manifest you have also listed tdb under the databases section, with puser as the owner. At the moment the operator does not check, for preparedDatabases, whether the database already has another owner. It would also be a bit counter-intuitive, given that there's an owner role.

You can remove the databases section and the owner should be altered. Then the schema creation will also work and the sync should be successful.
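As a rough sketch of that workaround: the first helper shows which role currently owns the database, and the second drops the databases section from the live postgresql resource with a JSON patch. Both function names are hypothetical, and the sketch assumes that `psql -U postgres` is allowed to connect locally inside the pod, which depends on your pg_hba.conf. Editing the manifest and re-applying it would work as well.

```shell
# Hypothetical helper: print the owner role of a database.
#   $1 = namespace, $2 = pod name, $3 = database name
# Assumes local connections as the postgres superuser are permitted.
db_owner() {
  local namespace="$1" pod="$2" db="$3"
  kubectl --namespace "$namespace" exec "$pod" -- \
    psql -U postgres -At -c \
    "SELECT pg_get_userbyid(datdba) FROM pg_database WHERE datname = '$db';"
}

# Hypothetical helper: remove spec.databases from the postgresql resource,
# so that only preparedDatabases defines the database and its owner.
#   $1 = namespace, $2 = cluster name
remove_databases_section() {
  local namespace="$1" cluster="$2"
  kubectl --namespace "$namespace" patch postgresql "$cluster" \
    --type json -p '[{"op": "remove", "path": "/spec/databases"}]'
}

# Usage (names taken from this issue):
#   db_owner app huh-test-0 tdb        # puser before the fix, per the manifest
#   remove_databases_section app huh-test
#   db_owner app huh-test-0 tdb        # should change after the next sync
```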

iyesin commented 3 years ago

@FxKu , thank you! Both solutions helped on their own:

laiminhtrung1997 commented 5 days ago

How can I configure an owner that is different from the _owner role? @FxKu