zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

fs_group 103 fails on pod recreate #821

Closed cazter closed 6 months ago

cazter commented 4 years ago

A new cluster initializes fine, but on an existing cluster any event that causes the pods to be recreated triggers the following permissions error, which prevents the cluster from initializing:

2020-02-07T19:33:14.507810141Z 2020-02-07 19:33:14 UTC [1023]: [1-1] 5e3dbb7a.3ff 0 FATAL: data directory "/home/postgres/pgdata/pgroot/data" has invalid permissions
2020-02-07T19:33:14.50784768Z 2020-02-07 19:33:14 UTC [1023]: [2-1] 5e3dbb7a.3ff 0 DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
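The rule stated in the DETAIL line can be checked by hand before restarting a pod. The following is a minimal sketch in Python, assuming only what the log message says (0700 or 0750 are acceptable); the function name `data_dir_mode_ok` is hypothetical, and PostgreSQL's own check may differ in details such as how it treats the setgid bit.

```python
import os
import stat

# The two modes named in the DETAIL line of the log message.
ALLOWED_MODES = {0o700, 0o750}


def data_dir_mode_ok(path: str) -> bool:
    """Return True if the rwx bits of `path` are exactly 0700 or 0750.

    Only the lower nine permission bits are compared; setuid, setgid and
    sticky bits are ignored here, which is a simplification.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode) & 0o777
    return mode in ALLOWED_MODES
```

Running this against the data directory from the log (`/home/postgres/pgdata/pgroot/data`) inside the container would show whether the directory mode is what the server rejects.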

Additionally, you cannot open a shell via kubectl, as the pod is not running as the root user.

$ kubectl -n pg exec -it bash
"root" execution of the PostgreSQL server is not permitted.
The server must be started under an unprivileged user ID to prevent
possible system security compromise. See the documentation for
more information on how to properly start the server.
command terminated with exit code 1

zimbatm commented 4 years ago

I think this is a limitation of StatefulSets, which don't transition well when an fsGroup is added after they have been created. Unfortunately, I don't remember where I saw that.

abh commented 4 years ago

@zimbatm I have a cluster where this happens each time the operator restarts the pods. I have deleted the statefulset to let the operator create it anew, and it still happens.

Samusername commented 4 years ago

A contact found a workaround for the problem: https://github.com/zalando/patroni/commit/7e170928093f31da7b64086d515108f1fd7efab1
It seems to help: after restarting the pods, patronictl shows the db nodes in a normal state.
A side note on the linked page says: "This error does not occur when using shared filesystems (like NFS)"

ReSearchITEng commented 4 years ago

Hi @zimbatm and all, as @Samusername pointed out, the new spilo image, which fixes the permissions with chmod at startup, should solve this issue:
registry.opensource.zalan.do/acid/spilo-cdp-12:1.6-p114 # or newer
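To illustrate the idea of a chmod at startup, here is a minimal Python sketch. It is not the code from the linked Patroni commit or from the spilo image; the function name `ensure_private_data_dir` and the choice of 0700 as the target mode are assumptions made for illustration, based only on the modes the PostgreSQL error message accepts.

```python
import os
import stat


def ensure_private_data_dir(path: str, target_mode: int = 0o700) -> bool:
    """Tighten `path` to `target_mode` if its mode is not 0700 or 0750.

    Returns True if the mode was changed, False if it was already acceptable.
    Hypothetical helper: a startup script would call this before launching
    the database server.
    """
    current = stat.S_IMODE(os.stat(path).st_mode) & 0o777
    if current in (0o700, 0o750):
        return False
    os.chmod(path, target_mode)
    return True
```

The call has to run as the owner of the directory (or as root), otherwise `os.chmod` raises `PermissionError`.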

To find the latest image tags for both the operator and spilo, query:
https://registry.opensource.zalan.do/v2/acid/postgres-operator/tags/list
https://registry.opensource.zalan.do/v2/acid/spilo-cdp-12/tags/list
https://registry.opensource.zalan.do/v2/acid/spilo-12/tags/list # the usual release branch
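The tag lists can also be read from a script. The sketch below assumes the endpoints return the usual Docker Registry v2 `tags/list` JSON shape (an object with `name` and `tags` keys); the helper names `parse_tags` and `fetch_tags` are hypothetical, and whether the registry is still reachable is not something this thread confirms.

```python
import json
import urllib.request
from typing import List


def parse_tags(payload: str) -> List[str]:
    """Extract the tag names from a registry v2 tags/list JSON document."""
    data = json.loads(payload)
    # "tags" may be missing or null for a repository without tags.
    return list(data.get("tags") or [])


def fetch_tags(url: str, timeout: float = 10.0) -> List[str]:
    """Download a tags/list document from `url` and return its tag names."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_tags(resp.read().decode("utf-8"))
```

For example, `fetch_tags("https://registry.opensource.zalan.do/v2/acid/spilo-12/tags/list")` would return the tags in whatever order the registry lists them; picking the newest one still requires reading the tag names.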