Open xhejtman opened 2 years ago
We occasionally face invalid permissions issues:
2022-09-12 07:09:55,610 INFO: doing crash recovery in a single user mode
2022-09-12 07:09:55,631 ERROR: Crash recovery finished with code=1
2022-09-12 07:09:55,631 INFO: stdout=
2022-09-12 07:09:55,631 INFO: stderr=2022-09-12 07:09:55 UTC [31680]: [1-1] 631edb43.7bc0 0 FATAL: data directory "/home/postgres/pgdata/pgroot/data" has invalid permissions
2022-09-12 07:09:55 UTC [31680]: [2-1] 631edb43.7bc0 0 DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
2022-09-12 07:09:56.363 36 LOG {ticks: 0, maint: 0, retry: 0}
Unfortunately, we currently do not know how to reproduce the error. However, a GitHub issue for the Crunchy Postgres Operator suggests that setting fsGroupChangePolicy to OnRootMismatch fixes it. Unfortunately, there seems to be no way to adjust the security context configuration with the Zalando Postgres Operator.
Looks like we were hit by issue #1703: I/O performance issues caused the Kubernetes control plane to restart, triggering this issue. In terms of resilience, it would be helpful to configure fsGroupChangePolicy for the database StatefulSet.
/assign
@stephan2012 How do you get around this issue, when it comes up? I had to recover my Postgres cluster, but when I did I got the mentioned Permission issue. This did not happen before, when I had to recover the Postgres cluster. Now I can't get it to start. Restart doesn't seem to help.
@stephan2012 How do you get around this issue, when it comes up?
You can manually fix the directory permissions by shelling into the Pod and running chmod. Look at the container logs to see what Patroni expects. Unfortunately, my PR is still waiting for a response from the maintainers.
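As a rough sketch of the manual workaround described above: the log in the original report says the data directory must be 0700 or 0750, so a small helper can check the current mode and reset it only when needed. The function name `fix_pgdata_mode` is hypothetical, and the sketch assumes GNU `stat` is available inside the container, which has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the operator or Patroni):
# reset a PostgreSQL data directory to mode 0700 unless it already
# has one of the two modes the error message accepts (0700 or 0750).
fix_pgdata_mode() {
  dir="$1"
  # Assumes GNU stat; '%a' prints the octal mode, for example 700.
  mode="$(stat -c '%a' "$dir")" || return 1
  case "$mode" in
    700|750)
      echo "mode $mode is already accepted, nothing to do"
      ;;
    *)
      chmod 0700 "$dir" && echo "changed mode from $mode to 700"
      ;;
  esac
}

# One-off equivalent from outside the Pod. The Pod name is a placeholder;
# the path is the one shown in the error message of the original report:
# kubectl exec -it <pod-name> -- chmod 700 /home/postgres/pgdata/pgroot/data
```

This only repairs the symptom. If the kubelet changes the permissions again on the next mount, the error can come back.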
It was tricky for me because the data directory was renamed after Postgres failed to start. I had to catch the exact moment when the data directory was created and change its permissions very quickly. Otherwise the start failed, the directory was renamed, and the bootstrap began again. But thank you very much, this was the solution, just a bit more tricky in my case.
Is there any date for someone to have a look at this issue? Thanks.
Looks like the maintainers are interested in something other than my PR.
For more than two years 😓
Would it be possible to add an fsGroupChangePolicy option to the security context of the created Postgres StatefulSet?
Sometimes the kubelet changes the access rights of the data directory so that the fsGroup GID can read and write it, which Postgres rejects as a security issue. Setting fsGroupChangePolicy: OnRootMismatch avoids this, because the recursive permission change is skipped when the volume root already has the expected ownership and permissions. This does not seem to be possible currently, so adding it as an option would be appreciated.
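To illustrate where the requested setting lives, here is a minimal sketch that builds a merge patch placing `fsGroupChangePolicy` in the Pod template's `securityContext` of a StatefulSet. The function name `fsgroup_policy_patch` and the placeholders in the commented `kubectl` call are hypothetical. It also assumes the operator does not overwrite a manual change on its next sync, which has not been verified and may well not hold.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that sets
# fsGroupChangePolicy in the Pod-level securityContext of a
# StatefulSet's Pod template. Defaults to OnRootMismatch.
fsgroup_policy_patch() {
  policy="${1:-OnRootMismatch}"
  printf '{"spec":{"template":{"spec":{"securityContext":{"fsGroupChangePolicy":"%s"}}}}}\n' "$policy"
}

# Example invocation (names are placeholders, not taken from this thread).
# Changing the Pod template normally triggers a rollout of the Pods.
# kubectl patch statefulset <statefulset-name> -n <namespace> \
#   --type merge -p "$(fsgroup_policy_patch)"
```

A patch like this is at best a stopgap. The request in this issue is for the operator to expose the field itself, so that the setting survives reconciliation.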