sorintlab / stolon

PostgreSQL cloud native High Availability and more.
https://talk.stolon.io
Apache License 2.0

FATAL: data directory "/stolon-data/postgres" has invalid permissions #809

Closed: samene closed this issue 3 years ago

samene commented 3 years ago

What happened: I am using the stolon Helm chart from the stable repo to deploy to Kubernetes 1.18. Istio sidecar injection is enabled on the namespace, and Longhorn is used for storage. When the keeper starts for the first time, everything works fine and the permissions on /stolon-data and /stolon-data/postgres are correct. When I then restart the keeper pod, the keeper logs show the errors below and the database does not come up afterwards:

2020-11-11T16:26:21.704Z WARN   cmd/keeper.go:182       password file permissions are too open. This file should only be readable to the user executing stolon! Continuing...   {"file": "/etc/secrets/stolon-db-replica/pg_repl_password", "mode": "01000000777"}
2020-11-11T16:26:21.704Z WARN   cmd/keeper.go:182       password file permissions are too open. This file should only be readable to the user executing stolon! Continuing...   {"file": "/etc/secrets/stolon-db-admin/pg_su_password", "mode": "01000000777"}
2020-11-11T16:26:21.704Z INFO   cmd/keeper.go:2039      exclusive lock on data dir taken
2020-11-11T16:26:21.708Z INFO   cmd/keeper.go:525       keeper uid      {"uid": "keeper0"}
2020-11-11T16:26:21.726Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:26:21.728Z INFO   cmd/keeper.go:1047      our db boot UID is different than the cluster data one, waiting for it to be updated    {"bootUUID": "fc4df30a-c641-4de8-bbb1-c51765eb2a62", "clusterBootUUID": "bd4d37fe-6ef2-4c8d-9803-d5415f5aadb1"}
2020-11-11T16:26:24.227Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:26:26.728Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:26:26.762Z INFO   cmd/keeper.go:1457      our db requested role is master
2020-11-11T16:26:26.784Z INFO   postgresql/postgresql.go:319    starting database
2020-11-11 16:26:26.795 UTC [48] FATAL:  data directory "/stolon-data/postgres" has invalid permissions
2020-11-11 16:26:26.795 UTC [48] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
2020-11-11T16:26:26.984Z ERROR  cmd/keeper.go:1476      failed to start postgres        {"error": "postgres exited unexpectedly"}
2020-11-11T16:26:29.228Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:26:31.729Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:26:32.012Z INFO   cmd/keeper.go:1457      our db requested role is master
2020-11-11T16:26:32.030Z INFO   postgresql/postgresql.go:319    starting database

After this, the logs are full of the permission error:

2020-11-11T16:34:26.593Z INFO   postgresql/postgresql.go:319    starting database
2020-11-11 16:34:26.605 UTC [47] FATAL:  data directory "/stolon-data/postgres" has invalid permissions
2020-11-11 16:34:26.605 UTC [47] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
2020-11-11T16:34:26.794Z ERROR  cmd/keeper.go:1476      failed to start postgres        {"error": "postgres exited unexpectedly"}
2020-11-11T16:34:29.041Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:34:31.542Z ERROR  cmd/keeper.go:673       cannot get configured pg parameters     {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
2020-11-11T16:34:31.823Z INFO   cmd/keeper.go:1457      our db requested role is master
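The "mode" values in the two password-file warnings at the top of the first log excerpt are easier to read once decoded. Assuming the keeper prints a Go os.FileMode in octal (an assumption based on the value's shape, not confirmed in this thread), 01000000777 is the symlink flag (os.ModeSymlink, bit 27) plus 0777 permission bits. Kubernetes secret volumes expose each key as a symlink, and a symlink's own mode is 0777 on Linux, so these warnings most likely describe the link rather than the secret, and the keeper explicitly continues past them. A minimal sketch that decodes the value (describeMode is a hypothetical helper):

```go
package main

import (
	"fmt"
	"os"
)

// describeMode interprets a raw mode value using Go's os.FileMode bit layout.
// It returns the ls-like string form, whether the symlink flag is set, and
// the nine permission bits.
func describeMode(raw uint32) (string, bool, os.FileMode) {
	mode := os.FileMode(raw)
	return mode.String(), mode&os.ModeSymlink != 0, mode.Perm()
}

func main() {
	// Value copied from the "mode" field of the password-file warnings.
	s, isLink, perm := describeMode(0o1000000777)
	fmt.Printf("%s symlink=%v perm=%#o\n", s, isLink, uint32(perm))
}
```

Go renders the symlink flag as a leading "L", so the program prints "Lrwxrwxrwx symlink=true perm=0777".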

These are the (correct) permissions before the restart:

drwx--S---. 19 stolon stolon 4096 Nov 11 15:35 postgres

And these are the permissions afterwards:

drwxrws---. 19 stolon 1337  4096 Nov 11 16:36 postgres
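The two listings differ in the group bits and in the owning group. The sketch below reproduces the check behind the FATAL/DETAIL lines, assuming (as in recent PostgreSQL releases) that the server rejects any data-directory mode with group write or with any permission for others, i.e. octal mask 0027; the setgid bit is not part of that mask. The helpers permBits and validDataDirMode are hypothetical, and the ls parser is deliberately simplified.

```go
package main

import (
	"fmt"
	"strings"
)

// pgGroupMask: group write plus all "other" bits (assumed PostgreSQL rule).
const pgGroupMask = 0o027

// permBits converts an ls-style mode string such as "drwx--S---." into its
// nine permission bits. It ignores the file-type character, a trailing "."
// (SELinux context marker) and the setuid/setgid/sticky flags themselves.
func permBits(lsMode string) (uint32, error) {
	s := strings.TrimSuffix(lsMode, ".")
	if len(s) != 10 {
		return 0, fmt.Errorf("unexpected mode string %q", lsMode)
	}
	var bits uint32
	for i, c := range s[1:] {
		bit := uint32(1) << uint(8-i) // i=0 is user read (0400)
		switch c {
		case 'r', 'w', 'x', 's', 't':
			// 's'/'t' mean execute is set together with a special bit.
			bits |= bit
		case '-', 'S', 'T':
			// 'S'/'T' mean the special bit is set but execute is not.
		default:
			return 0, fmt.Errorf("unexpected character %q in %q", c, lsMode)
		}
	}
	return bits, nil
}

// validDataDirMode reports whether the mode passes the assumed check.
func validDataDirMode(lsMode string) (bool, error) {
	bits, err := permBits(lsMode)
	if err != nil {
		return false, err
	}
	return bits&pgGroupMask == 0, nil
}

func main() {
	// The two listings from this issue.
	for _, m := range []string{"drwx--S---.", "drwxrws---."} {
		ok, err := validDataDirMode(m)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s valid=%v\n", m, ok)
	}
}
```

Under that assumption, the first listing (0700 plus setgid) passes and the second (0770 plus setgid) fails because group write is now set. The group also changed to 1337, which is the UID/GID Istio's sidecar proxy typically runs as; whether that is connected is not established in this thread.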

What you expected to happen: When the keeper pod restarts, the database should start properly.

How to reproduce it (as minimally and precisely as possible): As described above: deploy the chart, let the keeper initialize, then restart the keeper pod.

Anything else we need to know?:

Environment:

sgotti commented 3 years ago

@samene I think you should investigate either the storage system you're using (Longhorn), checking whether every keeper gets a persistent volume (the same volume after a pod restart), or the stolon chart, since stolon does not change any filesystem permissions by itself.