zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

[OpenShift] fails to form cluster, seeing: PermissionError: [Errno 1] Operation not permitted: '/run/postgres.yml' #1327

Open davidkarlsen opened 3 years ago

davidkarlsen commented 3 years ago

When creating a cluster, I see:

anchore-cluster-1 postgres 2021-01-22 22:40:34,976 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
anchore-cluster-1 postgres 2021-01-22 22:40:35,980 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
anchore-cluster-1 postgres 2021-01-22 22:40:35,982 - bootstrapping - INFO - No meta-data available for this provider
anchore-cluster-1 postgres 2021-01-22 22:40:35,982 - bootstrapping - INFO - Looks like your running local
anchore-cluster-1 postgres 2021-01-22 22:40:36,017 - bootstrapping - INFO - Configuring pgqd
anchore-cluster-1 postgres 2021-01-22 22:40:36,017 - bootstrapping - INFO - Configuring crontab
anchore-cluster-1 postgres 2021-01-22 22:40:36,018 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
anchore-cluster-1 postgres 2021-01-22 22:40:36,018 - bootstrapping - INFO - Configuring log
anchore-cluster-1 postgres 2021-01-22 22:40:36,018 - bootstrapping - INFO - Configuring patroni
anchore-cluster-1 postgres 2021-01-22 22:40:36,026 - bootstrapping - INFO - Writing to file /run/postgres.yml
anchore-cluster-1 postgres Traceback (most recent call last):
anchore-cluster-1 postgres   File "/scripts/configure_spilo.py", line 1012, in <module>
anchore-cluster-1 postgres     main()
anchore-cluster-1 postgres   File "/scripts/configure_spilo.py", line 943, in main
anchore-cluster-1 postgres     adjust_owner(placeholders, PATRONI_CONFIG_FILE, gid=-1)
anchore-cluster-1 postgres   File "/scripts/configure_spilo.py", line 66, in adjust_owner
anchore-cluster-1 postgres     os.chown(resource, uid, gid)
anchore-cluster-1 postgres PermissionError: [Errno 1] Operation not permitted: '/run/postgres.yml'

in the logs on startup, and the cluster fails to form.
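
The traceback ends in `os.chown` raising `PermissionError` with errno 1 (EPERM), which is what the kernel returns when a process without root privileges tries to give a file to another user. Presumably the container runs under a non-root UID here. A minimal sketch of a wrapper that reports the refusal instead of crashing; `try_adjust_owner` is a hypothetical helper for illustration, not code from Spilo's `configure_spilo.py`:

```python
import os
import tempfile


def try_adjust_owner(path, uid=-1, gid=-1):
    """Attempt os.chown and report a refusal instead of raising.

    Hypothetical helper, not part of Spilo.
    Returns True if the call succeeded, False if it was refused.
    """
    try:
        # Passing -1 for uid or gid leaves that id unchanged.
        os.chown(path, uid, gid)
        return True
    except PermissionError:
        # The process lacks the privilege to change the owner.
        return False


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        # Keeping our own uid on a file we own is permitted.
        print(try_adjust_owner(f.name, os.getuid(), -1))
```

Whether skipping the ownership change is safe depends on what later reads the file, so this only illustrates the failure mode rather than a fix.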

The CR is:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  namespace: anchore
  name: anchore-cluster
spec:
  teamId: "anchore"
  volume:
    storageClass: openebs-local
    size: 10Gi
  numberOfInstances: 2
  users:
    # database owner
    anchore:
    - superuser
    - createdb

    # role for application foo
    anchore_user: []

  #databases: name->owner
  databases:
    anchore: anchore
  postgresql:
    version: "13"

If I grant the anyuid SCC (https://docs.openshift.com/container-platform/4.6/authentication/managing-security-context-constraints.html) through a rolebinding:

k create rolebinding privileged-postgres-pod --clusterrole=system:openshift:scc:anyuid  --serviceaccount=anchore:postgres-pod -n anchore

it gets further, but still fails:

HTTP response headers: HTTPHeaderDict({'Audit-Id': '278593bb-be52-4504-b372-df34e34afff7', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '31572b30-d659-4d33-a03a-d134fc14333b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7950f657-f3e9-4ac1-a2d4-6dd17200d796', 'Date': 'Fri, 22 Jan 2021 23:19:09 GMT', 'Content-Length': '259'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"anchore-cluster\\" is forbidden: endpoint address 10.200.11.132 is not allowed","reason":"Forbidden","details":{"name":"anchore-cluster","kind":"endpoints"},"code":403}\n'

2021-01-22 23:19:12,724 ERROR: failed to update leader lock
2021-01-22 23:19:12,724 INFO: not promoting because failed to update leader lock in DCS
2021-01-22 23:19:22,678 INFO: Lock owner: anchore-cluster-0; I am anchore-cluster-0
2021-01-22 23:19:22,724 ERROR: Permission denied
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 897, in _update_leader_with_retry
    return self._patch_or_create(self.leader_path, annotations, resource_version, ips=ips, retry=_retry)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 854, in _patch_or_create
    ret = retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 894, in _retry
    return retry(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/utils.py", line 333, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 466, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 402, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 371, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 201, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'f8ec045d-c2cd-4455-b140-f772849df4c6', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '31572b30-d659-4d33-a03a-d134fc14333b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '7950f657-f3e9-4ac1-a2d4-6dd17200d796', 'Date': 'Fri, 22 Jan 2021 23:19:19 GMT', 'Content-Length': '259'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"anchore-cluster\\" is forbidden: endpoint address 10.200.11.132 is not allowed","reason":"Forbidden","details":{"name":"anchore-cluster","kind":"endpoints"},"code":403}\n'

2021-01-22 23:19:22,724 ERROR: failed to update leader lock
2021-01-22 23:19:22,724 INFO: not promoting because failed to update leader lock in DCS
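
Both 403 responses carry the same body: a Kubernetes Status object whose message names the rejected endpoint address. A small sketch that pulls out the relevant fields, with the body copied from the log above; `summarize_status` is a hypothetical helper name, not part of Patroni:

```python
import json

# Response body copied verbatim from the log above.
BODY = b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"anchore-cluster\\" is forbidden: endpoint address 10.200.11.132 is not allowed","reason":"Forbidden","details":{"name":"anchore-cluster","kind":"endpoints"},"code":403}\n'


def summarize_status(body):
    """Condense a Kubernetes Status response into one line (hypothetical helper)."""
    status = json.loads(body)
    details = status.get("details", {})
    return "{} {} on {}/{}: {}".format(
        status.get("code"),
        status.get("reason"),
        details.get("kind"),
        details.get("name"),
        status.get("message"),
    )


if __name__ == "__main__":
    print(summarize_status(BODY))
```

The summary makes it easier to see that the API server rejected the write to the `anchore-cluster` endpoints object itself, not the pod's access to the API in general.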

zetaab commented 3 years ago

I see a similar failure on k8s 1.20 (pods are not allowed to run as root):

% kubectl logs kaas-postgres-0
2021-01-25 09:41:46,618 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2021-01-25 09:41:48,624 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2021-01-25 09:41:48,625 - bootstrapping - INFO - No meta-data available for this provider
2021-01-25 09:41:48,625 - bootstrapping - INFO - Looks like your running local
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring pgqd
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring bootstrap
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring log
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring standby-cluster
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring pgbouncer
2021-01-25 09:41:48,651 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2021-01-25 09:41:48,651 - bootstrapping - INFO - Configuring certificate
2021-01-25 09:41:48,652 - bootstrapping - INFO - Generating ssl certificate
Traceback (most recent call last):
  File "/scripts/configure_spilo.py", line 1012, in <module>
    main()
  File "/scripts/configure_spilo.py", line 980, in main
    write_certificates(placeholders, args['force'])
  File "/scripts/configure_spilo.py", line 113, in write_certificates
    adjust_owner(environment, environment['SSL_PRIVATE_KEY_FILE'], gid=-1)
  File "/scripts/configure_spilo.py", line 66, in adjust_owner
    os.chown(resource, uid, gid)
PermissionError: [Errno 1] Operation not permitted: '/run/certs/server.key'

zetaab commented 3 years ago

I tried the following settings in the operator configuration:

  # set user and group for the spilo container (required to run Spilo as non-root process)
  spilo_runasuser: 1001
  spilo_runasgroup: 1003
  # group ID with write-access to volumes (required to run Spilo as non-root process)
  spilo_fsgroup: 1003

but then the new cluster fails with:

% kubectl logs kaas-postgres-0 -f
mkdir: cannot create directory ‘/run/tmp’: Permission denied
mkdir: cannot create directory ‘/run/certs’: Permission denied
/launch.sh: 23: /launch.sh: cannot create /run/tmp/passwd: Directory nonexistent
/launch.sh: 24: /launch.sh: cannot create /etc/passwd: Permission denied
rm: cannot remove '/run/tmp/passwd': No such file or directory
chown: changing ownership of '/home/postgres/pgdata/pgroot/data': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-6.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-5.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-4.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-7.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-3.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-0.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-1.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log/postgresql-2.csv': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot/pg_log': Operation not permitted
chown: changing ownership of '/home/postgres/pgdata/pgroot': Operation not permitted
chown: cannot access '/run/certs': No such file or directory
chmod: cannot access '/run/tmp': No such file or directory
2021-01-25 10:30:44,644 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2021-01-25 10:30:46,649 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2021-01-25 10:30:46,650 - bootstrapping - INFO - No meta-data available for this provider
2021-01-25 10:30:46,650 - bootstrapping - INFO - Looks like your running local
2021-01-25 10:30:46,675 - bootstrapping - INFO - Configuring pgqd
Traceback (most recent call last):
  File "/scripts/configure_spilo.py", line 1012, in <module>
    main()
  File "/scripts/configure_spilo.py", line 972, in main
    link_runit_service(placeholders, 'pgqd')
  File "/scripts/configure_spilo.py", line 72, in link_runit_service
    os.makedirs(service_dir)
  File "/usr/lib/python3.6/os.py", line 210, in makedirs
    makedirs(head, mode, exist_ok)
  File "/usr/lib/python3.6/os.py", line 220, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/run/service'

zetaab commented 3 years ago

Got it working with the following postgresql manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: kaas-postgres
spec:
  dockerImage: registry.opensource.zalan.do/acid/spilo-12:1.6-p3
  teamId: "kaas"
  spiloRunAsUser: 101
  spiloRunAsGroup: 103
  spiloFSGroup: 103
  numberOfInstances: 3
  enableMasterLoadBalancer: false
  enableLogicalBackup: true
  logicalBackupSchedule: "00 05 * * *"
  enableReplicaLoadBalancer: false
  patroni:
    pg_hba:
    - hostssl all all 0.0.0.0/0 md5
    - host    all all 0.0.0.0/0 md5
  postgresql:
    version: "12"
    parameters:
      shared_buffers: "32MB"
      max_connections: "100"
      log_statement: "all"
  volume:
    size: 8Gi
  resources:
    limits:
      cpu: 800m
      memory: 800Mi
    requests:
      cpu: 400m
      memory: 400Mi
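
The working manifest sets three non-root fields together: `spiloRunAsUser`, `spiloRunAsGroup` and `spiloFSGroup`. A minimal sketch for checking that a manifest's `spec` carries all three, assuming the spec has already been loaded into a Python dict; both function names are hypothetical and not part of the operator:

```python
REQUIRED_NONROOT_KEYS = ("spiloRunAsUser", "spiloRunAsGroup", "spiloFSGroup")


def nonroot_spec_fields(uid, gid):
    """Return the three non-root keys as used in the manifest above.

    Hypothetical helper; the fsGroup is set to the same gid, as in the thread.
    """
    return {"spiloRunAsUser": uid, "spiloRunAsGroup": gid, "spiloFSGroup": gid}


def missing_nonroot_fields(spec):
    """List which of the non-root keys are absent from a spec dict."""
    return [key for key in REQUIRED_NONROOT_KEYS if key not in spec]


if __name__ == "__main__":
    spec = {"teamId": "kaas", "numberOfInstances": 3}
    print(missing_nonroot_fields(spec))
    # Values 101 and 103 are taken from the working manifest above.
    spec.update(nonroot_spec_fields(101, 103))
    print(missing_nonroot_fields(spec))
```

The check only verifies that the keys are present; which UID and GID are valid depends on the Spilo image in use.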