zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Running as non root on VMware Tanzu #1843

Open omlet05 opened 2 years ago

omlet05 commented 2 years ago

Hey,

We tried to deploy the operator with the following settings in the manifest to allow the pods to run as non-root:

    kubernetes_use_configmaps: "false"
    spilo_allow_privilege_escalation: "false"
    spilo_runasuser: 101
    spilo_runasgroup: 103
    spilo_fsgroup: 103
    spilo_privileged: "false"
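For reference, a minimal sketch of how these keys might sit in the operator's ConfigMap-based configuration. The ConfigMap name `postgres-operator` is an assumption (the thread does not state it). Since ConfigMap `data` values have to be strings, the numeric IDs are quoted in this sketch.

```yaml
# Sketch only: the ConfigMap name is assumed, not taken from this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator   # assumed name
data:
  kubernetes_use_configmaps: "false"
  spilo_allow_privilege_escalation: "false"
  spilo_privileged: "false"
  # ConfigMap values must be strings, so the IDs are quoted
  spilo_runasuser: "101"
  spilo_runasgroup: "103"
  spilo_fsgroup: "103"
```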

The pods now start, but we get errors like this: `runsv pgqd: fatal: unable to start ./run: access denied`, and the cluster does not work as expected.

As a workaround we deployed custom images with this:

    RUN chown -R postgres:postgres /bin
    # /etc/hosts and /etc/resolv.conf cannot be chowned
    RUN chown -R postgres:postgres /etc
    RUN chown -R postgres:postgres /lib
    RUN chown -R postgres:postgres /run
    RUN chown -R postgres:postgres /sbin
    RUN chown -R postgres:postgres /usr
    RUN chown -R postgres:postgres /var
    RUN chown postgres:postgres /launch.sh
    RUN sed -i '10 a rm /etc/supervisor/conf.d/cron.conf' /launch.sh
    USER 101

But this is clearly not the way to do it...

The images used are the latest released ones.

Do you have any idea on what can be the solution with official images?

CyberDem0n commented 2 years ago

> Do you have any idea on what can be the solution with official images?

Hmm, but you didn't even say which image you use... Anyway, please try registry.opensource.zalan.do/acid/spilo-cdp-14:2.1-p217

omlet05 commented 2 years ago

Dear @CyberDem0n,

Sorry about that, we're using registry.opensource.zalan.do/acid/spilo-cdp-14:2.1-p4

I'll try 2.1-p127 on Monday and let you know.

Thank you! Regards,

omlet05 commented 2 years ago

Dear @CyberDem0n,

Still not working...

/launch.sh: 24: /launch.sh: cannot create /etc/passwd: Permission denied
2022-04-11 07:57:47,851 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2022-04-11 07:57:49,858 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2022-04-11 07:57:49,859 - bootstrapping - INFO - No meta-data available for this provider
2022-04-11 07:57:49,862 - bootstrapping - INFO - Looks like your running local
2022-04-11 07:57:49,899 - bootstrapping - INFO - Configuring crontab
2022-04-11 07:57:49,899 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2022-04-11 07:57:49,900 - bootstrapping - INFO - Configuring certificate
2022-04-11 07:57:49,900 - bootstrapping - INFO - Generating ssl self-signed certificate
2022-04-11 07:57:49,975 - bootstrapping - INFO - Configuring pgqd
2022-04-11 07:57:49,976 - bootstrapping - INFO - Configuring pgbouncer
2022-04-11 07:57:49,976 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2022-04-11 07:57:49,976 - bootstrapping - INFO - Configuring wal-e
2022-04-11 07:57:49,976 - bootstrapping - INFO - Configuring pam-oauth2
2022-04-11 07:57:49,977 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2022-04-11 07:57:49,977 - bootstrapping - INFO - Configuring bootstrap
2022-04-11 07:57:49,977 - bootstrapping - INFO - Configuring patroni
2022-04-11 07:57:49,984 - bootstrapping - INFO - Writing to file /run/postgres.yml
2022-04-11 07:57:49,984 - bootstrapping - INFO - Configuring log
2022-04-11 07:57:49,984 - bootstrapping - INFO - Configuring standby-cluster
2022-04-11 07:57:50,231 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2022-04-11 07:57:51,253 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:52,258 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:53,263 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:54,267 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:55,272 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:56,277 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:57,284 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:58,290 ERROR: ObjectCache.run ApiException()
2022-04-11 07:57:59,295 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:00,236 ERROR: get_cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 704, in _load_cluster
    self._wait_caches(stop_time)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 696, in _wait_caches
    raise RetryFailedError('Exceeded retry deadline')
patroni.utils.RetryFailedError: 'Exceeded retry deadline'
2022-04-11 07:58:00,237 WARNING: Can not get cluster from dcs
2022-04-11 07:58:00,302 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:01,307 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:02,313 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:03,318 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:04,323 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:05,327 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:06,332 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:07,337 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:08,345 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:09,350 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:10,356 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:11,362 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:12,368 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:13,373 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:14,378 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:15,243 ERROR: get_cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 704, in _load_cluster
    self._wait_caches(stop_time)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 696, in _wait_caches
    raise RetryFailedError('Exceeded retry deadline')
patroni.utils.RetryFailedError: 'Exceeded retry deadline'
2022-04-11 07:58:15,244 WARNING: Can not get cluster from dcs
2022-04-11 07:58:15,385 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:16,391 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:17,397 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:18,405 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:19,411 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:20,417 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:21,421 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:22,425 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:23,431 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:24,435 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:25,446 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:26,455 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:27,463 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:28,469 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:29,474 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:30,247 ERROR: get_cluster
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 704, in _load_cluster
    self._wait_caches(stop_time)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 696, in _wait_caches
    raise RetryFailedError('Exceeded retry deadline')
patroni.utils.RetryFailedError: 'Exceeded retry deadline'
2022-04-11 07:58:30,248 WARNING: Can not get cluster from dcs
2022-04-11 07:58:30,479 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:31,487 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:32,492 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:33,497 ERROR: ObjectCache.run ApiException()
2022-04-11 07:58:34,502 ERROR: ObjectCache.run ApiException()

Regards,

CyberDem0n commented 2 years ago

Patroni can't talk to the K8s API. Probably something is wrong with the ServiceAccount or the Role/ClusterRole. This is not really related to file permissions in Spilo.
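As an illustration of the kind of permissions involved, here is a hedged sketch of a namespaced Role and RoleBinding for the pods' ServiceAccount. It assumes Patroni uses Endpoints as its DCS (consistent with `kubernetes_use_configmaps: "false"` above). All names and the namespace are placeholders, and the exact verb list may differ from what the operator ships.

```yaml
# Sketch only: every name below is a placeholder, not taken from this thread.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: postgres-pod            # hypothetical Role name
  namespace: my-namespace       # hypothetical: namespace where the Spilo pods run
rules:
  # Patroni keeps leader and config state in Endpoints when ConfigMaps are not used
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["create", "delete", "deletecollection", "get", "list", "patch", "update", "watch"]
  # Patroni reads and labels the cluster member pods
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "patch", "update", "watch"]
  # Patroni may create the service that goes with the leader endpoint
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: postgres-pod            # hypothetical binding name
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: postgres-pod
subjects:
  - kind: ServiceAccount
    name: postgres-pod          # hypothetical: ServiceAccount used by the Spilo pods
    namespace: my-namespace
```

The `Disabling 'bypass_api_service'` warning in the log concerns GET access to the `kubernetes` endpoint in the `default` namespace; a Role in another namespace, like this one, would not grant that.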

omlet05 commented 2 years ago

Dear @CyberDem0n,

My bad, we're using: spilo-14:2.1-p4 and not the cdp version...

imranrazakhan commented 1 year ago

@omlet05

The cause of this error is that the default environment variable value for PATRONI_KUBERNETES_NAMESPACE is set to "default," while the TSDB is usually deployed in a different namespace.

I changed it from

    - name: PATRONI_KUBERNETES_NAMESPACE
      value: default

to

    - name: PATRONI_KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace