zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.22k stars, 968 forks

Patroni fails to start pods that have Istio Injection Enabled #1642

Open upodroid opened 2 years ago

upodroid commented 2 years ago

$ k logs acid-prod-0
2021-10-11 22:26:09,834 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2021-10-11 22:26:11,846 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2021-10-11 22:26:11,848 - bootstrapping - INFO - No meta-data available for this provider
2021-10-11 22:26:11,849 - bootstrapping - INFO - Looks like your running local
2021-10-11 22:26:11,921 - bootstrapping - INFO - Configuring pam-oauth2
2021-10-11 22:26:11,922 - bootstrapping - INFO - Writing to file /etc/pam.d/postgresql
2021-10-11 22:26:11,923 - bootstrapping - INFO - Configuring certificate
2021-10-11 22:26:11,923 - bootstrapping - INFO - Generating ssl certificate
2021-10-11 22:26:12,628 - bootstrapping - INFO - Configuring pgbouncer
2021-10-11 22:26:12,629 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2021-10-11 22:26:12,629 - bootstrapping - INFO - Configuring bootstrap
2021-10-11 22:26:12,629 - bootstrapping - INFO - Configuring standby-cluster
2021-10-11 22:26:12,629 - bootstrapping - INFO - Configuring log
2021-10-11 22:26:12,629 - bootstrapping - INFO - Configuring patroni
2021-10-11 22:26:12,644 - bootstrapping - INFO - Writing to file /run/postgres.yml
2021-10-11 22:26:12,646 - bootstrapping - INFO - Configuring wal-e
2021-10-11 22:26:12,646 - bootstrapping - INFO - Configuring crontab
2021-10-11 22:26:12,646 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2021-10-11 22:26:12,647 - bootstrapping - INFO - Configuring pgqd
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 171, in main
    return patroni_main()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 139, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 98, in abstract_main
    controller = cls(config)
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 29, in __init__
    self.dcs = get_dcs(self.config)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/__init__.py", line 97, in get_dcs
    return item(config[name])
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 625, in __init__
    k8s_config.load_incluster_config(ca_certs=self._ca_certs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 73, in load_incluster_config
    with open(SERVICE_TOKEN_FILENAME) as f:
PermissionError: [Errno 13] Permission denied: '/var/run/secrets/kubernetes.io/serviceaccount/token'
/run/service/patroni: finished with code=1 signal=0
/run/service/patroni: sleeping 30 seconds
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 171, in main
    return patroni_main()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 139, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 98, in abstract_main
    controller = cls(config)
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 29, in __init__
    self.dcs = get_dcs(self.config)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/__init__.py", line 97, in get_dcs
    return item(config[name])
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 625, in __init__
    k8s_config.load_incluster_config(ca_certs=self._ca_certs)
  File "/usr/local/lib/python3.6/dist-packages/patroni/dcs/kubernetes.py", line 73, in load_incluster_config
    with open(SERVICE_TOKEN_FILENAME) as f:
PermissionError: [Errno 13] Permission denied: '/var/run/secrets/kubernetes.io/serviceaccount/token'
/run/service/patroni: finished with code=1 signal=0
/run/service/patroni: sleeping 60 seconds

When I exec into the pod, the token file is present:

$ k exec -it acid-prod-0 -- bash
groups: cannot find name for group ID 1337

 ____        _ _
/ ___| _ __ (_) | ___
\___ \| '_ \| | |/ _ \
 ___) | |_) | | | (_) |
|____/| .__/|_|_|\___/
      |_|

This container is managed by runit, when stopping/starting services use sv

Examples:

sv stop cron
sv restart patroni

Current status: (sv status /etc/service/*)

finish: /etc/service/patroni: (pid 114) 116s
run: /etc/service/pgqd: (pid 28) 299s
root@acid-prod-0:/home/postgres# ls -alh /var/run/secrets/kubernetes.io/serviceaccount/token
lrwxrwxrwx 1 root 1337 12 Oct 11 22:25 /var/run/secrets/kubernetes.io/serviceaccount/token -> ..data/token

The operator pod also has the Istio sidecar injected, but it seems to be able to read the token without problems.

matrix-root commented 2 years ago

Did you find a way to solve that?

ggsood commented 2 years ago

Hi @LvLs9 and @FxKu, I faced the same issue. I was mounting a host path into the Postgres pod, for which I had set spiloRunAsUser: 0. The following setting is also required, and adding it fixed the issue for me: spiloFSGroup: 103

Reference link: https://github.com/zalando/postgres-operator/issues/988#issuecomment-637672031
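
For anyone trying this, here is a minimal sketch of where the two settings would go in a postgresql cluster manifest. The cluster name and teamId are assumptions inferred from the pod name acid-prod-0, and all other spec fields are left out. A likely explanation for why it helps (my reading, not confirmed in this thread): Istio injection sets the pod fsGroup to 1337, so the token ends up group-owned by 1337, as in the ls output above, and the postgres user is not in that group. Setting spiloFSGroup to the postgres group ID should make the token group-readable for Patroni again.

```yaml
# Sketch only: merge these fields into your existing cluster manifest.
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-prod        # assumed from the pod name acid-prod-0
spec:
  teamId: "acid"         # assumed; use your own team ID
  # ... other required spec fields (volume, numberOfInstances, postgresql, ...) omitted ...
  spiloRunAsUser: 0      # only needed in my case because of the hostPath mount
  spiloFSGroup: 103      # the value that worked for me; mounted volumes and the token become readable by this group
```

Because these are ordinary manifest fields, they can be committed alongside the rest of the cluster definition instead of being patched in by hand.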

Sindvero commented 2 years ago

Hi @ggsood @Jan-M,

Do you know if there is a plan to fix this issue?

The workaround described above can't really be used in a production or automated environment (one managed with GitOps, for example).

Thanks