zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Providing Patroni with a custom CA bundle #1877

Open khmarochos opened 2 years ago

khmarochos commented 2 years ago

Hello,

Is it possible to provide Patroni with a custom CA bundle for interacting with the Kubernetes API Server?

By default it uses /var/run/secrets/kubernetes.io/serviceaccount/ca.crt, which is set by Kubernetes (it contains the data from the Secret referred to by the main ServiceAccount). My problem is that my Kubernetes certificate is signed by an intermediate certificate, which is in turn signed by a self-signed root certificate. Alas, Patroni (or, to be exact, the SSL library used by Patroni) can't establish a TLS session with the Kubernetes API Server, because /var/run/secrets/kubernetes.io/serviceaccount/ca.crt contains only a "shortened" version of my CA chain (only the intermediate CA certificate is included; the root one is omitted). That seems to be enough for Kubernetes itself, but Patroni (or rather the SSL library) needs to check the full chain.

Is there a way to make Patroni use the full version of my CA chain when it interacts with the Kubernetes API Server? I know that it's possible to set kubernetes.cacert in Patroni's configuration file (according to this manual: https://github.com/zalando/patroni/blob/master/docs/SETTINGS.rst#kubernetes) to make it use any file instead of the ca.crt provided by Kubernetes, but I can't find a way to accomplish that through Postgres Operator.

Would anyone be so kind as to help me find a way to do that?
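A quick way to confirm that a bundle is "shortened" is to count the certificates it holds. The following is a minimal sketch, not part of Patroni or the operator; the helper names `split_pem_bundle` and `count_certificates` are made up for illustration, and it only looks at PEM markers without parsing or validating the certificates themselves.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_bundle(pem_text):
    """Return the individual PEM certificate blocks found in pem_text."""
    return PEM_CERT_RE.findall(pem_text)


def count_certificates(path):
    """Count PEM certificate blocks in the file at path (hypothetical helper)."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return len(split_pem_bundle(handle.read()))
```

Inside the pod, `count_certificates("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")` returning 1 for a two-level CA hierarchy would be consistent with only the intermediate certificate being present.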

Thanks in advance!

dmvolod commented 2 years ago

Yes, it seems possible. You should define and set a PATRONI_KUBERNETES_CACERT env variable as described in the Patroni documentation (https://patroni.readthedocs.io/en/latest/ENVIRONMENT.html). Please note that customization of the cluster environment variables is possible starting with operator v1.8.0.

khmarochos commented 2 years ago

Thank you for the hint!

It might be that I'm doing something wrong, but I have to admit that the environment variables don't change anything.

Here's a snippet from the file containing the configuration values for Helm:

spec:
  env:
    - name: PATRONI_KUBERNETES_CACERT
      value: /tls/ca.crt
    - name: PATRONI_LOG_LEVEL
      value: DEBUG
    - name: PATRONI_LOG_TRACEBACK_LEVEL
      value: DEBUG

As far as I can see, they are applied; here's what I see in the description of the first pod:

Containers:
  postgres:
    Environment:
      PATRONI_KUBERNETES_CACERT:    /tls/ca.crt
      PATRONI_LOG_LEVEL:            DEBUG
      PATRONI_LOG_TRACEBACK_LEVEL:  DEBUG

But at the same time, I still get the same error messages in the output of Patroni:

2022-05-12 00:48:19,237 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),)': /api/v1/namespaces/default/endpoints/kubernetes
2022-05-12 00:48:19,248 ERROR: Failed to get "kubernetes" endpoint from https://10.233.0.1:443: MaxRetryError("HTTPSConnectionPool(host='10.233.0.1', port=443): Max retries exceeded with url: /api/v1/namespaces/default/endpoints/kubernetes (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))",)

At the same time, /tls/ca.crt contains the full CA certificate chain. I'm totally sure that the environment variable is ignored, because everything works fine when I edit /home/postgres/postgres.yml (adding the kubernetes.cacert parameter containing /tls/ca.crt) and restart Patroni.

By the way, PATRONI_LOG_LEVEL seems to be ignored too, because I don't see any messages below the INFO level in Patroni's output.

It's also pretty odd that the other environment variables of this pod don't contain the PATRONI_ prefix in their names. Here's the full list of the environment variables that I see in the pod's description:
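For readers who want to see the shape of that manual edit, here is a minimal sketch that applies it to an already-parsed configuration. The function name `with_kubernetes_cacert` and the sample dictionary are hypothetical stand-ins; the real /home/postgres/postgres.yml is generated by Spilo and contains many more keys, and reading or writing the YAML file itself is left out.

```python
import copy


def with_kubernetes_cacert(config, cacert_path):
    """Return a copy of a Patroni-style config dict with kubernetes.cacert set."""
    patched = copy.deepcopy(config)
    # Create the "kubernetes" section if it is missing, then set the CA path.
    patched.setdefault("kubernetes", {})["cacert"] = cacert_path
    return patched


# Hypothetical stand-in for the parsed contents of /home/postgres/postgres.yml.
example_config = {
    "scope": "acid-kloudster-backend",
    "kubernetes": {"use_endpoints": True},
}

patched_config = with_kubernetes_cacert(example_config, "/tls/ca.crt")
# In YAML terms the result corresponds to:
#   kubernetes:
#     use_endpoints: true
#     cacert: /tls/ca.crt
```

Such an edit is presumably lost whenever the container regenerates its configuration, which is why a setting that the operator or Spilo propagates would be preferable.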

Containers:
  postgres:
    Environment:
      SCOPE:                        acid-kloudster-backend
      PGROOT:                       /home/postgres/pgdata/pgroot
      POD_IP:                        (v1:status.podIP)
      POD_NAMESPACE:                postgresql (v1:metadata.namespace)
      PGUSER_SUPERUSER:             postgres
      KUBERNETES_SCOPE_LABEL:       cluster-name
      KUBERNETES_ROLE_LABEL:        spilo-role
      PGPASSWORD_SUPERUSER:         <set to the key 'password' in secret 'postgres.acid-kloudster-backend.credentials.postgresql.acid.zalan.do'>  Optional: false
      PGUSER_STANDBY:               standby
      PGPASSWORD_STANDBY:           <set to the key 'password' in secret 'standby.acid-kloudster-backend.credentials.postgresql.acid.zalan.do'>  Optional: false
      PAM_OAUTH2:                   https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
      HUMAN_ROLE:                   zalandos
      PGVERSION:                    14
      KUBERNETES_LABELS:            {"application":"spilo"}
      SPILO_CONFIGURATION:          {"postgresql":{},"bootstrap":{"initdb":[{"auth-host":"md5"},{"auth-local":"trust"}],"users":{"zalandos":{"password":"","options":["CREATEDB","NOLOGIN"]}},"dcs":{}}}
      DCS_ENABLE_KUBERNETES_API:    true
      PATRONI_KUBERNETES_CACERT:    /tls/ca.crt
      PATRONI_LOG_LEVEL:            DEBUG
      PATRONI_LOG_TRACEBACK_LEVEL:  DEBUG
      SSL_CERTIFICATE_FILE:         /tls/tls.crt
      SSL_PRIVATE_KEY_FILE:         /tls/tls.key
      SSL_CA_FILE:                  /tls/ca.crt

Why don't these variables' names contain the PATRONI_ prefix?

Thanks.

dmvolod commented 2 years ago

Sorry, my bad: the Zalando operator doesn't run Patroni directly and uses the Spilo image for it. Please have a look at the following documentation to choose the correct env variable for the certificate: https://github.com/zalando/spilo/blob/master/ENVIRONMENT.rst

khmarochos commented 2 years ago

Alas, SSL_CA_FILE is set, but it doesn't affect interactions with the Kubernetes API Server.

I will open a similar issue for zalando/spilo, but if anyone has any other ideas, please share them.

Thanks!

dmvolod commented 2 years ago

To propagate this variable, you should set the tls.caFile and tls.caSecretName options on the Postgresql CR:


# Custom TLS certificate. Disabled unless tls.secretName has a value.
  tls:
    secretName: ""  # should correspond to a Kubernetes Secret resource to load
    certificateFile: "tls.crt"
    privateKeyFile: "tls.key"
    caFile: ""  # optionally configure Postgres with a CA certificate
    caSecretName: "" # optionally the ca.crt can come from this secret instead

khmarochos commented 2 years ago

Of course, I've set them, and I see that the environment variables are properly set, but the point is that they don't affect interactions with the Kubernetes API Server at all. They seem to affect interactions between Patroni instances only. :-(

dmvolod commented 2 years ago

Oh, yes, true. For this case, some fixes would need to be implemented in Spilo to propagate these options to Patroni.

howels commented 1 year ago

Is this resolved via https://github.com/zalando/spilo/releases/tag/2.1-p6 or other recent changes?

khmarochos commented 1 year ago

Is this resolved via https://github.com/zalando/spilo/releases/tag/2.1-p6 or other recent changes?

Haven't tried that yet, because I decided that using a "long" CA bundle for a Kubernetes cluster wasn't a good idea at all. There are way too many issues with other components.

D1StrX commented 1 year ago

@khmarochos Just trying to understand the scope of "many issues with other components": do you mean things other than just Zalando Postgresql as well? I'm having the exact same issue here with a custom k8s CA and the kube-apiserver.

Also, I find the behavior a bit weird. If you take a look at this documentation of the Patroni Kubernetes env params:

PATRONI_KUBERNETES_CACERT: (optional) Specifies the file with the CA_BUNDLE file with certificates of trusted CAs to use while verifying Kubernetes API SSL certs. If not provided, patroni will use the value provided by the ServiceAccount secret.

When checking /var/run/secrets/kubernetes.io/serviceaccount/ca.crt inside the database container, it contains the correct CA, but not the entire chain (which kinda makes sense?). So perhaps when the CA is mountable in /etc/ssl/certs/ it is fixed (if the app looks in that dir)? I tested this: a curl command from within the containers shows the chain is OK. But the issue remains, probably because ~/serviceaccount/ca.crt doesn't contain the chain ...
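One possible explanation for why an intermediate-only ca.crt is rejected: OpenSSL by default tries to build a chain that ends in a self-signed root from the trust store, and accepts a non-root certificate as the trust anchor only when its partial-chain flag is enabled. The sketch below shows how that flag is exposed in Python's standard `ssl` module. It illustrates the mechanism only; `build_context` is a hypothetical helper, nothing in this thread indicates that Patroni or Spilo offers such a switch, and `ssl.VERIFY_X509_PARTIAL_CHAIN` requires Python 3.10 or newer, which may be newer than the interpreter in the image.

```python
import ssl


def build_context(cafile=None, allow_partial_chain=False):
    """Build a client-side TLS context, optionally trusting a non-root anchor."""
    # PROTOCOL_TLS_CLIENT enables certificate and hostname verification.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if cafile is not None:
        # Load the CA bundle, e.g. one holding only an intermediate certificate.
        ctx.load_verify_locations(cafile=cafile)
    if allow_partial_chain:
        partial_flag = getattr(ssl, "VERIFY_X509_PARTIAL_CHAIN", None)
        if partial_flag is None:
            raise RuntimeError("partial-chain flag needs Python 3.10 or newer")
        # Accept a chain that stops at any certificate in the trust store.
        ctx.verify_flags |= partial_flag
    return ctx
```

A context built with `build_context("/var/run/secrets/kubernetes.io/serviceaccount/ca.crt", allow_partial_chain=True)` could then be passed to an HTTPS client to test whether the partial chain is the actual cause, assuming the intermediate in that file is the one that signed the API server certificate.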

khmarochos commented 1 year ago

@D1StrX, right, a chained CA causes lots of pain. I had similar problems with the Elastic stack. I also have a feeling that there was something else, though I can't remember what exactly. So, I decided to abandon the idea of using chained CAs for Kubernetes clusters. I've found a couple of links to the questions I raised before:

There isn't much feedback, so it might be that there are not too many perverts who do that. :-)