timescale / helm-charts

Configuration and Documentation to run TimescaleDB in your Kubernetes cluster
Apache License 2.0

Add CA to allow custom Certificates #641

Closed: MarkCupitt closed this issue 5 months ago

MarkCupitt commented 6 months ago

We are migrating from a Zalando-based Kubernetes Postgres installation to TimescaleDB.

We use Teleport to manage all of our cluster, Kubernetes, and database access. Teleport requires a custom, Teleport-generated SSL/TLS certificate, which we have set up, but we are getting SSL connection failures because clients cannot verify the certificate. That means we need to add the root CA to PostgreSQL.

Zalando had a neat way to do this in its CRDs: you simply added ca.crt as an additional field on the certificate secret, and it was picked up automatically.

I have looked into this in depth and see that Patroni has this option:

  ssl: on
  ssl_cert_file: /home/postgres/cert.pem
  ssl_key_file: /home/postgres/key.pem

but there is no way to specify the CA mount in this config.

There is also this issue, which notes that Patroni uses the service account token and ca.crt from /var/run/secrets/kubernetes.io/serviceaccount/:

https://github.com/zalando/patroni/issues/1758

AND

I see from here https://patroni.readthedocs.io/en/latest/ENVIRONMENT.html#kubernetes that:

PATRONI_KUBERNETES_CACERT: (optional) Specifies the file with the CA_BUNDLE file with certificates of trusted CAs to use while verifying Kubernetes API SSL certs. If not provided, patroni will use the value provided by the ServiceAccount secret.

SO

My thought is to use that last option as an environment variable, which is no problem. BUT I now find there is no way to mount that CA file into the pod, as the Helm chart does not provide a volume mount for it.

Does anyone have any suggestions? Unfortunately, we cannot currently use TimescaleDB: it is air-gapped behind Teleport, and a Teleport DB SSL connection is the only way we have to reach it, so I need to get that CA file from Teleport in there somehow. I realize I may be missing something obvious, but so far I cannot see it.

Thanks in advance

Mark

MarkCupitt commented 6 months ago

I was able to figure out how to get the CA into Patroni. HOWEVER, it seems to have broken the cluster, because everything talks via SSL. There are lots of SSL-related errors, so more investigation is required.

# Generate a Teleport certificate that it will be happy with
tctl auth sign --format=db --host=timescaledb.eng.svc.cluster.local --out=server --ttl=1000000h
# Load the cert, together with the CA, into a secret
kubectl create secret generic eng-db-teleport-tls --namespace eng \
  --from-file=tls.crt=server.crt \
  --from-file=tls.key=server.key \
  --from-file=ca.crt=server.cas

# The entire secret is already mounted, so it's just a matter of telling Patroni where to find the CA

# So add a custom env var that points Patroni to the root CA file in the secret

# Extra custom environment variables.
# These should be an EnvVar, as this allows you to inject secrets into the environment
# https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#envvar-v1-core
env:
  - name: PATRONI_KUBERNETES_CACERT
    value: "/etc/certificate/ca.crt"

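
Before pointing anything at /etc/certificate/ca.crt, it can help to confirm that the CA actually made it into the mounted secret. A minimal sketch, assuming the secret is mounted at /etc/certificate (as the comment above states) and that the container has a POSIX shell; the function name check_cert_dir is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: check that a mounted TLS secret directory contains
# non-empty tls.crt, tls.key and ca.crt files, and that ca.crt holds at
# least one PEM certificate. Prints what is wrong and returns non-zero.
check_cert_dir() {
  dir="${1:-/etc/certificate}"   # assumption: mount path of the TLS secret
  status=0
  for f in tls.crt tls.key ca.crt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f"
      status=1
    fi
  done
  if [ -s "$dir/ca.crt" ]; then
    # server.cas from tctl may bundle more than one CA certificate;
    # here we only require at least one PEM block.
    n=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$dir/ca.crt" || true)
    if [ "$n" -eq 0 ]; then
      echo "no PEM certificate found in $dir/ca.crt"
      status=1
    fi
  fi
  return "$status"
}
```

Run inside the database container (for example through kubectl exec), it prints nothing and returns 0 when all three files are present; it does not check that the certificates are valid or chain to each other.
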
MarkCupitt commented 5 months ago

OK, so it's confirmed: it's a Patroni issue.

Setting

env:
  - name: PATRONI_KUBERNETES_CACERT
    value: "/etc/certificate/ca.crt"

will break the cluster. This variable tells Patroni which CA to trust when it talks to the Kubernetes API, which it uses as its DCS, and the Kubernetes API server's certificate is not signed by the Teleport CA, so the pods will not start.

I checked a running container's /var/lib/postgresql/data/postgresql.conf, and it was missing the entry `ssl_ca_file = '/etc/certificate/ca.crt'`, despite it being included in the Patroni configuration:

      postgresql:
        parameters:
          ssl: 'on'
          ssl_ca_file: /etc/certificate/ca.crt
          ssl_cert_file: /etc/certificate/tls.crt
          ssl_key_file: /etc/certificate/tls.key
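
To check whether a parameter from that configuration actually reached the generated postgresql.conf, rather than reading the file by eye, something like the following can be run in the container. It is a sketch under stated assumptions: conf_value is a hypothetical name, it only understands simple `name = value` or `name value` lines, and it does not follow include directives or postgresql.auto.conf overrides.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a parameter as written in a
# postgresql.conf file, using the last uncommented occurrence (PostgreSQL
# ignores all but the last entry for a parameter). Returns 1 if absent.
# Limitation: does not follow include/include_dir directives.
conf_value() {
  name="$1"
  file="${2:-/var/lib/postgresql/data/postgresql.conf}"
  # Match "name = value" or "name value" at line start; comments are skipped.
  line=$(grep -E "^[[:space:]]*${name}[[:space:]]*(=|[[:space:]])" "$file" | tail -n 1)
  [ -n "$line" ] || return 1
  # Strip the name and optional "=", then trailing whitespace/comment,
  # then surrounding single quotes.
  printf '%s\n' "$line" \
    | sed -E "s/^[[:space:]]*${name}[[:space:]]*=?[[:space:]]*//; s/[[:space:]]*(#.*)?$//; s/^'(.*)'$/\1/"
}
```

Here, `conf_value ssl_ca_file` printing nothing and returning non-zero would match the symptom described above.
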

I was able to prove that Patroni is ignoring the ssl_ca_file parameter by editing the /var/lib/postgresql/data/postgresql.conf file on the running master container and inserting

ssl_ca_file = '/etc/certificate/ca.crt'
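
That manual edit can be scripted so it is safe to run more than once. This is only a sketch of a temporary workaround: ensure_conf_line is a hypothetical name, and since Patroni writes postgresql.conf itself, it may drop the line again when it regenerates the file.

```shell
#!/bin/sh
# Hypothetical helper: append "name = 'value'" to postgresql.conf unless an
# uncommented setting for that parameter is already present.
# Temporary workaround: Patroni may regenerate this file and drop the line.
# Assumes the value contains no single quotes.
ensure_conf_line() {
  name="$1"
  value="$2"
  file="${3:-/var/lib/postgresql/data/postgresql.conf}"
  if grep -Eq "^[[:space:]]*${name}[[:space:]]*(=|[[:space:]])" "$file"; then
    echo "already set: $name"
    return 0
  fi
  # postgresql.conf uses name = 'value' syntax
  printf "%s = '%s'\n" "$name" "$value" >> "$file"
  echo "appended: $name"
}
```

After appending, running `SELECT pg_reload_conf();` in psql should make PostgreSQL re-read the file (SSL settings are reloadable in PostgreSQL 10 and later), and `SHOW ssl_ca_file;` shows the value actually in effect.
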

Voilà:

tsh db connect d1-eng-db-master
psql (15.5 (Ubuntu 15.5-0ubuntu0.23.10.1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_128_GCM_SHA256, compression: off)
Type "help" for help.

postgres=# 

It worked.

So I will need a more permanent solution.

Leaving this here in case anyone else stumbles across this issue.

MarkCupitt commented 5 months ago

Finally tracked this down to the Docker image build:

https://github.com/timescale/timescaledb-docker-ha/issues/435