Closed — MarkCupitt closed this issue 8 months ago
I'm thinking that adding one line to `timescaledb-docker-ha/scripts/configure_spilo.py` may solve this:

```yaml
ssl: 'on'
ssl_cert_file: {{SSL_CERTIFICATE_FILE}}
ssl_key_file: {{SSL_PRIVATE_KEY_FILE}}
ssl_ca_file: {{{postgresql.parameters.ssl_ca_file}}}
shared_preload_libraries: 'bg_mon,pg_stat_statements,pgextwlist,pg_auth_mon'
```
There may need to be some logic somewhere to exclude it if it's not supplied; I'm not sure what PostgreSQL will do if it's set to empty.
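The exclusion logic described above could look something like this (a hypothetical sketch; the function name and structure are mine, not part of `configure_spilo.py`):

```python
import os

def prune_ssl_ca_file(parameters: dict) -> dict:
    """Return a copy of the postgresql parameters with ssl_ca_file
    removed when it is empty/unset or points at a missing file, so
    PostgreSQL never sees an empty ssl_ca_file value."""
    params = dict(parameters)
    ca_file = params.get("ssl_ca_file")
    if not ca_file or not os.path.isfile(ca_file):
        params.pop("ssl_ca_file", None)
    return params
```

The idea is simply to drop the key before the config is rendered, rather than emitting `ssl_ca_file:` with no value.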
So, I built a custom image after making the above changes using:

```shell
PG_MAJOR="15" PG_VERSIONS="15.5" TIMESCALEDB_VERSIONS="2.13.0" make build
```

I tagged it and uploaded it to Docker Hub (`docker pull markcupitt/timescaledb:v1.0.0`), loaded it into our Kubernetes cluster, and can confirm I am now able to connect via SSL as originally expected.
I will next try removing the CA file from the pod, to test what PostgreSQL does when no CA is specified but this still ends up in the postgres parameters:

```yaml
ssl_ca_file:
```
I did this test, and can confirm that PostgreSQL threw a fit :) with the following log entry:

```
2024-01-10 04:11:52 UTC [32]: [659e1908.20-1] @,app= [F0000] FATAL: could not load root certificate file "/etc/certificate/ca.crt": SSL error code 2147483650
```
So some logic is obviously required somewhere to ensure that parameter is either populated with the correct CA or not included. Possibly in `patroni/features/environment.py`, around line 218: remove the entry for `ssl_ca_cert` if it is empty or the file does not exist.
Some additional issues with the Helm chart can be found at:

- https://github.com/timescale/helm-charts/issues/641
- https://github.com/timescale/helm-charts/issues/642
- https://github.com/timescale/helm-charts/issues/639
`configure_spilo.py` was deprecated long ago. Patroni is configured from the helm chart: that configuration is stored in its own configmap (https://github.com/timescale/helm-charts/blob/main/charts/timescaledb-single/templates/configmap-patroni.yaml) and is mounted into the Patroni container (https://github.com/timescale/helm-charts/blob/7ded6b654c956a3f6dc119d90b47a0262eba600e/charts/timescaledb-single/templates/statefulset-timescaledb.yaml#L291) using the volume defined from that configmap (https://github.com/timescale/helm-charts/blob/7ded6b654c956a3f6dc119d90b47a0262eba600e/charts/timescaledb-single/templates/statefulset-timescaledb.yaml#L480).
@MarkCupitt one may need to add support for `ssl_ca_file` to the helm chart, but `configure_spilo.py` is the wrong place to add it. It is still there for compatibility with the Zalando postgres-operator, although that compatibility is going to be removed at some point in the future, since the original Spilo image will probably drift away from this one, and Patroni supports new parameters that the `configure_spilo.py` here is not aware of.

I think if one wants to retain this compatibility, the right approach would be moving `configure_spilo.py` and `augment_patroni_configuration.py` into the scripts configmap in the helm chart, decoupling them from this docker image.
@alexeyklyukin If we include `ssl_ca_file` in the helm chart, which we can in the postgres section, it IS included in the configmap; however, it gets stripped out and does not end up in the postgres config in the container. I have verified that.

If I add `ssl_ca_file` to a running container's postgres parameters and reload the config, it works as expected.

So whatever script runs at container init is the point of failure. I presumed it was `configure_spilo.py`, run in the container as part of the Patroni deployment process, but based on what you say, there must be some other process that runs. Do you know what that is, so I can look at a fix? I have only been able to find the `configure_spilo` script in the container that seems to access that parameter (I did a global repo search).

I don't think this is a helm issue, as the chart does what it is supposed to: `ssl_ca_file` does end up in the configmap and is mounted. It breaks down after that point.
@MarkCupitt the container init is at https://github.com/timescale/timescaledb-docker-ha/blob/26d75a1e50e543123c6a692e6551ddf95bed870f/timescaledb_entrypoint.sh#L1-L50. Unless you specify the `SPILO_CONFIGURATION` environment variable, which the helm chart doesn't, you'll go straight to the Patroni configuration defined in the `values.yaml` of the helm chart.
In the helm chart values, you have `certificateSecretName`, allowing you to override the secret used to mount certificates into the container, and `patroni.bootstrap.dcs.postgresql.parameters`, where you can reference `ssl_ca_file`, pointing it at the content you've put into the `certificateSecretName` secret.

The default behavior, if you don't specify a custom secret in `certificateSecretName`, is to write a self-signed one at https://github.com/timescale/helm-charts/blob/7ded6b654c956a3f6dc119d90b47a0262eba600e/charts/timescaledb-single/templates/secret-certificate.yaml
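Put together, the configuration described here might look roughly like this in `values.yaml` (a sketch: `my-tls-secret` is an example name, the mount path mirrors the `/etc/certificate/ca.crt` path from the log above, and the exact key nesting may differ between chart versions):

```yaml
# Sketch of values.yaml for the timescaledb-single chart.
# "my-tls-secret" is a hypothetical secret containing tls.crt, tls.key
# and the CA; the mount path is assumed from the logs in this thread.
secrets:
  certificateSecretName: my-tls-secret
patroni:
  bootstrap:
    dcs:
      postgresql:
        parameters:
          ssl_ca_file: /etc/certificate/ca.crt
```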
@alexeyklyukin In a Kubernetes environment, adding a CA in `certificateSecretName` breaks Patroni. That certificate is used to talk to the Kubernetes API, and of course, with a custom cert, Kubernetes does not know about it. Patroni's DCS uses the Kubernetes API as its object store. It's the first thing we tried.

The self-signed cert does get written, but that cert cannot be used with Teleport per the use case we originally stated.

`ssl_ca_file` in the postgres parameters is REMOVED by Patroni; I gave you the code where that happens.

This is going in circles, so we will have to find alternatives, I guess, and manage our own builds.

The bottom line is that the container needs to be fixed. We proved it and have it running in our own build; I gave the changes we made to get it working. The only consideration is that if a CA file is not specified, some logic is required to remove `ssl_ca_file` from the config, or PostgreSQL will break. I suspect people believe that `certificateSecretName` is the way to do it; it's not, as it breaks Patroni in a Kubernetes environment, as I stated above.
@alexeyklyukin Our decision at this stage is that, until TimescaleDB can fix the containers so we can specify a CA, we have no choice but to maintain our own builds (which we are using now). We will drop the helm deployment and use a manifest-based deployment with kluctl as the deployment tool. We will make our code and configs available in our GitHub repos for anyone else who needs them, once we have them looking respectable.

We will happily go back to the official builds once the container can be given a CA in the postgres parameters section of the Patroni config without it being removed. Unfortunately, we need to progress our project; our timelines are slipping significantly now.

@alexeyklyukin If you feel that there is nothing wrong with the container or the Patroni script, then please close this ticket. The container is unusable for us the way it is right now.

However, I really hope you will look at this issue, replicate it, and fix it so we can go back to the official builds, OR tell us what we need to do to supply our own CA if it is indeed NOT the Patroni script in the container that is the root cause of this issue.
Patroni supports the inclusion of `ssl_ca_file` so that PostgreSQL can use custom TLS certs with root cert validation (we require it for Teleport db access; our database is airgapped, so we proxy to it via Teleport):

- `patroni/features/environment.py` supports `ssl_ca_file`
- `patroni/patroni/postgresql/validator.py` supports `ssl_ca_file`
- `helm-charts/charts/timescaledb-single/values.yaml` supports `ssl_ca_file`

However, `timescaledb-docker-ha/scripts/configure_spilo.py` omits `ssl_ca_file`, which means that postgres never gets the CA and TLS fails for any session that requires TLS validation.

This makes TimescaleDB unusable in many scenarios.