zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Ability to add SSL certificate for S3 #845

Closed aurelienmarie closed 6 months ago

aurelienmarie commented 4 years ago

I am currently testing the operator with Minio as the S3 storage backend. Logical backups work fine with TLS disabled on Minio.

However, I hit this issue when I enable TLS on Minio.

+ dump
+ /usr/lib/postgresql/10/bin/pg_dumpall
+ compress
+ pigz
++ estimate_size
++ /usr/lib/postgresql/10/bin/psql -tqAc 'select sum(pg_database_size(datname)::numeric) from pg_database;'
+ aws_upload 6377860
+ declare -r EXPECTED_SIZE=6377860
++ date +%s
+ PATH_TO_BACKUP=s3://databases/spilo/test-test-cluster/575fa115-2b52-4972-b4e2-e59fe967f962/logical_backups/1582638557.sql.gz
+ args=()
+ [[ ! -z 6377860 ]]
+ args+=("--expected-size=$EXPECTED_SIZE")
+ [[ ! -z https://minio.default.svc.cluster.local:9000 ]]
+ args+=("--endpoint-url=$LOGICAL_BACKUP_S3_ENDPOINT")
+ [[ ! -z '' ]]
+ aws s3 cp - s3://databases/spilo/test-test-cluster/575fa115-2b52-4972-b4e2-e59fe967f962/logical_backups/1582638557.sql.gz --expected-size=6377860 --endpoint-url=https://minio.default.svc.cluster.local:9000
upload failed: - to s3://databases/spilo/test-test-cluster/575fa115-2b52-4972-b4e2-e59fe967f962/logical_backups/1582638557.sql.gz SSL validation failed for https://minio.default.svc.cluster.local:9000/databases/spilo/test-test-cluster/575fa115-2b52-4972-b4e2-e59fe967f962/logical_backups/1582638557.sql.gz [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

I've built my own Docker image for backups, adding args+=("--no-verify-ssl") in dump.sh. As a result, I was able to back up my DB with TLS enabled on Minio.
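For reference, the workaround boils down to one extra argument in the aws_upload step of dump.sh (a sketch reconstructed from the trace above, not the upstream script verbatim):

  # inside aws_upload() of dump.sh (workaround sketch)
  args+=("--expected-size=$EXPECTED_SIZE")
  args+=("--endpoint-url=$LOGICAL_BACKUP_S3_ENDPOINT")
  args+=("--no-verify-ssl")   # added: skip TLS verification for the self-signed Minio cert
  aws s3 cp - "$PATH_TO_BACKUP" "${args[@]}"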

The idea here is to add a parameter to the OperatorConfiguration to either specify a certificate or disable SSL verification.

Zalando postgres-operator version : v1.3.1

FxKu commented 4 years ago

There is currently a PR open to add TLS support for database pods. At the moment, the TLS volume is not passed to the logical backup pod. Would this be useful for you?

aurelienmarie commented 4 years ago

@FxKu my issue is not related to TLS support for database pods but to the backup cron jobs. It's required for wal-e/wal-g in order to target the S3 endpoint.

Jonher937 commented 4 years ago

I also had this problem today. It happens when using an S3 server whose certificate is not signed by a trusted CA.

It would be nice if the operator could accept something like a base64-encoded PEM certificate. This would then be passed to the logical backup container as an environment variable, where dump.sh could base64 -d the content and save it to a mktemp file or similar.

The CA bundle could then be appended as --ca-bundle=$CA_FILE on the aws CLI (or set via the AWS_CA_BUNDLE environment variable), so verification stays in place instead of using --insecure (or both options could be allowed).

AWS_CA_BUNDLE Specifies the path to a certificate bundle to use for HTTPS certificate validation.

If defined, this environment variable overrides the value for the profile setting ca_bundle. You can override this environment variable by using the --ca-bundle command line parameter.
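A minimal sketch of what that could look like in dump.sh, assuming a hypothetical LOGICAL_BACKUP_S3_CA_CERT environment variable carrying the base64-encoded PEM:

  # hypothetical: decode a base64-encoded CA bundle and hand it to the aws CLI
  if [[ ! -z "$LOGICAL_BACKUP_S3_CA_CERT" ]]; then
      CA_FILE=$(mktemp)
      echo "$LOGICAL_BACKUP_S3_CA_CERT" | base64 -d > "$CA_FILE"
      args+=("--ca-bundle=$CA_FILE")
  fi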

blyry commented 4 years ago

You can work around this by using Let's Encrypt and putting a valid certificate on your Minio deployment. You don't even need to make your endpoint public, now that most DNS providers support API-based CNAME validation! That seems like a 'better' solution than supporting --insecure in the operator's own implementation.

Supporting custom TLS or root CAs on the operator side would also be a good feature to add, though you would still have to go through the process of signing, deploying, and managing a cert for your Minio installation.

DrissiReda commented 4 years ago

Is it possible to disable SSL altogether for Minio logical backups? I can't find the option in the documentation.

alexzimmer96 commented 3 years ago

Is there any update? We are also facing the issue that backing up to a remote Minio instance fails because of self-signed certificates. We do not control the Minio instance, so it is not easy to change the certificates. I have some workarounds in mind (like adding a cluster-internal reverse proxy), but they would be somewhat hacky.

thangamani-arun commented 2 years ago

@FxKu we are really looking for --insecure or --tls-no-verify kind of flags for logical backup pods, since there is no way to add a CA cert to the pod. Please consider this a needed feature.

alexanderheckel commented 1 year ago

You can use additionalVolumes to add the CA certificate to the database pod and set WALG_S3_CA_CERT_FILE to point to the mounted file.

  additionalVolumes:
    - name: minio-ca-certificate
      mountPath: /certs/minio
      targetContainers: []
      volumeSource:
        secret:
          secretName: minio-ca-certificate
  env:
    - name: WALG_S3_CA_CERT_FILE
      value: "/certs/minio/ca.crt"
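For completeness, the referenced secret could be created roughly like this, assuming the CA is in a local file named ca.crt (the key must be ca.crt so the mounted path matches WALG_S3_CA_CERT_FILE):

  kubectl create secret generic minio-ca-certificate --from-file=ca.crt=./ca.crt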

DekelDevunet commented 1 year ago

+1

vponnathota commented 1 year ago

I followed the same steps as suggested by @alexanderheckel but still fail to upload backups to Minio S3. Can I get some help here?

k get pods -n zalando
NAME                                        READY   STATUS    RESTARTS   AGE
abc-time-0                                  1/1     Running   0          43h
abc-time-1                                  1/1     Running   0          43h
abc-time-2                                  1/1     Running   0          43h
logical-backup-abc-time-27964140--1-8x5qm   0/1     Error     0          32m
logical-backup-abc-time-27964140--1-hf4cf   0/1     Error     0          23m
logical-backup-abc-time-27964140--1-ncfkg   0/1     Error     0          33m
logical-backup-abc-time-27964140--1-nwfhw   0/1     Error     0          32m
logical-backup-abc-time-27964140--1-t5rd7   0/1     Error     0          31m
logical-backup-abc-time-27964140--1-vrc5j   0/1     Error     0          29m
logical-backup-abc-time-27964140--1-xszth   0/1     Error     0          17m
postgres-operator-7486f85b89-qndjv          1/1     Running   0          6d17h

k logs logical-backup-abc-time-27964140--1-xszth -n zalando
IPv4 API Endpoint: https://x.x.x.x:443/api/v1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14170    0 14170    0     0  1383k      0 --:--:-- --:--:-- --:--:-- 1383k
100   167  100   167    0     0  15181      0 --:--:-- --:--:-- --:--:-- 15181
100   167  100   167    0     0  20875      0 --:--:-- --:--:-- --:--:-- 20875
100 19762    0 19762    0     0  2412k      0 --:--:-- --:--:-- --:--:-- 2412k

thangamani-arun commented 1 year ago

But the operator keeps recreating the pods due to the difference in pod spec (the additional volume in the postgresql manifest).

D1StrX commented 1 year ago

Running into the exact same issue. It would be nice if a custom CA could be mounted.

thangamani-arun commented 1 year ago

@FxKu : Can you guys fix this as a priority?

ErikLundJensen commented 1 year ago

If the CA used for the Postgres TLS certificate is the same as your S3 backend's CA, you can reuse the ca.crt mounted into the postgres container, for example as an environment variable:

env:
- name: WALG_S3_CA_CERT_FILE
  value: "/tls/ca.crt"

or add it to the ConfigMap for the Postgres Operator if using pod_environment_configmap in the Helm chart:

WALG_S3_CA_CERT_FILE: "/tls/ca.crt"
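A minimal sketch of such a ConfigMap, assuming it is the one referenced by the operator's pod_environment_configmap setting (name and namespace are illustrative):

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: postgres-pod-config   # illustrative; must match pod_environment_configmap
    namespace: default
  data:
    WALG_S3_CA_CERT_FILE: "/tls/ca.crt"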

sreenandan commented 8 months ago

I have been stuck on this for 2 days after following all the suggestions here. Can you please help?

I brought up minio with: helm install minio oci://registry-1.docker.io/bitnamicharts/minio --set tls.enabled=true --set tls.autoGenerated=true

I created a postgresql cluster with the following:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: test41
spec:
  env:
  - name: AWS_ACCESS_KEY_ID
    value: admin
  - name: AWS_ENDPOINT
    value: https://10.9.20.45:31734
  - name: AWS_REGION
    value: us-east-1
  - name: AWS_S3_FORCE_PATH_STYLE
    value: "true"
  - name: AWS_SECRET_ACCESS_KEY
    value: jK501Iv3tt
  - name: BACKUP_NUM_TO_RETAIN
    value: "5"
  - name: BACKUP_SCHEDULE
    value: 00 10 * * *
  - name: CLONE_USE_WALG_RESTORE
    value: "true"
  - name: USE_WALG_BACKUP
    value: "true"
  - name: USE_WALG_RESTORE
    value: "true"
  - name: WAL_BUCKET_SCOPE_PREFIX
    value: ""
  - name: WAL_BUCKET_SCOPE_SUFFIX
    value: ""
  - name: WAL_S3_BUCKET
    value: test40
  - name: WALG_DISABLE_S3_SSE
    value: "true"
  - name: WALG_S3_CA_CERT_FILE
    value: "/certs/minio/ca.crt"
  - name: WALE_LOG_DESTINATION
    value: "/tmp/walg.log"
  teamId: "test41"
  additionalVolumes:
    - name: minio-tls
      mountPath: /certs/minio
      targetContainers: []
      volumeSource:
        secret:
          secretName: minio-crt
  volume:
    size: 1Gi
  numberOfInstances: 1
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
  databases:
    foo: zalando  # dbname: owner
  preparedDatabases:
    bar: {}
  postgresql:
    version: "15"
    parameters:
      huge_pages: "off"

wal-g is getting stuck

root@test41-0:/run/etc/wal-e.d/env# echo $WALG_S3_CA_CERT_FILE
/certs/minio/ca.crt
root@test41-0:/run/etc/wal-e.d/env# envdir "/run/etc/wal-e.d/env" wal-g  backup-list

^C
root@test41-0:/run/etc/wal-e.d/env# ps -aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0   2636   500 ?        Ss   16:43   0:00 /usr/bin/dumb-init -c --rewrite 1:0 -- /bin/sh /launch.sh
root          6  0.0  0.0   2884   784 ?        S    16:43   0:00 /bin/sh /launch.sh
root         24  0.0  0.0   2804   508 ?        S    16:43   0:00 /usr/bin/runsvdir -P /etc/service
root         30  0.0  0.0   2652   516 ?        Ss   16:43   0:00 runsv cron
root         31  0.0  0.0   2652   512 ?        Ss   16:43   0:00 runsv patroni
root         32  0.0  0.0   2652   512 ?        Ss   16:43   0:00 runsv pgqd
postgres     33  0.0  0.0 490716 29504 ?        Sl   16:43   0:04 /usr/bin/python3 /usr/local/bin/patroni /home/postgres/postgres.yml
root         34  0.0  0.0   3868  1148 ?        S    16:43   0:00 /usr/sbin/cron -f
postgres     35  0.0  0.0  17040  4220 ?        S    16:43   0:00 /usr/bin/pgqd /home/postgres/pgq_ticker.ini
postgres    117  0.0  0.0 194176 24616 ?        S    16:43   0:00 /usr/lib/postgresql/15/bin/postgres -D /home/postgres/pgdata/pgroot/data --config-file=/home/postgres/pgdata/pgroot/data/po
postgres    119  0.0  0.0  75168  3672 ?        Ss   16:43   0:00 postgres: test41: logger 
postgres    120  0.0  0.0 194308 12684 ?        Ss   16:43   0:00 postgres: test41: checkpointer 
postgres    121  0.0  0.0 194316  4540 ?        Ss   16:43   0:00 postgres: test41: background writer 
postgres    123  0.2  0.0 296608 16568 ?        Ssl  16:43   0:21 postgres: test41: bg_mon 
postgres    126  0.0  0.0 194176  6912 ?        Ss   16:43   0:00 postgres: test41: walwriter 
postgres    127  0.0  0.0 195788  4772 ?        Ss   16:43   0:00 postgres: test41: autovacuum launcher 
postgres    128  0.0  0.0 194284  4016 ?        Ss   16:43   0:00 postgres: test41: archiver archiving 000000010000000000000001
postgres    129  0.0  0.0 196756  7396 ?        Ss   16:43   0:00 postgres: test41: pg_cron launcher 
postgres    130  0.0  0.0 195764  4520 ?        Ss   16:43   0:00 postgres: test41: logical replication launcher 
postgres    135  0.0  0.0 197500 12096 ?        Ss   16:43   0:00 postgres: test41: postgres postgres [local] idle
root        187  0.0  0.0   8156  2856 pts/0    Ss   16:43   0:00 bash
postgres   1269  0.0  0.0   2872   504 ?        S    18:31   0:00 sh -c envdir "/run/etc/wal-e.d/env" wal-g wal-push "pg_wal/000000010000000000000001"
postgres   1270  0.3  0.0 1637912 34912 ?       Sl   18:31   0:02 wal-g wal-push pg_wal/000000010000000000000001
root       1442  0.0  0.0  10456  1716 pts/0    R+   18:44   0:00 ps -aux

I don't see the same issue with Minio TLS disabled.

sreenandan commented 8 months ago

I figured this out. The problem is which hosts the ca.crt is valid for: the Bitnami Minio Helm chart's auto-generated cert only allows connections to the Kubernetes service name, and I was using the host IP with a NodePort, which won't work. I changed AWS_ENDPOINT from https://10.9.20.45:31734 to https://minio.default.svc.cluster.local:9000, and wal-g is no longer stuck; things are flowing.
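In the manifest above, that corresponds to pointing the endpoint at the in-cluster service name the auto-generated certificate is valid for:

  env:
  - name: AWS_ENDPOINT
    value: https://minio.default.svc.cluster.local:9000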

thangamani-arun commented 6 months ago

I think the issue is not resolved. The problem here is that the S3 SSL cert is self-signed, so the backup job pods don't have the S3 server's CA cert and the backup job fails.

Adding a CA volume gets removed by the postgres-operator due to the difference in spec, so that solution doesn't work either.

But @FxKu has closed the issue now. Any reason why?