zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.37k stars 980 forks

How to Configure Backup for S3-Compliant Services Using Helm #2759

Open dotori1995 opened 2 months ago

dotori1995 commented 2 months ago

Hello,

Thank you very much for all your hard work.

I apologize in advance if this is a beginner-level question. I have installed the Zalando postgres-operator and its UI using Helm and ArgoCD. After creating the cluster and integrating it with Keycloak, I have confirmed that the database is working correctly. However, I have encountered an issue.

When backing up the database, I am using S3-compatible storage instead of AWS S3. I am unsure which property in the Zalando Helm chart needs to be configured to connect to S3-compatible storage. I am also unsure whether a separate ConfigMap needs to be set up for this.

If it’s not too much trouble, could you kindly provide an example of the Helm configuration for this setup?

I will also share the changes I have made to the Helm configuration from the Git repository. (https://github.com/zalando/postgres-operator/blob/master/charts/postgres-operator/values.yaml)


configAwsOrGcp:
  aws_region:
  enable_ebs_gp3_migration: false
  log_s3_bucket: ""
  wal_s3_bucket: ""

configLogicalBackup:
  logical_backup_docker_image: "ghcr.io/zalando/postgres-operator/logical-backup:v1.13.0"
  logical_backup_job_prefix: "logical-backup-"
  logical_backup_provider: "s3"
  logical_backup_s3_access_key_id: ""
  logical_backup_s3_bucket: ""
  logical_backup_s3_bucket_prefix: "spilo"
  logical_backup_s3_region: ""
  logical_backup_s3_endpoint: ""
  logical_backup_s3_secret_access_key: ""
  logical_backup_s3_sse: "AES256"
  logical_backup_s3_retention_time: ""
  logical_backup_schedule: "30 00 *"
  logical_backup_cronjob_environment_secret: ""


The following is the log of the Postgres cluster installed with the above configuration.


root@postgre-db-0:/home/postgres/pgdata/pgroot/pg_log# cat postgresql-4.log
2024-09-12 07:55:21 UTC [67]: [5-1] 66e29e69.43 0 LOG: ending log output to stderr
2024-09-12 07:55:21 UTC [67]: [6-1] 66e29e69.43 0 HINT: Future log output will go to log destination "csvlog".
2024-09-12 08:41:35 UTC [457]: [5-1] 66e2a93f.1c9 0 LOG: ending log output to stderr
2024-09-12 08:41:35 UTC [457]: [6-1] 66e2a93f.1c9 0 HINT: Future log output will go to log destination "csvlog".
2024-09-12 11:53:53 UTC [64]: [5-1] 66e2d651.40 0 LOG: ending log output to stderr
2024-09-12 11:53:53 UTC [64]: [6-1] 66e2d651.40 0 HINT: Future log output will go to log destination "csvlog".
INFO: 2024/09/12 11:53:54.331337 Files will be read from storages: [default]
ERROR: 2024/09/12 11:53:54.564233 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/00000004.history.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:53:54.699498 Files will be read from storages: [default]
ERROR: 2024/09/12 11:53:54.876852 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/00000003.history.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:53:54.982675 Files will be read from storages: [default]
ERROR: 2024/09/12 11:53:55.110616 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/000000030000000000000005.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:53:55.284958 Files will be read from storages: [default]
ERROR: 2024/09/12 11:53:55.575299 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/000000030000000000000006.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:53:55.783974 Files will be read from storages: [default]
ERROR: 2024/09/12 11:53:55.973747 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/000000030000000000000006.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:54:24.584261 Files will be read from storages: [default]
ERROR: 2024/09/12 11:54:24.730085 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/00000004.history.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:54:24.810595 Files will be read from storages: [default]
ERROR: 2024/09/12 11:54:24.991250 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/000000030000000000000007.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:54:25.190706 Files will be read from storages: [default]
ERROR: 2024/09/12 11:54:25.377577 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/000000030000000000000007.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:54:25.587005 Files will be read from storages: [default]
ERROR: 2024/09/12 11:54:25.740130 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/00000004.history.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
INFO: 2024/09/12 11:54:25.923450 Files will be read from storages: [default]
ERROR: 2024/09/12 11:54:26.098262 check file for existence in "default": failed to check s3 object 'spilo/postgre-db/01e92c1b-ac34-47b7-9e8f-ff59b2d15c45/wal/16/wal_005/00000003.history.lz4' existence: NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
boto ERROR Unable to read instance data, giving up
wal_e.main ERROR MSG: Could not retrieve secret key from instance profile.
        HINT: Check that your instance has an IAM profile or set --aws-access-key-id
root@postgre-db-0:/home/postgres/pgdata/pgroot/pg_log#


dotori1995 commented 2 months ago

Here is the ArgoCD Application YAML:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: zalando
  namespace: argocd
  finalizers:

yoshi314 commented 2 months ago

Same here. I have no idea where to put the S3 credentials for the WAL/DB backup.

yoshi314 commented 2 months ago

I figured it out based on a few articles, in case it helps.

You have to reference a ConfigMap or Secret in the operator's values.yaml:

configKubernetes:
  pod_environment_configmap: "postgres-operator/pod-config"

and pod-config is a ConfigMap (or Secret) like so:

apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-config
data:
  WAL_S3_BUCKET: postgresql  # this bucket must exist, or it will fail in strange ways.
  WAL_BUCKET_SCOPE_PREFIX: "mybackups" # not sure if necessary, tbh
  WAL_BUCKET_SCOPE_SUFFIX: ""
#  USE_WALG_BACKUP: "true"
#  USE_WALG_RESTORE: "true"
  BACKUP_SCHEDULE: '00 10 * * *'
  # here are your s3 credentials
  AWS_ACCESS_KEY_ID: my_s3_account
  AWS_SECRET_ACCESS_KEY: sekritkey
  AWS_S3_FORCE_PATH_STYLE: "true" # allegedly necessary if using minIO
  AWS_ENDPOINT: https://my_s3.server.local
#  WALG_DISABLE_S3_SSE: "true"  # encryption of backups
  BACKUP_NUM_TO_RETAIN: "5"
#  CLONE_USE_WALG_RESTORE: "true"

I decided to go with WAL-E here, so the WAL-G entries are commented out.

dotori1995 commented 2 months ago

Thank you so much. I'll give it a try next Monday.

dotori1995 commented 2 months ago

I made it like this. Once again, thank you very much.

{{- if .Values.configKubernetes.pod_environment_configmap }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-config
  namespace: {{ .Release.Namespace }}
data:
  WAL_S3_BUCKET: {{ .Values.configLogicalBackup.logical_backup_s3_bucket | quote }}  # this bucket must exist, or it will fail in strange ways.
  WAL_BUCKET_SCOPE_PREFIX: {{ .Values.configLogicalBackup.logical_backup_s3_bucket_prefix | quote }} # not sure if necessary, tbh
  WAL_BUCKET_SCOPE_SUFFIX: ""
#  USE_WALG_BACKUP: "true"
#  USE_WALG_RESTORE: "true"
  BACKUP_SCHEDULE: '00 10 * * *'
  # here are your s3 credentials
  AWS_ACCESS_KEY_ID: {{ .Values.configLogicalBackup.logical_backup_s3_access_key_id | quote }}
  AWS_SECRET_ACCESS_KEY: {{ .Values.configLogicalBackup.logical_backup_s3_secret_access_key | quote }}
  AWS_S3_FORCE_PATH_STYLE: "true" # allegedly necessary if using minIO
  AWS_ENDPOINT: {{ .Values.configLogicalBackup.logical_backup_s3_endpoint | quote }}
#  WALG_DISABLE_S3_SSE: "true"  # encryption of backups
  BACKUP_NUM_TO_RETAIN: "5"
#  CLONE_USE_WALG_RESTORE: "true"
{{- end }}

yoshi314 commented 1 month ago

I am still digging through the docs to see if I can give every cluster separate backup settings. I do not want to set it up at the operator level, since I can have many different PG clusters in many namespaces of one Kubernetes cluster.

Lebvanih commented 1 month ago

We are using the environment variables per cluster in our installation, and that works pretty well in 1.11.0 (I haven't checked newer versions yet). Also, I'd recommend using a Secret instead of a ConfigMap for what you did above.
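As a sketch of that suggestion (names and credential values are placeholders), the ConfigMap above could become a Secret referenced from the operator's values.yaml via `pod_environment_secret` instead of `pod_environment_configmap`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pod-config
type: Opaque
stringData:  # stringData accepts plain text; Kubernetes base64-encodes it on write
  AWS_ACCESS_KEY_ID: my_s3_account           # placeholder credentials
  AWS_SECRET_ACCESS_KEY: sekritkey
  AWS_ENDPOINT: https://my_s3.server.local   # placeholder S3-compatible endpoint
  AWS_S3_FORCE_PATH_STYLE: "true"
```

with, in the operator's values.yaml:

```yaml
configKubernetes:
  pod_environment_secret: "pod-config"
```

This keeps the credentials out of a plain ConfigMap while the non-sensitive keys (bucket, schedule, retention) can stay in `pod_environment_configmap`.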

Here is the relevant part of our chart for the cluster manifest file (including the variables we use when doing a restore):

{{- if or .Values.cluster.backup.pitr.enabled .Values.cluster.backup.pitr.restore}}
  env:
{{- if .Values.cluster.backup.pitr.enabled}}
    - name: WAL_S3_BUCKET
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-pitr-s3
          key: bucket
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-pitr-s3
          key: access    
    - name: AWS_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-pitr-s3
          key: endpoint  
    - name: AWS_REGION
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-pitr-s3
          key: region  
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-pitr-s3
          key: secret  
    - name: WALG_LIBSODIUM_KEY
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-sodium-key
          key: key  
    - name: USE_WALG_BACKUP
      value: "true"
    - name: AWS_S3_FORCE_PATH_STYLE
      value: "true"
    - name: BACKUP_NUM_TO_RETAIN
      value: {{ .Values.cluster.backup.pitr.retention | quote}}
    - name: WAL_BUCKET_SCOPE_PREFIX
      value: {{ .Release.Namespace }}/
{{- end }}
{{- if .Values.cluster.backup.pitr.restore}}
    - name: CLONE_WAL_S3_BUCKET
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: bucket
    - name: CLONE_AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: access    
    - name: CLONE_AWS_ENDPOINT
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: endpoint  
    - name: CLONE_AWS_REGION
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: region  
    - name: CLONE_AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: secret  
    - name: CLONE_WALG_LIBSODIUM_KEY
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: sodiumkey  
    - name: CLONE_USE_WALG_BACKUP
      value: "true"
    - name: CLONE_AWS_S3_FORCE_PATH_STYLE
      value: "true"
    - name: CLONE_WALG_DISABLE_S3_SSE
      value: "true"
    - name: CLONE_METHOD
      value: "CLONE_WITH_WALE"
    - name: CLONE_WALG_S3_PREFIX
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: walgS3Prefix
    - name: CLONE_TARGET_TIME
      valueFrom:
        secretKeyRef:
          name: {{ template "postgres.fullname" . }}-recovery
          key: targetTime
{{- end }}
{{- end }}

Small edit: this covers only PITR; I didn't check logical backup.
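For completeness, the `-pitr-s3` Secret that the template above pulls from via `secretKeyRef` might look roughly like this (a sketch; the name and all values are placeholders, and the key names match the template's `secretKeyRef` entries):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mycluster-pitr-s3  # must match what {{ template "postgres.fullname" . }}-pitr-s3 renders to
type: Opaque
stringData:
  bucket: postgresql-backups            # placeholder bucket name
  access: my_s3_account                 # placeholder access key id
  secret: sekritkey                     # placeholder secret access key
  endpoint: https://my_s3.server.local  # placeholder S3-compatible endpoint
  region: us-east-1                     # placeholder region
```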

yoshi314 commented 1 month ago

So from what I have read, there are a few options for per-PG-cluster backup settings:

  1. You can use the env: section in your cluster definition to provide all variables.

  2. You can use pod_environment_secret to reference a Secret that is expected to exist alongside your PostgreSQL cluster (in the same namespace).

This assumes that each PG cluster has its own namespace, that every such namespace contains a Secret with the name given in that parameter, and that no conflicting ConfigMap referenced from pod_environment_configmap provides the same environment values.

This way, every PG cluster gets its own environment variables.
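Option 1 might look roughly like this in the postgresql cluster manifest (a sketch; cluster name, Secret name, and keys are placeholders):

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 1Gi
  postgresql:
    version: "16"
  # per-cluster backup settings, overriding operator-level defaults
  env:
    - name: WAL_S3_BUCKET
      value: postgresql                    # bucket must already exist
    - name: AWS_ENDPOINT
      value: https://my_s3.server.local    # placeholder S3-compatible endpoint
    - name: AWS_S3_FORCE_PATH_STYLE
      value: "true"
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: my-s3-credentials          # hypothetical Secret in the cluster's namespace
          key: accessKey
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: my-s3-credentials
          key: secretKey
```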