zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Trying to add backups to existing cluster, `/run/etc/wal-e.d/env` is missing #2618

Open verdverm opened 6 months ago

verdverm commented 6 months ago


I'm trying to add backups to an existing cluster, with WAL-E pushing to GCS.

I first upgraded the Helm release as follows (after setting up GCP IAM):

helm upgrade --install \
  postgres \
  postgres-operator-charts/postgres-operator \
  --reuse-values \
  --set aws_or_gcp.wal_gs_bucket=<bucket-name> \
  --set podServiceAccount.name=postgres-pod-custom \
  --namespace=operators \
  --wait

I was hoping that setting `aws_or_gcp.wal_gs_bucket` would be sufficient.

I also tried adding a ConfigMap for the operator.

Still, the `/run/etc/...` directory, which should hold the `wal-e.d` configuration, does not exist:

root@psql-0:/home/postgres# ls -lh
total 8.0K
lrwxrwxrwx 1 root     root    8 Mar  6 13:19 etc -> /run/etc    <- DOES NOT EXIST
drwxr-xr-x 4 root     root 4.0K Mar 22 23:55 pgdata
-rw-rw-r-- 1 postgres root  156 Mar  6 13:08 pgq_ticker.ini
lrwxrwxrwx 1 root     root   17 Mar  6 13:19 postgres.yml -> /run/postgres.yml
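A quick way to tell whether the WAL-E settings were rendered at all is to list the directory WAL-E reads them from. As far as I know it is an envdir-style directory (one file per variable, file content = value); the sketch below assumes the path from the issue title, `/run/etc/wal-e.d/env`. The function name `show_wale_env` is hypothetical, and which variable names show up depends on your Spilo image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the variable names in an envdir-style directory
# (one file per variable) such as the WAL-E settings directory.
# Defaults to the path from this issue; pass another directory to override.
show_wale_env() {
  local dir="${1:-/run/etc/wal-e.d/env}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  local f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # also skips the literal glob when the dir is empty
    printf '%s\n' "$(basename "$f")"   # names only, so credentials/values are not printed
  done
}
```

Paste it into a shell inside the pod (e.g. `kubectl exec -it psql-0 -- bash`) and run `show_wale_env`; a `missing:` message reproduces the symptom above. It prints names only, not values, so the output is safe to paste into an issue.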

Any help & guidance is much appreciated, thanks!

verdverm commented 6 months ago

Update: it seems this is the correct Helm command:

helm upgrade --install \
  postgres \
  postgres-operator-charts/postgres-operator \
  --set configAwsOrGcp.wal_gs_bucket=<bucket name> \
  --set podServiceAccount.name=postgres-pod-custom \
  --namespace=operators \
  --wait

Note the key path: `configAwsOrGcp.wal_gs_bucket` rather than `aws_or_gcp.wal_gs_bucket`.
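The same settings can also live in a values file instead of a chain of `--set` flags, which makes the nesting of `configAwsOrGcp` explicit and harder to get wrong. A minimal sketch, assuming the key names from the command above; the helper name `write_backup_values` and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Helm values file equivalent to
#   --set configAwsOrGcp.wal_gs_bucket=<bucket> --set podServiceAccount.name=<sa>
# Usage: write_backup_values <out-file> <bucket> [service-account]
write_backup_values() {
  local out="$1" bucket="$2" sa="${3:-postgres-pod-custom}"
  cat > "$out" <<EOF
configAwsOrGcp:
  wal_gs_bucket: ${bucket}
podServiceAccount:
  name: ${sa}
EOF
}
```

Then run `write_backup_values backup-values.yaml <bucket-name>` and pass the file with `helm upgrade --install postgres postgres-operator-charts/postgres-operator -f backup-values.yaml --namespace=operators --wait`. Adding `--reuse-values`, as in the first command, keeps values set in earlier upgrades and merges the file on top of them.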

It seems this fills in the missing configuration, but now I am getting leader election issues. Does the instance fail to become healthy if it cannot connect to GCS via Workload Identity during startup?

verdverm commented 6 months ago

OK, so it seems Workload Identity was the issue, because the command had changed.
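Since Workload Identity turned out to be the culprit, one thing worth checking is that the Kubernetes service account used by the Postgres pods carries the GKE annotation `iam.gke.io/gcp-service-account` (the matching IAM binding on the GCP side is needed as well). A sketch, assuming the service account name from the commands above; the function name `wi_gsa_for` and the `default` namespace are assumptions, so pass the namespace your cluster actually runs in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GCP service account bound to a Kubernetes
# service account via GKE Workload Identity, or fail if the annotation is missing.
# Usage: wi_gsa_for [service-account] [namespace]
wi_gsa_for() {
  local sa="${1:-postgres-pod-custom}" ns="${2:-default}" gsa
  # Dots inside the annotation key must be escaped in the JSONPath expression.
  gsa=$(kubectl get serviceaccount "$sa" -n "$ns" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}')
  if [ -z "$gsa" ]; then
    echo "no Workload Identity annotation on $ns/$sa" >&2
    return 1
  fi
  echo "$gsa"
}
```

An empty result means the pods run without a bound GCP identity, which would be consistent with WAL-E failing to reach the bucket.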

verdverm commented 6 months ago

Looks to be working for me now.

I'd say the only change needed is to update the docs to reflect the new key format (`configAwsOrGcp.wal_gs_bucket` instead of `aws_or_gcp.wal_gs_bucket`).