zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License
4.38k stars 982 forks

Changing spilo image ignored when updating Postgres major version at the same time #1909

Open wiltomdus opened 2 years ago

wiltomdus commented 2 years ago

I'm trying to understand how to properly perform a major Postgres upgrade using the operator CRD. The issue I'm facing: when I update the spilo image in the operator-default-configuration from PG13 to PG14 and also update the postgres version in the postgres cluster manifest, the operator performs a rolling update on the postgres cluster but does not update the image to PG14 until the operator is restarted.

Operator configuration for PG13:

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-default-configuration
configuration:
  docker_image: modifiedSpiloImageOnPG13:2.1-p5-13
...
  major_version_upgrade:
    major_version_upgrade_mode: "full"
    minimal_major_version: "13"
    target_major_version: "14"

Changed to: docker_image: modifiedSpiloImageOnPG14:2.1-p5-14

Postgres Cluster manifest:

  postgresql:
    version: "13"

To:

  postgresql:
    version: "14"

kubectl apply -f operator-config.yaml
kubectl apply -f postgres-cluster.yaml

After these modifications, the operator triggers a rolling update that only changes the postgresql version. It does not change the spilo image. But if I force the operator to restart, it then notices the new image in its configuration and triggers another rolling update that updates the spilo image correctly.
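To confirm which image the pods are actually running after each rolling update, something like the following could be used. This is only a sketch: the `pod_images` helper is hypothetical, and it assumes the cluster pods carry the `cluster-name=<cluster>` label (the operator's default cluster name label) and that the spilo container is the first container in the pod.

```shell
# Hypothetical helper: list each pod of a Postgres cluster with the image
# of its first container.
# Assumption: pods are labelled cluster-name=<cluster>; adjust the selector
# if the operator is configured with a different cluster name label.
pod_images() {
  kubectl get pods -l "cluster-name=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
}
```

Calling `pod_images <cluster-name>` before and after the operator restart should show whether the image tag changed from the PG13 image to the PG14 one.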

Is there a way to trigger the image rolling update without having to restart the operator?

FxKu commented 2 years ago

It depends on the order of edits. Assume your config runs with a spilo-13 image and major_version_upgrade_mode: "full". If you first change the version in the manifest, the operator will do a rolling update that sets a new PGVERSION env variable, but the upgrade to 14 will fail because the spilo-13 image does not contain Pg 14.
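A quick pre-check before bumping the version in the manifest can catch this ordering problem. The sketch below is not part of the operator. Both function names are hypothetical, and it assumes the image tag ends in `-<major>`, as in the `2.1-p5-13` and `2.1-p5-14` tags from this issue.

```shell
# Hypothetical helper: extract the trailing major version from an image
# reference whose tag ends in "-<major>", e.g. "...:2.1-p5-14" -> 14.
image_major() {
  printf '%s\n' "${1##*-}"
}

# Succeeds (exit status 0) when the image's major version is at least the
# requested Postgres major version.
image_supports_version() {
  img_major=$(image_major "$1")
  [ "$img_major" -ge "$2" ]
}

# Example with the image from the issue: the PG13 image cannot serve version 14.
if image_supports_version "modifiedSpiloImageOnPG13:2.1-p5-13" 14; then
  echo "image ok"
else
  echo "image too old for requested version"
fi
```

If the check fails, update `docker_image` in the operator configuration and let that rolling update finish before changing `version` in the cluster manifest.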

But it looks like you've updated the config first, so it should trigger a rolling update to replace the image with spilo-14. If you do not change the postgres version in the manifest, the cluster will continue running Pg 13; spilo-14 just means the image contains Pg 14, but you probably know that. I will test whether you've hit an edge case where changing the spilo image and pg version at the same time causes the image change to be ignored.