zalando / postgres-operator

Postgres operator creates and manages PostgreSQL clusters running in Kubernetes
https://postgres-operator.readthedocs.io/
MIT License

Ignoring some fields' differences to avoid unneeded pod switchovers #2436

Open j4m3s-s opened 11 months ago

j4m3s-s commented 11 months ago


Every 10 to 30 minutes my DB pods are restarted, which makes the DB unavailable for a couple of seconds. I think I've traced the issue to the way the operator compares the pods running in the cluster with what it expects to find. I have software running in my cluster that patches pod images to redirect them to a caching proxy (say docker.io/library/toto -> mycachingproxy/docker.io/library/toto), which I think causes this.
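
For illustration, here is roughly what the mismatch looks like on a container spec (a minimal sketch reusing the made-up image names above; the container layout is simplified): the statefulset template keeps the original reference while the running pod reports the proxied one, so the operator sees a difference on every sync.

# container image as generated by the operator in the statefulset template
containers:
  - name: postgres
    image: docker.io/library/toto
# the same container after the mutating webhook has rewritten the image
containers:
  - name: postgres
    image: mycachingproxy/docker.io/library/toto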

So my question is: is my analysis correct? If so, I'd be glad to post an MR to fix this issue.

Thanks in advance.

Logs from the operator:

postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:00:31Z" level=info msg="cluster has been synced" cluster-name=systems-services-harbor/harbor-db pkg=controller worker=0
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=info msg="SYNC event has been queued" cluster-name=systems-services-harbor/harbor-db pkg=controller worker=0
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=info msg="there are 1 clusters running" pkg=controller
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=info msg="syncing of the cluster started" cluster-name=systems-services-harbor/harbor-db pkg=controller worker=0
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=debug msg="team API is disabled" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=debug msg="team API is disabled" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=info msg="syncing secrets" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=debug msg="syncing master service" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:53Z" level=debug msg="syncing replica service" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:54Z" level=debug msg="syncing volumes using \"pvc\" storage resize mode" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:54Z" level=info msg="volume claims do not require changes" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:54Z" level=debug msg="syncing statefulsets" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:54Z" level=debug msg="mark rolling update annotation for harbor-db-0: reason pod not yet restarted due to lazy update" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:54Z" level=debug msg="mark rolling update annotation for harbor-db-1: reason pod not yet restarted due to lazy update" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="syncing Patroni config" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="making GET http request: http://10.233.69.103:8008/config" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="making GET http request: http://10.233.67.21:8008/config" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="making GET http request: http://10.233.69.103:8008/patroni" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="making GET http request: http://10.233.67.21:8008/patroni" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="performing rolling update" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=info msg="there are 2 pods in the cluster to recreate" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:29:55Z" level=debug msg="subscribing to pod \"systems-services-harbor/harbor-db-0\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=info msg="pod \"systems-services-harbor/harbor-db-0\" has been recreated" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=debug msg="unsubscribing from pod \"systems-services-harbor/harbor-db-0\" events" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=debug msg="making GET http request: http://10.233.67.21:8008/cluster" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=debug msg="switching over from \"harbor-db-1\" to \"systems-services-harbor/harbor-db-0\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=debug msg="subscribing to pod \"systems-services-harbor/harbor-db-0\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:11Z" level=debug msg="making POST http request: http://10.233.67.21:8008/failover" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:13Z" level=debug msg="successfully switched over from \"harbor-db-1\" to \"systems-services-harbor/harbor-db-0\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:14Z" level=debug msg="unsubscribing from pod \"systems-services-harbor/harbor-db-0\" events" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:14Z" level=info msg="recreating old master pod \"systems-services-harbor/harbor-db-1\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:14Z" level=debug msg="subscribing to pod \"systems-services-harbor/harbor-db-1\"" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=info msg="pod \"systems-services-harbor/harbor-db-1\" has been recreated" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="unsubscribing from pod \"systems-services-harbor/harbor-db-1\" events" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="syncing pod disruption budgets" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="syncing roles" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="closing database connection" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="syncing databases" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="closing database connection" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="syncing prepared databases with schemas" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="syncing connection pooler (master, replica) from (true, nil) to (true, nil)" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=debug msg="final load balancer source ranges as seen in a service spec (not necessarily applied): [\"127.0.0.1/32\"]" cluster-name=systems-services-harbor/harbor-db pkg=cluster
postgres-operator-56fb8b4bdb-zp7nw postgres-operator time="2023-10-03T12:30:30Z" level=info msg="cluster has been synced" cluster-name=systems-services-harbor/harbor-db pkg=controller worker=0
j4m3s-s commented 11 months ago

Relevant master logs:

harbor-db-0 postgres 2023-10-03 12:59:40.591 UTC [25] LOG {ticks: 0, maint: 0, retry: 0}
harbor-db-0 postgres 2023-10-03 12:59:44,406 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 12:59:54,406 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 13:00:04,465 INFO: no action. I am (harbor-db-0), the leader with the lock
harbor-db-0 postgres 2023-10-03 13:00:10.593 UTC [25] LOG {ticks: 0, maint: 0, retry: 0}
harbor-db-0 postgres 2023-10-03 13:00:12,521 INFO: received failover request with leader=harbor-db-0 candidate=harbor-db-1 scheduled_at=None
harbor-db-0 postgres 2023-10-03 13:00:12,531 INFO: Got response from harbor-db-1 http://10.233.67.152:8008/patroni: {"state": "running", "postmaster_start_time": "2023-10-03 13:00:11.433821+00:00", "role": "replica", "server_version": 130010, "xlog": {"received_location": 20401094656, "replayed_location": 20401094656, "replayed_timestamp": "2023-10-03 13:00:03.787290+00:00", "paused": false}, "timeline": 990, "dcs_last_seen": 1696338012, "database_system_identifier": "7276438021957627967", "patroni": {"version": "3.0.1", "scope": "harbor-db"}}
harbor-db-0 postgres 2023-10-03 13:00:12,630 INFO: Got response from harbor-db-1 http://10.233.67.152:8008/patroni: {"state": "running", "postmaster_start_time": "2023-10-03 13:00:11.433821+00:00", "role": "replica", "server_version": 130010, "xlog": {"received_location": 20401094656, "replayed_location": 20401094656, "replayed_timestamp": "2023-10-03 13:00:03.787290+00:00", "paused": false}, "timeline": 990, "dcs_last_seen": 1696338012, "database_system_identifier": "7276438021957627967", "patroni": {"version": "3.0.1", "scope": "harbor-db"}}
harbor-db-0 postgres 2023-10-03 13:00:12,578 INFO: Lock owner: harbor-db-0; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:12,684 INFO: manual failover: demoting myself
harbor-db-0 postgres 2023-10-03 13:00:12,685 INFO: Demoting self (graceful)
harbor-db-0 postgres 2023-10-03 13:00:13,948 INFO: Leader key released
harbor-db-0 postgres 2023-10-03 13:00:13,952 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:13,952 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:13,953 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:13,953 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:14,096 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:14,096 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:15,105 INFO: Lock owner: harbor-db-1; I am harbor-db-0
harbor-db-0 postgres 2023-10-03 13:00:15,105 INFO: manual failover: demote in progress
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [1-1] 651c1060.2e4 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [2-1] 651c1060.2e4 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [3-1] 651c1060.2e4 0     LOG:  redirecting log output to logging collector process
harbor-db-0 postgres 2023-10-03 13:00:16 UTC [740]: [4-1] 651c1060.2e4 0     HINT:  Future log output will appear in directory "../pg_log".
harbor-db-0 postgres /var/run/postgresql:5432 - no response
harbor-db-0 postgres /var/run/postgresql:5432 - accepting connections
harbor-db-0 postgres /var/run/postgresql:5432 - accepting connections
harbor-db-0 postgres /etc/runit/runsvdir/default/patroni: finished with code=0 signal=0
harbor-db-0 postgres stopping /etc/runit/runsvdir/default/patroni
harbor-db-0 postgres timeout: finish: .: (pid 756) 1815s, want down
harbor-db-0 postgres ok: down: patroni: 0s, normally up
harbor-db-0 postgres ok: down: /etc/service/patroni: 0s, normally up
harbor-db-0 postgres 2023-10-03 13:00:24.881 UTC [25] LOG Got SIGTERM, fast exit
harbor-db-0 postgres ok: down: /etc/service/pgqd: 1s, normally up
JuanRamino commented 3 months ago

Hi there,

Same problem for me. I have a mutating webhook which changes the image of the container in the pod. The operator wants to sync the pod back to the original state, rescheduling it every 30 minutes. Is there a way to avoid this situation?

Thanks

janpipan commented 3 weeks ago

This can be solved by changing the default image either in the operatorconfigurations CRD or in your postgresql CRD. The mutating webhook should then ignore resources that already have the correct image registry configured. Also keep in mind that you have to change the image name of every sidecar container as well.
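
For example, assuming the caching proxy prefix from the original report and a Spilo tag picked only for illustration, something along these lines should make the operator's desired image already match what the webhook produces, so the comparison no longer flags a difference:

# OperatorConfiguration CRD: default image for all clusters managed by the operator
apiVersion: acid.zalan.do/v1
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  docker_image: mycachingproxy/ghcr.io/zalando/spilo-15:3.0-p1

# ...or per cluster, in the postgresql manifest
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: harbor-db
spec:
  dockerImage: mycachingproxy/ghcr.io/zalando/spilo-15:3.0-p1

If I remember the field names correctly, the pooler and logical backup images (connection_pooler_image and logical_backup_docker_image in the operator configuration) and any custom sidecar images need the same prefix as well.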