Open bchrobot opened 3 years ago
I also cannot make a rolling upgrade from version 12 to 13: PGVERSION changes accordingly, but the actual version inside the pod stays the same. The pods restart one by one, beginning with one of the replicas, then the other replica, and finally the ex-primary. The operator parameter major_version_upgrade_mode is "manual".
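For anyone checking the same thing: a quick way to confirm the mismatch is to compare the PGVERSION environment variable against the version the server actually reports. The pod and namespace names below are placeholders for your own cluster:

```bash
# Placeholder names; adjust pod and namespace to your deployment.
kubectl -n my-namespace exec my-cluster-0 -- env | grep PGVERSION

# What Postgres is actually running inside the pod (run psql as the postgres OS user):
kubectl -n my-namespace exec my-cluster-0 -- su postgres -c "psql -c 'SHOW server_version;'"
```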
I see exactly the same behaviour on AKS.
Same issue with postgres-operator 1.7.1 when it tries to upgrade a PostgreSQL cluster from v13 to v14. Every sync period (10m in my case) postgres-operator initiates the major version upgrade on the secondary (out of two pods: 1 primary + 1 secondary) and consequently fails.
The content of last-upgrade.log is (timestamps removed for brevity):
inplace_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
inplace_upgrade WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
inplace_upgrade INFO: establishing a new patroni connection to the postgres cluster
inplace_upgrade ERROR: PostgreSQL is not running or in recovery
After that postgres-operator creates an event "Upgrade from 130004 to 140000 finished" and after 10 minutes starts from the beginning.
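In case it helps others debug the retry loop, the recurring events and the operator's view of the failed attempts can be inspected with plain kubectl. The cluster, namespace, and deployment names below are placeholders and may differ in your setup:

```bash
# Events emitted for the postgresql resource (placeholder cluster/namespace names):
kubectl -n my-namespace get events --field-selector involvedObject.name=my-cluster

# Operator log lines around the upgrade attempts (operator deployment name may differ):
kubectl -n operator-namespace logs deploy/postgres-operator | grep -i "major version"
```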
After I ran python3 /scripts/inplace_upgrade.py 2 manually on the master pod, the upgrade finished successfully.
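For reference, this is roughly how I would locate the current leader and run the script there. The label selectors, the su invocation, and all names are what worked for me and are placeholders, not an official procedure:

```bash
# Find the pod Patroni currently considers the leader (labels set by the operator/Spilo):
kubectl -n my-namespace get pods -l cluster-name=my-cluster,spilo-role=master

# Run the in-place upgrade script as the postgres user on that pod.
# The trailing "2" is, as I understand it, the expected number of cluster members,
# as in the comment above.
kubectl -n my-namespace exec -it my-cluster-0 -- su postgres -c "python3 /scripts/inplace_upgrade.py 2"
```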
Please, answer some short questions which should help us to understand your problem / question better?

Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.6.2
Relevant operator config and logs are here: https://gist.github.com/bchrobot/78be1494857fb98f602557a7e0dc15d7
We have a number of production clusters running PG 12 and recently began updating them to PG 13 using the new major version upgrade feature in postgres-operator. Some of the upgrades have gone smoothly but others have gotten stuck. Attempts to then run the inplace_upgrade.py script manually as described here have resulted in a broken replica-replica state.

The issue seems to be that postgres-operator updates the pods with the new PG 13 env var but then runs inplace_upgrade.py on the replica rather than the master. This fails, but postgres-operator treats it as a success (or maybe doesn't care, as it plans to retry later anyway?) and kicks off a new base backup.

The replica-replica situation may be admin error due to running inplace_upgrade.py manually before the postgres-operator-initiated base backup completed. The logs in the linked gist are from an attempt this morning where I waited until that base backup completed before running inplace_upgrade.py manually. This seems to have completed successfully without ending up in the replica-replica state.
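For completeness, the replica-replica state can be spotted from inside any pod with patronictl, and the operator's upgrade mode can be double-checked against its configuration. The object and namespace names below are placeholders for whatever your deployment uses, and the configuration may live in a ConfigMap rather than an OperatorConfiguration object depending on how the operator was installed:

```bash
# Patroni's view of the cluster; a healthy cluster shows exactly one member with the Leader role:
kubectl -n my-namespace exec my-cluster-0 -- patronictl list

# Confirm how major version upgrades are configured for the operator (placeholder object name):
kubectl -n operator-namespace get operatorconfiguration postgres-operator -o yaml | grep -A3 major_version_upgrade
```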