If a node takes longer than the timeout to rejoin the cluster during an upgrade, the operator cannot recover, even if the cluster later becomes healthy. This change adds steps to recover once the cluster is completely upgraded (potentially after manual work).
Change log description
Fix operator stuck in UpgradeFailed, even after the cluster is upgraded and healthy.
Purpose of the change
fixes #493
What the code does
When the operator is in the UpgradeFailed state, check whether the StatefulSet is fully upgraded. If the cluster is fully upgraded and healthy, remove the failed state and complete the upgrade.
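For reviewers, here is a minimal sketch of the recovery check described above. It is not the code in this PR: `StatefulSetStatus`, `ClusterState`, `fullyUpgraded`, and `recoverFailedUpgrade` are hypothetical names, and the struct only mirrors a few fields of the Kubernetes StatefulSet status so the example stays self-contained. The exact definition of "fully upgraded and healthy" used here (all replicas ready, all replicas updated, revisions matching) is an assumption.

```go
package main

// StatefulSetStatus is a hypothetical, trimmed-down mirror of the
// Kubernetes StatefulSet status fields this sketch relies on.
type StatefulSetStatus struct {
	Replicas        int32
	ReadyReplicas   int32
	UpdatedReplicas int32
	CurrentRevision string
	UpdateRevision  string
}

// ClusterState is a hypothetical stand-in for the operator's view of the cluster.
type ClusterState struct {
	UpgradeFailed  bool
	CurrentVersion string
	TargetVersion  string
}

// fullyUpgraded reports whether every desired replica exists, is ready,
// and runs the updated revision (assumed definition of "upgraded and healthy").
func fullyUpgraded(desired int32, s StatefulSetStatus) bool {
	return s.Replicas == desired &&
		s.ReadyReplicas == desired &&
		s.UpdatedReplicas == desired &&
		s.CurrentRevision == s.UpdateRevision
}

// recoverFailedUpgrade clears the failed state and completes the upgrade,
// but only when the operator is in UpgradeFailed and the StatefulSet is
// fully upgraded. It returns true if a recovery happened.
func recoverFailedUpgrade(c *ClusterState, desired int32, s StatefulSetStatus) bool {
	if !c.UpgradeFailed || !fullyUpgraded(desired, s) {
		return false
	}
	c.UpgradeFailed = false
	c.CurrentVersion = c.TargetVersion
	return true
}
```

With 2 out of 3 nodes ready the function leaves the failed state in place. Once the third node is ready and on the updated revision, the next call clears the failure and marks the target version as current.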
How to verify it
Make sure that a node does not come online within the 10-minute timeout during an upgrade.
Verify that the cluster is stuck in UpgradeFailed with, for example, 2 out of 3 nodes ready.
Upgrade the operator to a build containing this fix, and verify that the upgrade completes.
If you are already running this version, a failed upgrade will recover once the cluster is in a good and upgraded state.