What did you do?
When a new version of a CR is introduced, the new version is marked `served: true` in the CRD while the old version is marked `served: false`. Once a version is marked as unserved, its endpoint is no longer reachable. The CRD should be the source of truth for CR versioning for an operator. However, this is not the case for OLM, because the CSV also lists the available APIs via the CustomResourceDefinitions field in its spec. This information is checked and verified during the requirement and permission checks of the olm-operator reconciliation loop. If those checks fail, the CSV may be transitioned into the Pending or Failed phase.

During an upgrade, after the CRD is updated to mark older versions as unserved, those APIs are no longer available. As a result, the old CSV is marked as Failed, because its endpoints are unavailable during the requirement/permission checks. The new CSV, however, expects the old CSV to be in the Replacing phase in order to proceed with the upgrade. If this situation is not resolved, the operator may be stuck in the Pending phase forever, which causes this test to permafail.
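The failure mode above can be illustrated with a minimal, simplified model. This is a sketch under stated assumptions, not OLM's actual implementation: `CRDVersion`, `OwnedAPI`, `missingAPIs`, and `requirementPhase` are hypothetical names, and the real requirement check may report Pending rather than Failed depending on the condition. It shows that an old CSV still listing `v1alpha1` fails the check once the CRD stops serving that version, while the new CSV passes.

```go
package main

import "fmt"

// CRDVersion is a hypothetical, simplified stand-in for one entry in a
// CRD's spec.versions list.
type CRDVersion struct {
	Name   string
	Served bool
}

// OwnedAPI is a hypothetical, simplified stand-in for an API a CSV claims
// to own via its CustomResourceDefinitions field.
type OwnedAPI struct {
	CRDName string
	Version string
}

// missingAPIs returns every API the CSV claims to own whose version is not
// currently served by the corresponding CRD on the cluster.
func missingAPIs(owned []OwnedAPI, crds map[string][]CRDVersion) []OwnedAPI {
	var missing []OwnedAPI
	for _, api := range owned {
		served := false
		for _, v := range crds[api.CRDName] {
			if v.Name == api.Version && v.Served {
				served = true
				break
			}
		}
		if !served {
			missing = append(missing, api)
		}
	}
	return missing
}

// requirementPhase models the outcome of the requirement check in this
// simplified sketch: any missing API fails the check.
func requirementPhase(owned []OwnedAPI, crds map[string][]CRDVersion) string {
	if len(missingAPIs(owned, crds)) > 0 {
		return "Failed"
	}
	return "Succeeded"
}

func main() {
	// After the upgrade, the CRD serves only v1; v1alpha1 is unserved.
	crds := map[string][]CRDVersion{
		"widgets.example.com": {
			{Name: "v1alpha1", Served: false},
			{Name: "v1", Served: true},
		},
	}
	oldCSV := []OwnedAPI{{CRDName: "widgets.example.com", Version: "v1alpha1"}}
	newCSV := []OwnedAPI{{CRDName: "widgets.example.com", Version: "v1"}}

	fmt.Println("old CSV:", requirementPhase(oldCSV, crds)) // old CSV: Failed
	fmt.Println("new CSV:", requirementPhase(newCSV, crds)) // new CSV: Succeeded
}
```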
What did you expect to see?
What did you see instead? Under which circumstances?
Environment
operator-lifecycle-manager version:
Latest
Kubernetes version information:
1.31
Kubernetes cluster kind:
Possible Solution
Change the CSV phase reconciliation loop so that the old CSV transitions into the Replacing phase whenever a new CSV replacing it is available during the upgrade process, regardless of whether the old CSV passes the requirements/permissions checks.
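The proposed ordering can be sketched as follows. This is a hypothetical, simplified model rather than OLM's reconciliation code: `Phase`, `CSV`, `replacementFor`, and `nextPhase` are illustrative names, and `requirementsMet` stands in for the real requirement/permission checks. The key point is that the "is there a replacement?" check runs before the requirement check, so an unserved API version can no longer push the old CSV into Failed.

```go
package main

import "fmt"

// Phase is a hypothetical, simplified CSV phase.
type Phase string

const (
	PhasePending   Phase = "Pending"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
	PhaseReplacing Phase = "Replacing"
)

// CSV is a hypothetical, minimal model of a ClusterServiceVersion.
type CSV struct {
	Name     string
	Replaces string // name of the CSV this one replaces, if any
	Phase    Phase
}

// replacementFor returns the CSV that declares it replaces csv, or nil.
func replacementFor(csv CSV, all []CSV) *CSV {
	for i := range all {
		if all[i].Replaces == csv.Name {
			return &all[i]
		}
	}
	return nil
}

// nextPhase moves a CSV with a replacement to Replacing before its
// requirements/permissions are evaluated at all.
func nextPhase(csv CSV, all []CSV, requirementsMet bool) Phase {
	if replacementFor(csv, all) != nil {
		return PhaseReplacing
	}
	if !requirementsMet {
		return PhaseFailed
	}
	return PhaseSucceeded
}

func main() {
	oldCSV := CSV{Name: "widget-operator.v1.0.0", Phase: PhaseSucceeded}
	newCSV := CSV{Name: "widget-operator.v2.0.0", Replaces: oldCSV.Name, Phase: PhasePending}
	all := []CSV{oldCSV, newCSV}

	// The old CSV fails its checks (its API version is unserved), but it
	// still moves to Replacing because newCSV replaces it.
	fmt.Println(nextPhase(oldCSV, all, false)) // Replacing
	// Without a replacement, a failed check still yields Failed.
	fmt.Println(nextPhase(oldCSV, []CSV{oldCSV}, false)) // Failed
}
```

Keeping the failure path for CSVs without a replacement preserves today's behavior outside of upgrades.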
A more complex solution would be to remove the CRD information from the CSV, since it is no longer necessary for the CSV to hold CRD information when the CRD itself is shipped in the bundle.
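To illustrate the second option, the set of available APIs could be derived directly from the CRDs shipped in the bundle, so nothing duplicated in the CSV can drift out of sync with the served versions. This is a hypothetical sketch: `CRD`, `CRDVersion`, and `servedAPIs` are illustrative names, not part of OLM or the operator-framework API.

```go
package main

import (
	"fmt"
	"sort"
)

// CRDVersion is a hypothetical, simplified spec.versions entry.
type CRDVersion struct {
	Name   string
	Served bool
}

// CRD is a hypothetical, simplified CRD as shipped in the bundle.
type CRD struct {
	Name     string
	Versions []CRDVersion
}

// servedAPIs returns sorted "name/version" identifiers for every version the
// CRDs currently serve, using the CRD as the only source of truth.
func servedAPIs(crds []CRD) []string {
	var apis []string
	for _, crd := range crds {
		for _, v := range crd.Versions {
			if v.Served {
				apis = append(apis, crd.Name+"/"+v.Name)
			}
		}
	}
	sort.Strings(apis)
	return apis
}

func main() {
	bundle := []CRD{{
		Name: "widgets.example.com",
		Versions: []CRDVersion{
			{Name: "v1alpha1", Served: false},
			{Name: "v1", Served: true},
		},
	}}
	fmt.Println(servedAPIs(bundle)) // [widgets.example.com/v1]
}
```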
Additional context