operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

Unexpected CSV transition when old version in CRD is deprecated #3388

Open dinhxuanvu opened 1 month ago

dinhxuanvu commented 1 month ago

Bug Report

What did you do? When a new version of a CR is introduced, the new version is marked served: true in the CRD while the old version is marked served: false. Once a version is marked as unserved, its endpoint is no longer reachable. The CRD should be the source of truth for an operator's CR versioning. That is not the case in OLM, however, because the CSV also lists the available APIs under CustomResourceDefinitions in its spec. This information is verified during the requirement and permission checks of the olm-operator reconciliation loop, and if those checks fail, the CSV may transition to the Pending or Failed phase.

During an upgrade, once the CRD is updated to mark older versions as unserved, those APIs are no longer available, so the requirement/permission checks mark the old CSV as Failed. The new CSV, however, expects the old CSV to be in the Replacing phase before it proceeds with the upgrade. Unless this is resolved, the operator may stay stuck in the Pending phase forever, which causes this test to permafail.
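To make the failure mode concrete, here is a minimal Go sketch of the kind of check described above. It is a simplified model, not OLM's actual code: the crdVersion type and the unservedOwnedVersions helper are hypothetical names introduced only for illustration, and the version names in the usage note are made up.

```go
package main

// crdVersion models the name/served pair of one entry in a CRD's
// spec.versions list. Hypothetical type for illustration only.
type crdVersion struct {
	Name   string
	Served bool
}

// unservedOwnedVersions returns the API versions that a CSV lists as owned
// but that the CRD does not serve (either served: false or absent).
// Hypothetical helper; it only models the requirement check described
// in this report.
func unservedOwnedVersions(crd []crdVersion, ownedByCSV []string) []string {
	served := make(map[string]bool, len(crd))
	for _, v := range crd {
		served[v.Name] = v.Served
	}
	var missing []string
	for _, want := range ownedByCSV {
		if !served[want] {
			missing = append(missing, want)
		}
	}
	return missing
}
```

With a CRD that has v1alpha1 unserved and v1 served, an old CSV that owns v1alpha1 gets a non-empty result (requirement unmet), while a new CSV that owns only v1 gets an empty one.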

What did you expect to see?

What did you see instead? Under which circumstances?

Environment

Possible Solution Change the CSV phase reconciliation loop so that, when a new CSV is available during an upgrade, the old CSV transitions to Replacing regardless of whether it passes the requirement/permission checks.

A more complex solution would be to remove the CRD information from the CSV, since the CSV no longer needs to hold it when the CRD exists in the bundle.
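The first option could look roughly like the following Go sketch. It is only an illustration of the proposed ordering, under the assumption that the phase decision can be reduced to two booleans. The nextPhase function and its parameters are hypothetical and do not correspond to OLM's real reconciliation code, and the unmet-requirements case is collapsed to Failed for brevity even though the report notes it may also be Pending.

```go
package main

// phase is a simplified stand-in for a CSV phase.
type phase string

const (
	phaseSucceeded phase = "Succeeded"
	phaseFailed    phase = "Failed"
	phaseReplacing phase = "Replacing"
)

// nextPhase sketches the proposed ordering: the check for a replacing CSV
// runs before the requirement/permission result is considered, so an old
// CSV moves to Replacing even when its APIs are no longer served.
// Hypothetical function, not part of OLM.
func nextPhase(current phase, requirementsMet, replacementExists bool) phase {
	if replacementExists {
		// A newer CSV is waiting: hand over regardless of requirement status.
		return phaseReplacing
	}
	if !requirementsMet {
		// Simplification: the report says Pending or Failed is possible here.
		return phaseFailed
	}
	return current
}
```

Under this ordering, nextPhase(phaseSucceeded, false, true) yields Replacing, whereas the same unmet requirements with no replacement available still yield Failed.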

Additional context

dinhxuanvu commented 1 month ago

/assign @perdasilva