operator-framework / operator-controller

A new and improved management framework for extending Kubernetes with Operators
https://operator-framework.github.io/operator-controller/
Apache License 2.0

Resolution fails when manually upgrading through legacy replaces chain beyond n+1 #1009

Closed everettraven closed 4 months ago

everettraven commented 4 months ago

When manually stepping through versions of a package with a ClusterExtension, following the replaces chain in the FBC channel, moving beyond the first replaced version results in the following status condition:

    - lastTransitionTime: "2024-07-03T19:44:12Z"
      message: 'error upgrading from currently installed version "0.1.0": no package
        "argocd-operator" matching version "0.3.0" in channel "alpha" found'
      observedGeneration: 4
      reason: ResolutionFailed
      status: "False"
      type: Resolved

In this case, I had followed the replaces chain from v0.1.0 --> v0.2.0 successfully. When attempting to go from v0.2.0 --> v0.3.0, the upgrade failed with the resolution failure shown above.

To verify this is a valid upgrade path, you can see the channel and upgrade edges with:

    docker run --rm -it quay.io/operator-framework/opm:latest render quay.io/operatorhubio/catalog:latest | jq -s '.[] | select( .schema == "olm.channel" ) | select( .package == "argocd-operator")'

Output:

```json
{
  "schema": "olm.channel",
  "name": "alpha",
  "package": "argocd-operator",
  "entries": [
    { "name": "argocd-operator.v0.0.11", "replaces": "argocd-operator.v0.0.9" },
    { "name": "argocd-operator.v0.0.12", "replaces": "argocd-operator.v0.0.11" },
    { "name": "argocd-operator.v0.0.13", "replaces": "argocd-operator.v0.0.12" },
    { "name": "argocd-operator.v0.0.14", "replaces": "argocd-operator.v0.0.13" },
    { "name": "argocd-operator.v0.0.15", "replaces": "argocd-operator.v0.0.14" },
    { "name": "argocd-operator.v0.0.2" },
    { "name": "argocd-operator.v0.0.3", "replaces": "argocd-operator.v0.0.2" },
    { "name": "argocd-operator.v0.0.4", "replaces": "argocd-operator.v0.0.3" },
    { "name": "argocd-operator.v0.0.5", "replaces": "argocd-operator.v0.0.4" },
    { "name": "argocd-operator.v0.0.6", "replaces": "argocd-operator.v0.0.5" },
    { "name": "argocd-operator.v0.0.8", "replaces": "argocd-operator.v0.0.6" },
    { "name": "argocd-operator.v0.0.9", "replaces": "argocd-operator.v0.0.8" },
    { "name": "argocd-operator.v0.1.0", "replaces": "argocd-operator.v0.0.15" },
    { "name": "argocd-operator.v0.10.0", "replaces": "argocd-operator.v0.9.2" },
    { "name": "argocd-operator.v0.10.1", "replaces": "argocd-operator.v0.10.0" },
    { "name": "argocd-operator.v0.2.0", "replaces": "argocd-operator.v0.1.0" },
    { "name": "argocd-operator.v0.2.1", "replaces": "argocd-operator.v0.2.0" },
    { "name": "argocd-operator.v0.3.0", "replaces": "argocd-operator.v0.2.1" },
    { "name": "argocd-operator.v0.4.0", "replaces": "argocd-operator.v0.3.0" },
    { "name": "argocd-operator.v0.5.0", "replaces": "argocd-operator.v0.4.0" },
    { "name": "argocd-operator.v0.6.0", "replaces": "argocd-operator.v0.5.0" },
    { "name": "argocd-operator.v0.7.0", "replaces": "argocd-operator.v0.6.0" },
    { "name": "argocd-operator.v0.8.0", "replaces": "argocd-operator.v0.7.0" },
    { "name": "argocd-operator.v0.9.0", "replaces": "argocd-operator.v0.8.0" },
    { "name": "argocd-operator.v0.9.1", "replaces": "argocd-operator.v0.9.0" },
    { "name": "argocd-operator.v0.9.2", "replaces": "argocd-operator.v0.9.1" }
  ]
}
```
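The replaces edges in the channel output above can also be checked programmatically. The following is a minimal sketch, not code from operator-controller: the `entry` type and `replacesPath` helper are hypothetical names, and only the four relevant entries from the output are included.

```go
package main

import "fmt"

// entry mirrors one item of an olm.channel "entries" list.
// Hypothetical type for illustration only.
type entry struct {
	Name     string
	Replaces string
}

// replacesPath walks the replaces edges backwards from target until it
// reaches start. It returns the bundles in upgrade order, or nil if start
// is not reachable through replaces edges alone.
func replacesPath(entries []entry, start, target string) []string {
	replaces := make(map[string]string, len(entries))
	for _, e := range entries {
		replaces[e.Name] = e.Replaces
	}
	path := []string{target}
	seen := map[string]bool{target: true}
	for cur := target; cur != start; {
		prev, ok := replaces[cur]
		if !ok || prev == "" || seen[prev] {
			return nil // chain ended (or looped) before reaching start
		}
		seen[prev] = true
		path = append([]string{prev}, path...)
		cur = prev
	}
	return path
}

func main() {
	// Subset of the "alpha" channel entries shown in the output above.
	entries := []entry{
		{Name: "argocd-operator.v0.1.0", Replaces: "argocd-operator.v0.0.15"},
		{Name: "argocd-operator.v0.2.0", Replaces: "argocd-operator.v0.1.0"},
		{Name: "argocd-operator.v0.2.1", Replaces: "argocd-operator.v0.2.0"},
		{Name: "argocd-operator.v0.3.0", Replaces: "argocd-operator.v0.2.1"},
	}
	for _, name := range replacesPath(entries, "argocd-operator.v0.1.0", "argocd-operator.v0.3.0") {
		fmt.Println(name)
	}
}
```

Walking backwards from the target works here because each entry in this channel names at most one bundle in `replaces`; `skips` and `skipRange` edges are deliberately ignored in this sketch.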
Full ClusterExtension output YAML:

```yaml
apiVersion: olm.operatorframework.io/v1alpha1
kind: ClusterExtension
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"olm.operatorframework.io/v1alpha1","kind":"ClusterExtension","metadata":{"annotations":{},"name":"argocd"},"spec":{"channel":"alpha","installNamespace":"default","packageName":"argocd-operator","preflight":{"crdUpgradeSafety":{"disabled":true}},"version":"0.3.0"}}
  creationTimestamp: "2024-07-03T19:41:50Z"
  finalizers:
  - olm.operatorframework.io/cleanup-unpack-cache
  - olm.operatorframework.io/delete-cached-bundle
  generation: 4
  name: argocd
  resourceVersion: "1979"
  uid: 86f0b4a0-d6a7-4856-8aa2-e2d39e40b2a9
spec:
  channel: alpha
  installNamespace: default
  packageName: argocd-operator
  preflight:
    crdUpgradeSafety:
      disabled: true
  upgradeConstraintPolicy: Enforce
  version: 0.3.0
status:
  conditions:
  - lastTransitionTime: "2024-07-03T19:41:52Z"
    message: ""
    observedGeneration: 3
    reason: Deprecated
    status: "False"
    type: Deprecated
  - lastTransitionTime: "2024-07-03T19:41:52Z"
    message: ""
    observedGeneration: 3
    reason: Deprecated
    status: "False"
    type: PackageDeprecated
  - lastTransitionTime: "2024-07-03T19:41:52Z"
    message: ""
    observedGeneration: 3
    reason: Deprecated
    status: "False"
    type: ChannelDeprecated
  - lastTransitionTime: "2024-07-03T19:41:52Z"
    message: ""
    observedGeneration: 3
    reason: Deprecated
    status: "False"
    type: BundleDeprecated
  - lastTransitionTime: "2024-07-03T19:44:12Z"
    message: 'error upgrading from currently installed version "0.1.0": no package
      "argocd-operator" matching version "0.3.0" in channel "alpha" found'
    observedGeneration: 4
    reason: ResolutionFailed
    status: "False"
    type: Resolved
  - lastTransitionTime: "2024-07-03T19:41:54Z"
    message: 'unpack successful: '
    observedGeneration: 3
    reason: UnpackSuccess
    status: "True"
    type: Unpacked
  - lastTransitionTime: "2024-07-03T19:43:53Z"
    message: Instantiated bundle argocd successfully
    observedGeneration: 3
    reason: Success
    status: "True"
    type: Installed
```

Just to note, this was originally found by OpenShift QE. I verified the bug was reproducible and used a different package for installation.

joelanford commented 4 months ago

Definitely a blocker, IMO. Any idea why this is happening? My initial suspicion is that we somehow have an incorrect understanding of the currently installed version?

everettraven commented 4 months ago

> Definitely a blocker, IMO. Any idea why this is happening? My initial suspicion is that we somehow have an incorrect understanding of the currently installed version?

That is my suspicion as well. It seems like https://github.com/operator-framework/operator-controller/blob/8bf5cf45e4c128ba221d4085099928220ee367fb/internal/controllers/clusterextension_controller.go#L426 may be returning only the initially installed version.

kevinrizza commented 4 months ago

It's happening because the Helm release for the upgraded bundle still carries the version label from the first install. When we install, we set the version label:

https://github.com/operator-framework/operator-controller/blob/7cc9872805a96d96de40797175397d6dd745bd1e/internal/controllers/clusterextension_controller.go#L355-L372

but we're not doing the same for the upgrade case.
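A toy model of that difference, as a hedged sketch: the `release` type, the `versionLabel` key, and all function names below are hypothetical and are not the Helm or operator-controller APIs. The sketch only shows how carrying labels forward unchanged on upgrade leaves the recorded version stuck at the first install.

```go
package main

import "fmt"

// versionLabel is a hypothetical label key used only in this sketch.
const versionLabel = "example.io/bundle-version"

// release is a hypothetical stand-in for a stored release record.
type release struct {
	Manifest string
	Labels   map[string]string
}

// install records the bundle version as a label on the new release.
func install(manifest, version string) *release {
	return &release{Manifest: manifest, Labels: map[string]string{versionLabel: version}}
}

// upgradeStale swaps the manifest but reuses the old labels, so the
// version label keeps the value written at install time.
func upgradeStale(r *release, manifest, _ string) *release {
	return &release{Manifest: manifest, Labels: r.Labels}
}

// upgradeFixed copies the labels and overwrites the version label,
// mirroring what install does.
func upgradeFixed(r *release, manifest, version string) *release {
	labels := make(map[string]string, len(r.Labels)+1)
	for k, v := range r.Labels {
		labels[k] = v
	}
	labels[versionLabel] = version
	return &release{Manifest: manifest, Labels: labels}
}

// installedVersion reads the version back from the release labels.
func installedVersion(r *release) string { return r.Labels[versionLabel] }

func main() {
	r := install("bundle-0.1.0", "0.1.0")
	fmt.Println("stale:", installedVersion(upgradeStale(r, "bundle-0.2.0", "0.2.0")))
	fmt.Println("fixed:", installedVersion(upgradeFixed(r, "bundle-0.2.0", "0.2.0")))
}
```

In this model, the fix is simply to set the version label on the upgrade path the same way the install path does.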