operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

InstallPlan adding "replaces" field to CSV #2726

Open joaopauloksn opened 2 years ago

joaopauloksn commented 2 years ago

Bug Report

What did you do? I created a catalog image without any replaces field in the CSV.


{"name":"ibm-truststore-mgr.v1.1.0","version":"1.1.0","replaces":"ibm-truststore-mgr.v1.0.0","skipRange":">1.0.0 <1.1.0","skips":null,"channelName":"1.x","bundlePath":"<registry-path>/ibm-truststore-mgr-operator-bundle@sha256:e17760f6b71f09dbaed848a948e9da485685221e1415b241d19d9a0b1c02ce29"}

{"name":"ibm-truststore-mgr.v1.2.0","version":"1.2.0","replaces":"ibm-truststore-mgr.v1.1.0","skipRange":">1.1.0 <1.2.0","skips":null,"channelName":"1.x","bundlePath":"<registry-path>/ibm-truststore-mgr-operator-bundle@sha256:2cd046c6636f4608ee8cb0335a9d227527e73fdd79571e8fbe05aedc42f25edc"}

{"name":"ibm-truststore-mgr.v1.2.2","version":"1.2.2","replaces":"ibm-truststore-mgr.v1.2.0","skipRange":">=1.2.0 <1.2.2","skips":null,"channelName":"1.x","bundlePath":"<registry-path>/ibm-truststore-mgr-operator-bundle@sha256:f0a8d46d2697e36f650246aff2023a6dfb7211b68a3addc17a2d7d1aadddbf04"}

{"name":"ibm-truststore-mgr.v1.3.0-pre.stable","version":"1.3.0-pre.stable","replaces":null,"skipRange":">=1.0.0 <=99.0.0","skips":null,"channelName":"stable","bundlePath":"<registry-path>/ibm-truststore-mgr-operator-bundle:latest-stable"}

{"name":"ibm-truststore-mgr.v1.3.0-pre.tnoppc","version":"1.3.0-pre.tnoppc","replaces":null,"skipRange":">=1.0.0 <=99.0.0","skips":null,"channelName":"tnoppc","bundlePath":"<registry-path>/ibm-truststore-mgr-operator-bundle:latest-tnoppc"}

tnoppc is the default channel and has no replaces in the CSV.

What did you expect to see? Subscription, CSV and installplan should all work as usual, installing the operator deployment.

What did you see instead? Under which circumstances? For some reason, installplan status field is showing a replaces field pointing to the same CSV name, causing it to be added to my CSV. It is causing a loop as the CSV cannot replace itself. This issue is intermittent though. Sometimes it just works and I don't see replaces line in the CSV spec, which is very weird. How can the same catalog/channel show different deployment behaviors? What added the replaces field to my CSV? It is clearly not there when I look at the installplan config map before approving it.

Environment

b3aabf273e0ac0bd6e84d257332e2eac08f5e6cf

Openshift 4.8: Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6+b82a451", GitCommit:"cefce093e4e5bc9a1916eb5a489ed37c7d467f6f", GitTreeState:"clean", BuildDate:"2022-02-05T06:58:30Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

Possible Solution Identify which OLM component is adding the replaces field to my CSV, and why.

Additional context My install plan shows replaces field even though I don't have it in the CSV.

{"kind":"ConfigMap","name":"10f8e94f1384dd22e6072068ab641b53552f83704f22f271a7f156d4dd6c397","namespace":"openshift-marketplace","catalogSourceName":"ibm-truststore-mgr-operators","catalogSourceNamespace":"openshift-marketplace","replaces":"ibm-truststore-mgr.v1.3.0-pre.tnoppc","properties":"{\"properties\":[{\"type\":\"olm.gvk\",\"value\":{\"group\":\"truststore-mgr.ibm.com\",\"kind\":\"Truststore\",\"version\":\"v1\"}},{\"type\":\"olm.package\",\"value\":{\"packageName\":\"ibm-truststore-mgr\",\"version\":\"1.3.0-pre.tnoppc\"}}]}"}
terenceq commented 2 years ago

A few more details, trying to be as generic as possible with the description. The problem with replaces happens when we create a subscription that resolves to a release from a channel in our catalog that contains only a single release. There is nothing special about the subscription. For example:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ibm-truststore-mgr-stable-ibm-truststore-mgr-operators-openshift-marketplace
  namespace: ibm-sls
spec:
  channel: stable
  name: ibm-truststore-mgr
  source: ibm-truststore-mgr-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic

I then end up with this:

% oc get csv
NAME                                   DISPLAY                     VERSION            REPLACES                               PHASE
ibm-truststore-mgr.v1.2.3-pre.stable   IBM Truststore Manager      1.2.3-pre.stable   ibm-truststore-mgr.v1.2.3-pre.stable   Pending

The CSV ends up with a replaces attribute:

  provider:
    name: IBM
    url: https://ibm.com
  replaces: ibm-truststore-mgr.v1.2.3-pre.stable
  version: 1.2.3-pre.stable
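A quick way to spot this condition across a namespace is to compare each CSV's spec.replaces with its own metadata.name. The following is only a minimal sketch: the helper name find_self_replacing and the sample data are hypothetical, and the input is assumed to be shaped like the output of `oc get csv -o json`, trimmed to the two relevant fields.

```python
import json


def find_self_replacing(csv_list):
    """Return names of CSVs whose spec.replaces equals their own metadata.name."""
    return [
        item["metadata"]["name"]
        for item in csv_list.get("items", [])
        if item.get("spec", {}).get("replaces") == item["metadata"]["name"]
    ]


# Hypothetical sample, reduced to the fields the check needs.
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "ibm-truststore-mgr.v1.2.3-pre.stable"},
      "spec": {
        "replaces": "ibm-truststore-mgr.v1.2.3-pre.stable",
        "version": "1.2.3-pre.stable"
      }
    }
  ]
}
""")

print(find_self_replacing(sample))  # ['ibm-truststore-mgr.v1.2.3-pre.stable']
```

An empty result means no CSV in the list names itself in replaces.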

The CSV named ibm-truststore-mgr.v1.2.3-pre.stable is in the state Pending. The only status condition is:

% oc get csv ibm-truststore-mgr.v1.2.3-pre.stable -o yaml
status:
  cleanup: {}
  conditions:
  - lastTransitionTime: "2022-04-28T17:28:46Z"
    lastUpdateTime: "2022-04-28T17:28:46Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  lastTransitionTime: "2022-04-28T17:28:46Z"
  lastUpdateTime: "2022-04-28T17:28:46Z"
  message: requirements not yet checked
  phase: Pending
  reason: RequirementsUnknown

Looking at the logs for the olm-operator in the namespace openshift-operator-lifecycle-manager the following error messages are observed:

{"level":"error","ts":1651166928.3758118,"logger":"controllers.operatorcondition","msg":"Error ensuring OperatorCondition Deployment EnvVars","request":"ibm-sls/ibm-truststore-mgr.v1.2.3-pre.stable","error":"Deployment.apps \"ibm-truststore-mgr-controller-manager\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
{"level":"error","ts":1651166928.3759263,"logger":"controller-runtime.manager.controller.operatorcondition","msg":"Reconciler error","reconciler group":"operators.coreos.com","reconciler kind":"OperatorCondition","name":"ibm-truststore-mgr.v1.2.3-pre.stable","namespace":"ibm-sls","error":"Deployment.apps \"ibm-truststore-mgr-controller-manager\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:216\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}
time="2022-04-28T17:28:48Z" level=warning msg="Unable to replace previous CSV" csv=ibm-truststore-mgr.v1.2.3-pre.stable error="CSV being replaced is in phase Pending instead of Replacing" id=CKA7e namespace=ibm-sls phase=Pending
time="2022-04-28T17:28:49Z" level=warning msg="Unable to replace previous CSV" csv=ibm-truststore-mgr.v1.2.3-pre.stable error="CSV being replaced is in phase Pending instead of Replacing" id=oUMul namespace=ibm-sls phase=Pending
awgreene commented 2 years ago

Hello @terenceq, thanks for submitting this issue and for using OLM.

For some reason, installplan status field is showing a replaces field pointing to the same CSV name, causing it to be added to my CSV. It is causing a loop as the CSV cannot replace itself.... What added the replaces field to my CSV?

This is expected behavior.

When OLM is determining if an upgrade is available for an operator, it will look at the existing CSV and determine if:

If the existing CSV has an upgrade due to the second option, the newer CSV will have its replaces field set to the existing CSV. This allows OLM to use a single process for upgrading CSVs on the cluster.

It is causing a loop as the CSV cannot replace itself.

This is happening because the skipRange you're using is >=1.0.0 <=99.0.0, whose upper bound is greater than the version of the CSV (v1.3.0-xxx), so the range includes the CSV's own version. You need to set the upper bound of the skipRange below the version of the CSV. In this case, it seems like you should set the skipRange to >=1.0.0 <SEMVER.
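To see why the original range is a problem, it can help to check by hand whether a version falls inside a skipRange. The sketch below is not OLM's implementation (OLM is written in Go and evaluates ranges with its own semver library); it assumes plain semver 2.0.0 precedence, no build metadata, and space-separated comparators that must all hold. The names Version and in_range are hypothetical, and using the CSV's own version as the exclusive upper bound is one reading of <SEMVER above.

```python
import operator
import re
from functools import total_ordering


@total_ordering
class Version:
    """Minimal semver 2.0.0 version (build metadata is not handled)."""

    def __init__(self, text):
        core, _, pre = text.partition("-")
        self.core = tuple(int(part) for part in core.split("."))
        if len(self.core) != 3:
            raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
        self.pre = tuple(pre.split(".")) if pre else ()

    def _key(self):
        # Numeric pre-release identifiers sort before alphanumeric ones.
        pre_key = tuple(
            (0, int(p), "") if p.isdigit() else (1, 0, p) for p in self.pre
        )
        # A pre-release sorts before the release with the same core version.
        return (self.core, 0 if self.pre else 1, pre_key)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()


OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def in_range(version, range_text):
    """Return True if version satisfies every comparator in range_text."""
    candidate = Version(version)
    for comparator in range_text.split():
        match = re.fullmatch(r"(>=|<=|>|<|=)(.+)", comparator)
        if match is None:
            raise ValueError(f"unsupported comparator: {comparator!r}")
        op, bound = match.groups()
        if not OPS[op](candidate, Version(bound)):
            return False
    return True


# The CSV's own version falls inside the original range.
print(in_range("1.3.0-pre.stable", ">=1.0.0 <=99.0.0"))           # True
# With an exclusive upper bound at the CSV's version it no longer does.
print(in_range("1.3.0-pre.stable", ">=1.0.0 <1.3.0-pre.stable"))  # False
# Older versions are still covered by the narrower range.
print(in_range("1.2.2", ">=1.0.0 <1.3.0-pre.stable"))             # True
```

Under these assumptions, the first check shows the CSV's own version inside its skipRange, which matches the self-referencing replaces seen above.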

This issue is intermittent though. Sometimes it just works and I don't see replaces line in the CSV spec, which is very weird. How can the same catalog/channel show different deployment behaviors?

The replaces field is only set during upgrades. I suspect you've seen a blank replaces field when installing the operator from scratch rather than upgrading from an existing version. If this is happening at other times, please share the steps to reproduce.

Note: Edited for clarity.