operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io
Apache License 2.0

Defining env variables in Subscription creates two ReplicaSets #2725

Open ryanemerson opened 2 years ago

ryanemerson commented 2 years ago

Bug Report

What did you do? Create a Subscription with environment variables defined via spec.config.env:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: my-infinispan
  namespace: namespace-for-testing
spec:
  channel: 2.2.x
  name: infinispan
  source: operatorhubio-catalog
  sourceNamespace: olm
  config:
    env:
      - name: example-env
        value: some-value

What did you expect to see? I expect the operator's deployment to be created with a single ReplicaSet containing the configured environment variables.

What did you see instead? Under which circumstances? The deployment was created with two ReplicaSet instances. The first ReplicaSet does not contain the defined environment variables, but the second does. Eventually only the second ReplicaSet remains in an active state and the operator pods have the environment variables as expected.

The problem is that while the first ReplicaSet is active, the Deployment's readiness probes can pass, so operator pods without the environment variables start reconciling custom resources (CRs). This is problematic in our case because we use env variables in the operator pods to define default annotations/labels that must be applied to CRs. CRs reconciled while the first ReplicaSet is active are therefore missing these defaults.
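To confirm which ReplicaSet lacks the variables, the pod templates can be compared directly. The following is a minimal Python sketch, assuming its input is the parsed output of `kubectl get replicasets -n <namespace> -o json`. The function name `replicasets_missing_env` and the sample ReplicaSet and container names are hypothetical, and the check assumes the variable should appear in every container.

```python
def replicasets_missing_env(rs_list, env_name):
    """Return names of ReplicaSets where at least one container lacks env_name."""
    missing = []
    for rs in rs_list.get("items", []):
        containers = rs["spec"]["template"]["spec"].get("containers") or []
        for container in containers:
            names = {e["name"] for e in container.get("env") or []}
            if env_name not in names:
                missing.append(rs["metadata"]["name"])
                break  # one offending container is enough to flag the ReplicaSet
    return missing


# Hypothetical sample mirroring the two ReplicaSets described above.
sample = {
    "items": [
        {
            "metadata": {"name": "operator-rs-1"},
            "spec": {"template": {"spec": {"containers": [{"name": "manager"}]}}},
        },
        {
            "metadata": {"name": "operator-rs-2"},
            "spec": {"template": {"spec": {"containers": [
                {"name": "manager",
                 "env": [{"name": "example-env", "value": "some-value"}]},
            ]}}},
        },
    ]
}

print(replicasets_missing_env(sample, "example-env"))
```

With the sample data, only the first ReplicaSet is reported, which matches the behaviour described in this report.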

Environment

operator-lifecycle-manager version: v.0.20.0
Kubernetes version information: v.1.21.1
Kubernetes cluster kind: kind

mbaldessari commented 3 days ago

We're hitting this same issue when installing the RHOAI operator. We add the following to our subscription:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: DISABLE_DSC_CONFIG
      value: "True"

The above env variable is supposed to prevent the creation of a so-called default-dsci object. The result is, however, very unpredictable depending on timing.

The reason is that OLM first creates the deployment without the DISABLE_DSC_CONFIG env variable, so the operator pod is spawned without it and creates the default-dsci object. Only later is the deployment patched, which triggers a new ReplicaSet with the correct environment variable. By then it is too late, because the object has already been created.
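One way to observe this window is to compare the Subscription's spec.config.env against the Deployment's pod template. The following is a minimal Python sketch, assuming both objects have been fetched as JSON (for example with `kubectl get ... -o json`) and parsed into dicts. The function name `unapplied_subscription_env`, the container name, and the trimmed sample objects are hypothetical.

```python
def unapplied_subscription_env(subscription, deployment):
    """Return sorted env names from the Subscription that some container lacks."""
    config = subscription["spec"].get("config") or {}
    wanted = {e["name"]: e.get("value") for e in config.get("env") or []}
    containers = deployment["spec"]["template"]["spec"].get("containers") or []
    unapplied = set()
    for container in containers:
        actual = {e["name"]: e.get("value") for e in container.get("env") or []}
        for name, value in wanted.items():
            # Missing entirely, or present with a different value.
            if name not in actual or actual[name] != value:
                unapplied.add(name)
    return sorted(unapplied)


# Hypothetical, trimmed objects based on the Subscription shown above.
subscription = {
    "spec": {"config": {"env": [{"name": "DISABLE_DSC_CONFIG", "value": "True"}]}}
}
before_patch = {
    "spec": {"template": {"spec": {"containers": [{"name": "manager"}]}}}
}
after_patch = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "manager",
         "env": [{"name": "DISABLE_DSC_CONFIG", "value": "True"}]},
    ]}}}
}

print(unapplied_subscription_env(subscription, before_patch))
print(unapplied_subscription_env(subscription, after_patch))
```

A non-empty result corresponds to the period in which the operator pod can start without the variable. This check only detects the window and does not prevent it.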