operator-framework / operator-lifecycle-manager

A management framework for extending Kubernetes with Operators
https://olm.operatorframework.io

Decouple Status reporting to OLM #1973

Open thetechnick opened 3 years ago

thetechnick commented 3 years ago

Problem Description

Informing OLM about the status of a deployed operator is very important to ensure complex migrations and procedures are not interrupted. While interruptions/operator pod restarts should not lead to issues, upgrading an Operator in the middle of a complex migration could lead to inconsistencies if the Operator cannot pick up the migration where it left off.

At the time of writing, OLM already allows Operators to report status information via the OperatorCondition CRD: https://olm.operatorframework.io/docs/advanced-tasks/communicating-operator-conditions-to-olm/

Context: I am submitting this issue for consideration as a developer on the Hyperconverged Cluster Operator (HCO from now on). Before joining this team, though, I developed multiple operators that ran into the same issues.

Consistency

A common pattern, especially for cluster-wide operators, is the use of a singleton CRD instance to hold configuration for the operator itself. This is also the place where information about the operator's status (conditions, events, etc.) is recorded and exposed to users. In some cases, operators require this status to be consulted before operators (human or otherwise) upgrade the operator itself. [citation needed]

When such operators don't support managing multiple instances of the Thing on the cluster, they may merge the configuration of the Thing they manage as a cluster-wide deployment into their operator configuration. This is also the case for HCO.

Now to the core issue:

# (simplified)
# Cool Operator is deploying complex cool stuff in a cluster
# This object instance is the operator's singleton configuration object
apiVersion: cool.operators.corp/v1beta1
kind: CoolOperator
metadata:
  name: cool-operator
  generation: 15
spec:
  # Changing this setting will migrate all components between PostgreSQL and MariaDB,
  # a complex and somewhat long-running migration, during which the Operator must not be upgraded.
  storageLayer: PostgreSQL
status:
  conditions:
  - type: Upgradable
    status: "True"
    reason: OperatorIdle
    message: "operator is chilling nothing to coordinate right now"
    observedGeneration: 15

Important here is generation/observedGeneration. As soon as "someone" makes a spec change, the condition is invalidated until the controller reports on the new state. To illustrate:

apiVersion: cool.operators.corp/v1beta1
kind: CoolOperator
metadata:
  name: cool-operator
  generation: 16
spec:
  storageLayer: MariaDB
status:
  conditions:
  - type: Upgradable
    status: "True"
    reason: OperatorIdle
    message: "operator is chilling nothing to coordinate right now"
    observedGeneration: 15 # outdated status

  # Condition update after the controller was able to service the new object version.
  - type: Upgradable
    status: "False"
    reason: MigrationInProgress
    message: "data migration is in progress - don't update this operator! - go away and drink another coffee :)"
    observedGeneration: 16

Why this is so convenient: without doing any complex gymnastics, I as a developer can be sure that, until my code confirms whether the new spec changes any conditions, nobody should act on outdated information.
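To make this concrete, here is a minimal sketch (my own illustration, not existing OLM code) of how a consumer such as OLM could refuse to act on stale conditions, using standard apimachinery helpers:

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isUpgradeable reports whether the Upgradable condition is True AND fresh.
// A condition whose observedGeneration lags behind metadata.generation was
// computed against an older spec and must not be trusted.
func isUpgradeable(generation int64, conditions []metav1.Condition) (upgradeable, fresh bool) {
	cond := meta.FindStatusCondition(conditions, "Upgradable")
	if cond == nil || cond.ObservedGeneration != generation {
		return false, false // no data yet, or data for an outdated spec
	}
	return cond.Status == metav1.ConditionTrue, true
}
```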

Mirroring this status information into another object makes synchronization much more difficult and introduces a few pitfalls:

  1. Before acting on .spec, the operator needs to check whether ANY change could be triggered and lock the Upgradable condition first, which puts a huge complexity burden on the operator (see the sketch below).
  2. The operator can't know whether OLM has seen the update to OperatorCondition, so as a developer I can't be sure that the "lock" is committed, which makes relying on it flaky.
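To illustrate pitfall 1, a hypothetical reconcile fragment (lockUpgrades and the CoolOperator object it would be called on are made up for this example): on the operator's own object, the pessimistic lock is a single confirmed status write, whereas mirroring it into OperatorCondition would need a second write with no acknowledgement.

```go
import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// lockUpgrades pessimistically flips Upgradable to False before any
// spec-driven work starts, called e.g. as
// lockUpgrades(ctx, c, op, &op.Status.Conditions) on the hypothetical
// CoolOperator singleton. Because spec and conditions live on the same
// object, the write either commits or the reconcile retries; there is no
// second object to keep in sync.
func lockUpgrades(ctx context.Context, c client.Client, obj client.Object, conds *[]metav1.Condition) error {
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:               "Upgradable",
		Status:             metav1.ConditionFalse,
		Reason:             "SpecChangePending",
		Message:            "evaluating new spec before allowing upgrades",
		ObservedGeneration: obj.GetGeneration(),
	})
	return c.Status().Update(ctx, obj)
	// Mirroring the same condition into OperatorCondition would require a
	// second write here, with no way to know when OLM has observed it.
}
```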

*[citation needed]: I am searching for an example, but the runbook of the operator I have in mind is not public. :)

Deployment complexity

We want upstream operators to still work on plain Kubernetes, so we have to support three different deployment models.

This makes it even harder to upstream and test changes.

Double writes to kube-apiserver/etcd

We can't get rid of our own CRD, as it holds our configuration and we need it for upstream. That means we have to update two objects for every condition change.

Proposed Solution

I would like to propose that OLM discover and read status conditions from the CRDs that are already in use by operators (the CoolOperator CRD in the example under Consistency).

Concretely, I would like to be able to tell OLM the name and type of a CRD instance on which to check status conditions, as an alternative to reporting them on the OperatorCondition object.

In the case of HCO, the operatorframework.io/initialization-resource annotation is even used to create that instance in the first place.
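To sketch feasibility (none of this is existing OLM code; the function and its wiring are illustrative), OLM could read conditions generically from any declared GroupVersionKind with an unstructured client, without compile-time knowledge of the operator's CRD:

```go
import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// readDeclaredConditions fetches the object the operator declared as its
// status source and extracts .status.conditions generically.
func readDeclaredConditions(ctx context.Context, c client.Client,
	gvk schema.GroupVersionKind, key client.ObjectKey) ([]metav1.Condition, error) {

	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(gvk)
	if err := c.Get(ctx, key, u); err != nil {
		return nil, err
	}
	raw, found, err := unstructured.NestedSlice(u.Object, "status", "conditions")
	if err != nil || !found {
		return nil, err
	}
	conds := make([]metav1.Condition, 0, len(raw))
	for _, item := range raw {
		m, ok := item.(map[string]interface{})
		if !ok {
			continue
		}
		var cond metav1.Condition
		if err := runtime.DefaultUnstructuredConverter.FromUnstructured(m, &cond); err != nil {
			return nil, err
		}
		conds = append(conds, cond)
	}
	return conds, nil
}
```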

Thanks for reading! Looking forward to discussing this in more detail.

ecordell commented 3 years ago

Thanks for the submission and detailed writeup @thetechnick

I would like to propose to have OLM discover and read status conditions from those CRDs that are already in use by operators.

This was part of the original enhancement but was dropped only for time. Perhaps @awgreene has the older versions of that enhancement handy, so that we could reinstate those parts as future work (perhaps as a separate enhancement?).

thetechnick commented 3 years ago

Hey @ecordell if you can point me towards existing groundwork, I would be happy to contribute to this feature.

thetechnick commented 3 years ago

I was thinking a bit more about locking flows, because I would like to have an Upgrade lock that I can rely on.

EDIT (head hurts from thinking about distributed locking for too long): to boil it down as much as possible, I can think of two paths.

Just doing a brain-dump here for future reference:

Operator locks OLM via OperatorCondition:

kind: OperatorCondition
metadata:
  generation: 4
spec:
  conditions:
  # reported from Operator
  # in .spec to increment .metadata.generation
  - type: Upgradable
    status: "True"
status:
  conditions:
  # OLM acknowledged the Upgrade Lock
  - type: Acknowledged
    status: "True"
    observedGeneration: 4

  # Outdated Upgrade Lock
  - type: Acknowledged
    status: "True"
    observedGeneration: 3

  # Denied Upgrade Lock
  - type: Acknowledged
    status: "False"
    reason: UpgradeInProgress
    message: an upgrade process for the operator has already started
    observedGeneration: 4

Pro:

Con:
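A hypothetical operator-side sketch of this first flow (the Acknowledged condition does not exist in OLM today; the types assume the v2 OperatorCondition API, where operator-reported conditions live in .spec.conditions): write the lock into .spec, which bumps .metadata.generation, then poll until OLM acknowledges exactly that generation.

```go
import (
	"context"
	"fmt"
	"time"

	operatorsv2 "github.com/operator-framework/api/pkg/operators/v2"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// acquireUpgradeLock requests an upgrade lock via .spec.conditions and
// waits for the proposed Acknowledged condition to confirm our generation.
func acquireUpgradeLock(ctx context.Context, c client.Client, key client.ObjectKey) error {
	oc := &operatorsv2.OperatorCondition{}
	if err := c.Get(ctx, key, oc); err != nil {
		return err
	}
	meta.SetStatusCondition(&oc.Spec.Conditions, metav1.Condition{
		Type:    "Upgradable",
		Status:  metav1.ConditionFalse,
		Reason:  "MigrationInProgress",
		Message: "data migration is in progress",
	})
	if err := c.Update(ctx, oc); err != nil {
		return err
	}
	want := oc.Generation // generation after our spec change

	for {
		if err := c.Get(ctx, key, oc); err != nil {
			return err
		}
		ack := meta.FindStatusCondition(oc.Status.Conditions, "Acknowledged")
		if ack != nil && ack.ObservedGeneration == want {
			if ack.Status == metav1.ConditionTrue {
				return nil // lock committed
			}
			return fmt.Errorf("lock denied: %s", ack.Message)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```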

OLM requests Upgrade Ack from Operator

# example singleton Operator CRD kind
kind: HyperConverged
metadata:
  generation: 5
  annotations:
    # OLM updates this field to ask
    # whether it's safe to upgrade.
    # NOTE: annotation changes do not bump .metadata.generation by
    # themselves, so OLM needs another way to wait for
    # .status.observedGeneration or
    # .status.conditions[i].observedGeneration to sync up.
    #
    # integer to just trigger reconcile/condition update
    operator-sdk.io/can-upgrade: "5"
spec: {}
status:
  conditions:
  # Operator ack-ed upgrade is Ok
  - type: Upgradable
    status: "True"
    observedGeneration: 5

  # Outdated upgrade ok
  - type: Upgradable
    status: "True"
    observedGeneration: 3

  # Upgrade not ok right now
  - type: Upgradable
    status: "False"
    reason: BlockingVMMigration
    message: coordinating fleet VM migration, please wait...
    observedGeneration: 5

Pro:

Con: