kyma-project / community

Provides general guidelines, contributing, and maintaining rules for all who add content to Kyma.
https://kyma-project.io/community/
Apache License 2.0

Module management improvements #870

Closed pbochynski closed 8 months ago

pbochynski commented 11 months ago

Created on 2023-12-18 by Piotr Bochynski (@pbochynski)

Decision log

| Name | Description |
| --- | --- |
| Title | Module management improvements |
| Due date | 2024-01-15 |
| Status | Proposed on 2023-12-20, Accepted on 2024-03-29 |
| Decision type | Choice |
| Affected decisions | |

Context

Modularization of Kyma components is completed. Now each module release contains 2 artifacts:

These 2 artifacts can be installed directly by the user in any Kubernetes cluster using the kubectl apply command. Users who have access to the SAP Kyma Runtime (SKR) can choose to get selected modules as managed software by adding the module name to the spec of the Kyma custom resource (CR) in their SKR cluster. When a module is listed in the Kyma CR, the central meta-operator called Kyma Lifecycle Manager (KLM) takes care of installing the module operator and upgrading it to the version assigned to the preferred release channel.
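
To make the "add the module name to the spec of the Kyma CR" step concrete, here is a minimal sketch that builds such a resource as a plain Python dictionary. The apiVersion, metadata, and field names (`spec.channel`, `spec.modules`) are assumptions for illustration and should be checked against the CRD installed in the cluster; `add_module` is a hypothetical helper, not part of any Kyma tooling.

```python
import copy
import json


def add_module(kyma_cr: dict, name: str) -> dict:
    """Return a copy of the Kyma CR with `name` listed under spec.modules."""
    updated = copy.deepcopy(kyma_cr)
    modules = updated.setdefault("spec", {}).setdefault("modules", [])
    # Listing a module twice would be redundant, so skip existing entries.
    if not any(m.get("name") == name for m in modules):
        modules.append({"name": name})
    return updated


# Assumed shape of a Kyma CR; every field name here is illustrative.
kyma_cr = {
    "apiVersion": "operator.kyma-project.io/v1beta2",
    "kind": "Kyma",
    "metadata": {"name": "default", "namespace": "kyma-system"},
    "spec": {"channel": "regular", "modules": []},
}

managed = add_module(kyma_cr, "serverless")
print(json.dumps(managed["spec"], indent=2))
```

Once the module is listed, KLM is expected to take over installation and upgrades; removing the entry again hands control back to the user.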

flowchart TD
    S((not installed))
    I((installed))
    M((managed))
    D[delete]
    C{"do managed 
       objects 
       exist?"}
    V{check version}   
    U[install/update]
    KA[kubectl apply]
    KD[kubectl delete]
    CRA[add to Kyma CR]
    CRRD["remove from Kyma CR
         delete strategy"]
    CRRI["remove from Kyma CR
         ignore strategy"]
    S --> KA
    KA --> I
    I --> KD
    KD --> S
    S --> CRA
    CRA --> M
    I-->CRA
    CRRI --> I
    M --> CRRD
    M --> CRRI
    CRRD --> C
    C --> |yes|C
    C --> |no| D
    M --> |reconcile|V
    V -->|ok|M
    V --> |version not found|U
    U --> M
    D -->S
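
The flowchart can also be read as a small state machine. The sketch below encodes its three states and the user-visible events as a transition table. The event strings and the `next_state` helper are hypothetical names chosen for this illustration. One simplification: while managed objects still exist, the "delete strategy" path is modelled as staying in the managed state, because the diagram has no separate state for the waiting loop.

```python
from enum import Enum


class State(Enum):
    NOT_INSTALLED = "not installed"
    INSTALLED = "installed"
    MANAGED = "managed"


REMOVE_DELETE = "remove from Kyma CR (delete strategy)"
REMOVE_IGNORE = "remove from Kyma CR (ignore strategy)"

# Unconditional transitions taken from the flowchart edges.
TRANSITIONS = {
    (State.NOT_INSTALLED, "kubectl apply"): State.INSTALLED,
    (State.INSTALLED, "kubectl delete"): State.NOT_INSTALLED,
    (State.NOT_INSTALLED, "add to Kyma CR"): State.MANAGED,
    (State.INSTALLED, "add to Kyma CR"): State.MANAGED,
    (State.MANAGED, REMOVE_IGNORE): State.INSTALLED,
    # Reconcile checks the version and installs/updates if needed;
    # either way the module ends up managed again.
    (State.MANAGED, "reconcile"): State.MANAGED,
}


def next_state(state: State, event: str, managed_objects_exist: bool = False) -> State:
    """Return the state reached from `state` when `event` occurs."""
    if state is State.MANAGED and event == REMOVE_DELETE:
        # Deletion is blocked while managed objects still exist.
        return State.MANAGED if managed_objects_exist else State.NOT_INSTALLED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value!r} on {event!r}") from None
```

For example, `next_state(State.MANAGED, REMOVE_IGNORE)` yields `State.INSTALLED`, which matches the edge from "remove from Kyma CR, ignore strategy" back to "installed".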

Decision

The following improvements are proposed to simplify and clarify the module management domain in Kyma.

  1. Because the module configuration is not managed, KLM should not consider the module configuration status when calculating the Kyma CR status.
  2. KLM should block deletion of the module operator deployment (manifest) until all managed resources are deleted (including the module configuration). The list of managed resources contains at least the CRDs included in the module manifest (usually the module configuration), but can also include other CRDs (such as functions for serverless).
  3. KLM should not recreate the module configuration, because it is a managed resource that blocks module deletion. The CreateAndDelete strategy should only create the default configuration and not reconcile it later.
  4. With the Ignore strategy, KLM should not delete the module deployment, because doing so can lead to orphaned resources with a misleading status (module configuration with state Ready, but operator undeployed). With the CreateAndDelete strategy, deletion should be blocking: do not delete the operator manifest before all managed resources are gone (see point 2).
  5. Lifecycle management is not mandatory for SKR. Removing a module from the Kyma CR disables automatic updates but does not prevent users from installing the module manually.
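
Points 2 to 4 can be summarised as a single decision about whether the operator manifest may be removed. The following is a minimal sketch under stated assumptions, not KLM's actual implementation: `DeletionPlan` and `plan_deletion` are hypothetical names, and the strategy is passed as a plain string using the names from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeletionPlan:
    delete_operator: bool
    reason: str


def plan_deletion(strategy: str, remaining_managed_resources: List[str]) -> DeletionPlan:
    """Decide whether the module operator manifest may be deleted."""
    if strategy == "Ignore":
        # Point 4: leave the deployment in place to avoid orphaned resources.
        return DeletionPlan(False, "Ignore strategy: operator deployment is left in place")
    if strategy == "CreateAndDelete":
        if remaining_managed_resources:
            # Points 2 and 4: block until every managed resource is gone.
            blockers = ", ".join(sorted(remaining_managed_resources))
            return DeletionPlan(False, f"blocked by managed resources: {blockers}")
        return DeletionPlan(True, "no managed resources remain")
    raise ValueError(f"unknown strategy: {strategy!r}")


plan = plan_deletion("CreateAndDelete", ["functions/my-fn", "serverless/default"])
print(plan.delete_operator, plan.reason)
```

The resource names in the usage line are made up. In practice the list would come from the managed custom resources declared for the module.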

Consequences

  1. Kyma control plane components cannot get information about the module configuration status from the Kyma CR. This problem is going to be resolved with https://github.com/kyma-project/lifecycle-manager/issues/1104
  2. Users should check the module configuration directly instead of checking the Kyma CR (part of the UI story: https://github.com/kyma-project/kyma/issues/18450)
  3. The module descriptor has to be enhanced with information about managed custom resources.
  4. SLA constraints should be well documented. It should be clear which versions are supported (those present in the release channels) and that smooth migration is offered only with KLM. Downgrades are not supported (removing the module and installing the previous version is the recommended approach).

janmedrek commented 11 months ago

Hey @pbochynski, points 1-3 are clear, and we've already agreed that this is the desired behaviour. I need some clarification regarding the module deletion process.

From what I understand, a module that is listed in the Kyma CR (that is, a managed module) will have two deletion modes:

  1. Ignore - this just means that the module was removed from the Kyma CR and KLM does not take any further action. It just stops the reconciliation and all the resources are as they were.
  2. Delete - this means that KLM takes care of the full deletion process in which all the created and managed resources are to be removed from the cluster.

If yes - it would be great if you could expand point 4 with both of these options. It would be better to have it explicitly listed instead of having to deduce it from the diagram. 🙂

As a side note about the current implementation - right now, the strategy for the modules is in the Kyma CR modules list, which means that removing a module from Kyma CR also removes the information on which deletion strategy should be used. I guess we would need to move that somewhere else (replicate it to the status perhaps?).
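
One way to picture the "replicate it to the status" idea is sketched below. It is only an illustration of the suggestion, not a description of how KLM stores this information: the `customResourcePolicy` field name, the `CreateAndDelete` fallback, and the `status.moduleStrategies` map are all assumptions introduced here.

```python
def record_strategies(kyma_cr: dict) -> dict:
    """Copy each listed module's strategy from spec into status (in place)."""
    spec_modules = kyma_cr.get("spec", {}).get("modules", [])
    recorded = kyma_cr.setdefault("status", {}).setdefault("moduleStrategies", {})
    for module in spec_modules:
        # Assumed fallback when the module entry does not name a strategy.
        recorded[module["name"]] = module.get("customResourcePolicy", "CreateAndDelete")
    return kyma_cr


def strategy_for_removed(kyma_cr: dict, name: str):
    """Look up the strategy last recorded for a module no longer in the spec."""
    return kyma_cr.get("status", {}).get("moduleStrategies", {}).get(name)


kyma_cr = {"spec": {"modules": [{"name": "serverless", "customResourcePolicy": "Ignore"}]}}
record_strategies(kyma_cr)

# The user removes the module from the spec; the recorded strategy survives.
kyma_cr["spec"]["modules"] = []
print(strategy_for_removed(kyma_cr, "serverless"))
```

Because the strategy is recorded while the module is still listed, the controller can still tell which deletion mode applies after the entry disappears from the spec.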

What I was also worried about was the scenario in which:

What do you think about such a case? To me, this is a corner-case scenario, but I suppose that at some point it will be a support case that we need to solve.

pbochynski commented 11 months ago

@janmedrek I extended point 4 about deletion to cover both strategies. The downgrade protection scenario you described does not work well right now. I have a cluster that cannot be upgraded after I ran the upgrade, downgrade, upgrade sequence. I think what really matters is the operator manager version, and KLM should compare the actual version (running in the SKR right now) with the one that is supposed to be installed from the release channel. This way it will work in any scenario, with or without KLM.
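
The comparison proposed here could look roughly like the sketch below. It assumes versions are plain dotted numbers with an optional leading "v" and ignores pre-release suffixes; `parse_version` and `decide_action` are hypothetical helpers, and the returned action names are made up for this example.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse 'v1.2.3' or '1.2.3-rc1' into (1, 2, 3); pre-release parts are dropped."""
    core = version[1:] if version.startswith("v") else version
    core = core.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def decide_action(actual: Optional[str], channel_target: str) -> str:
    """Compare the version running in the cluster with the channel's version."""
    if actual is None:
        return "install"
    running, target = parse_version(actual), parse_version(channel_target)
    if running == target:
        return "none"
    if running < target:
        return "upgrade"
    # The running version is newer than the channel's version.
    return "downgrade-not-supported"


for running in (None, "1.1.0", "1.2.0", "1.10.0"):
    print(running, "->", decide_action(running, "1.2.0"))
```

Since the decision depends only on what is actually running, it gives the same answer whether the module was installed manually or by KLM.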

ptesny commented 11 months ago

What about having a reset or panic button to restore a given cluster's modules to the initial set?

pbochynski commented 11 months ago

> What about having a reset or panic button to restore a given cluster's modules to the initial set?

We can mark default modules in a way that people can recognize in the UI. Anyway, if you don't know what you want, you can install all of them :)

pbochynski commented 8 months ago

Accepted.