kyma-project / lifecycle-manager

Controller that manages the lifecycle of Kyma Modules in your cluster.
http://kyma-project.io
Apache License 2.0

Duplicate entries in SKR Kyma module list #1242

Closed. Tomasz-Smelcerz-SAP closed this issue 7 months ago.

Tomasz-Smelcerz-SAP commented 8 months ago

Description

It has been observed that a user was able to configure a duplicated entry in the .spec.modules list of the Kyma CR in an SKR cluster. Lifecycle Manager (LM) did not handle this condition gracefully: the proper error was only visible in the LM logs, not in the Kyma status or in an Event on the Kyma object. The Kyma CRD schema should prevent this from happening, but apparently something went wrong.

Steps to reproduce

Exact steps still to be determined; start by trying to add the same module twice to .spec.modules.
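
For instance, one minimal attempt could look like the sketch below (module name and channel are illustrative, not taken from the affected cluster); per the comments further down, this particular path is expected to be rejected by the v1beta2 schema:

```bash
# Illustrative sketch: apply a KymaCR that lists the same module twice.
# Expectation: the v1beta2 CRD schema rejects this with a
# "Duplicate value" error (see the comments below for details).
kubectl apply -f - <<'EOF'
apiVersion: operator.kyma-project.io/v1beta2
kind: Kyma
metadata:
  name: default
  namespace: kyma-system
spec:
  channel: regular
  modules:
  - name: template-operator
  - name: template-operator
EOF
```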

Environment Type

Managed

Environment Info

Shoot name: https://api.c-5a489c4.stage.kyma.ondemand.com

Kubernetes Version: Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.6", GitCommit:"741c8db18a52787d734cbe4795f0b4ad860906d6", GitTreeState:"clean", BuildDate:"2023-09-13T09:14:09Z", GoVersion:"go1.20.8", Compiler:"gc", Platform:"linux/amd64"}

Attachments

kyma-with-duplicated-module.txt kyma-with-duplicated-module.error.log.txt

c-pius commented 8 months ago

This issue can be reproduced via Busola. In a new cluster, Istio is enabled by default, and when editing the KymaCR via the Edit->YAML functionality, it is possible to add the same istio entry twice, resulting in the following KymaCR:

(Screenshot from 2024-01-22, 16:53, not included here.)

Resulting KymaCR:

```yaml
apiVersion: operator.kyma-project.io/v1beta2
kind: Kyma
metadata:
  annotations:
    operator.kyma-project.io/owned-by: kcp-system/c316bc8d-2110-497e-86b1-8bcb568ee580
  creationTimestamp: "2024-01-22T15:45:52Z"
  generation: 2
  labels:
    operator.kyma-project.io/watched-by: lifecycle-manager
  name: default
  namespace: kyma-system
  resourceVersion: "14036"
  uid: 9ace2b53-1cd2-467a-a84a-c8a746f8b802
spec:
  channel: fast
  modules:
  - customResourcePolicy: CreateAndDelete
    name: api-gateway
  - customResourcePolicy: CreateAndDelete
    name: istio
  - customResourcePolicy: CreateAndDelete
    name: istio
status:
  activeChannel: fast
  conditions:
  - lastTransitionTime: "2024-01-22T15:53:25Z"
    message: all modules are in ready state
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: Modules
  - lastTransitionTime: "2024-01-22T15:53:25Z"
    message: module templates are synchronized
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: ModuleCatalog
  - lastTransitionTime: "2024-01-22T15:53:25Z"
    message: skrwebhook is synchronized
    observedGeneration: 1
    reason: Ready
    status: "True"
    type: SKRWebhook
  lastOperation:
    lastUpdateTime: "2024-01-22T15:53:25Z"
    operation: kyma is ready
  modules:
  - channel: fast
    fqdn: kyma-project.io/module/istio
    manifest:
      apiVersion: operator.kyma-project.io/v1beta2
      kind: Manifest
      metadata:
        generation: 1
        name: c316bc8d-2110-497e-86b1-8bcb568ee580-istio-4034478325
        namespace: kcp-system
    name: istio
    resource:
      apiVersion: operator.kyma-project.io/v1alpha2
      kind: Istio
      metadata:
        name: default
        namespace: kyma-system
    state: Ready
    template:
      apiVersion: operator.kyma-project.io/v1beta2
      kind: ModuleTemplate
      metadata:
        generation: 2
        name: istio-fast
        namespace: kcp-system
    version: 1.2.1
  - channel: fast
    fqdn: kyma-project.io/module/api-gateway
    manifest:
      apiVersion: operator.kyma-project.io/v1beta2
      kind: Manifest
      metadata:
        generation: 1
        name: c316bc8d-2110-497e-86b1-8bcb568ee580-api-gateway-1209152227
        namespace: kcp-system
    name: api-gateway
    resource:
      apiVersion: operator.kyma-project.io/v1alpha1
      kind: APIGateway
      metadata:
        name: default
        namespace: ""
    state: Ready
    template:
      apiVersion: operator.kyma-project.io/v1beta2
      kind: ModuleTemplate
      metadata:
        generation: 3
        name: api-gateway-fast
        namespace: kcp-system
    version: 2.1.0
  state: Ready
```

The same cannot be achieved when editing the KymaCR via kubectl apply, kubectl edit, or kyma alpha enable module (see also the tests on a local k3d cluster below).

The PR that updated the enforcement of unique values is https://github.com/kyma-project/lifecycle-manager/pull/611. This seems to work as expected for kubectl and the Kyma CLI, but Busola's Edit->YAML functionality somehow seems to bypass it.
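
To check where the constraint actually applies, the served CRD schema can be inspected directly. The sketch below assumes the uniqueness rule is expressed as a map-keyed list on spec.modules (which is what produces the FieldValueDuplicate errors quoted further down); the per-version list-type then shows which API versions enforce it:

```bash
# Sketch: print, for each served version of the Kyma CRD, the list-type
# declared for spec.modules. Assumption: the uniqueness constraint from
# PR #611 is modelled as x-kubernetes-list-type: map keyed by "name".
kubectl get crd kymas.operator.kyma-project.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{"\t"}{.schema.openAPIV3Schema.properties.spec.properties.modules.x-kubernetes-list-type}{"\n"}{end}'
# Expected (assumption): only v1beta2 reports "map" here, which would
# explain why requests sent via v1beta1 slip through (see below).
```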

A Kyma installation where this issue is reproduced is available in the cs-test-1242 subaccount under our Stage Global Account.

Tests on local k3d cluster

Given the following KymaCR:

```yaml
apiVersion: operator.kyma-project.io/v1beta2
kind: Kyma
metadata:
  annotations:
    operator.kyma-project.io/owned-by: kcp-system/kyma-sample
  labels:
    operator.kyma-project.io/watched-by: lifecycle-manager
  name: default
  namespace: kyma-system
spec:
  channel: regular
  modules:
  - channel: regular
    customResourcePolicy: CreateAndDelete
    name: template-operator
```

Further Observations

KymaCR with duplicated module under alternative names:

```
Name:         default
Namespace:    kyma-system
Labels:       operator.kyma-project.io/watched-by=lifecycle-manager
Annotations:  operator.kyma-project.io/owned-by: kcp-system/kyma-sample
API Version:  operator.kyma-project.io/v1beta2
Kind:         Kyma
Metadata:
  Creation Timestamp:  2024-01-19T14:56:17Z
  Generation:          10
  Resource Version:    12605
  UID:                 7a30e000-8d9d-451a-afa1-b46e605336bf
Spec:
  Channel:  regular
  Modules:
    Custom Resource Policy:  CreateAndDelete
    Name:                    kyma-project.io/module/template-operator
    Custom Resource Policy:  CreateAndDelete
    Name:                    template-operator
Status:
  Active Channel:  regular
  Conditions:
    Last Transition Time:  2024-01-22T09:16:13Z
    Message:               all modules are in ready state
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  Modules
    Last Transition Time:  2024-01-22T09:16:13Z
    Message:               module templates are synchronized
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  ModuleCatalog
    Last Transition Time:  2024-01-22T09:16:13Z
    Message:               skrwebhook is synchronized
    Observed Generation:   1
    Reason:                Ready
    Status:                True
    Type:                  SKRWebhook
  Last Operation:
    Last Update Time:  2024-01-22T09:16:13Z
    Operation:         kyma is ready
  Modules:
    Channel:  regular
    Fqdn:     kyma-project.io/module/template-operator
    Manifest:
      API Version:  operator.kyma-project.io/v1beta2
      Kind:         Manifest
      Metadata:
        Generation:  1
        Name:        kyma-sample-template-operator-1125230537
        Namespace:   kcp-system
    Name:  kyma-project.io/module/template-operator
    Resource:
      API Version:  operator.kyma-project.io/v1alpha1
      Kind:         Sample
      Metadata:
        Name:       sample-yaml
        Namespace:  kyma-system
    State:  Ready
    Template:
      API Version:  operator.kyma-project.io/v1beta2
      Kind:         ModuleTemplate
      Metadata:
        Generation:  2
        Name:        template-operator-regular
        Namespace:   kcp-system
    Version:  0.1.0
    Channel:  regular
    Fqdn:     kyma-project.io/module/template-operator
    Manifest:
      API Version:  operator.kyma-project.io/v1beta2
      Kind:         Manifest
      Metadata:
        Generation:  1
        Name:        kyma-sample-template-operator-2235966007
        Namespace:   kcp-system
    Name:  template-operator
    Resource:
      API Version:  operator.kyma-project.io/v1alpha1
      Kind:         Sample
      Metadata:
        Name:       sample-yaml
        Namespace:  kyma-system
    State:  Ready
    Template:
      API Version:  operator.kyma-project.io/v1beta2
      Kind:         ModuleTemplate
      Metadata:
        Generation:  2
        Name:        template-operator-regular
        Namespace:   kcp-system
    Version:  0.1.0
  State:  Ready
Events:
```

c-pius commented 8 months ago

From network traffic, it is observable that Busola is sending the following request:

```
PATCH https://<host>/v1beta1/namespaces/kyma-system/kymas/default
Content-Type: application/json-patch+json

[
  {
    "op": "add",
    "path": "/spec/modules/1",
    "value": {
      "customResourcePolicy": "CreateAndDelete",
      "name": "istio"
    }
  }
]
```

Tried to reproduce the same with kubectl -n kyma-system patch kyma/default --type=json --patch-file ./patch-kyma.json. However, validation does kick in, resulting in: The Kyma "default" is invalid: spec.modules[1]: Duplicate value: map[string]interface {}{"name":"template-operator"}

Edit: observed the behavior that if the modules list already contains a duplicate entry, it is possible to patch a third duplicate entry in via kubectl patch. This may be related to https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-ratcheting
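
For reference, a sketch of that check (the patch file name matches the command above, and the duplicate entry mirrors the error message; the behavior on an already-invalid resource is the observation from the Edit note, not a guaranteed outcome):

```bash
# Sketch: JSON patch that inserts another duplicate module entry.
cat > patch-kyma.json <<'EOF'
[
  {
    "op": "add",
    "path": "/spec/modules/1",
    "value": {
      "customResourcePolicy": "CreateAndDelete",
      "name": "template-operator"
    }
  }
]
EOF

# Against a still-valid KymaCR this is rejected with
# "Duplicate value ..." (reason FieldValueDuplicate).
# Against a KymaCR that already contains a duplicate, the same patch was
# observed to go through -- possibly related to CRD validation ratcheting
# (see the link above).
kubectl -n kyma-system patch kyma/default --type=json --patch-file ./patch-kyma.json
```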

c-pius commented 8 months ago

Noticed that the request seen in the network traffic uses v1beta1. The validations have been introduced to the CRD with version v1beta2. Firing requests against that version does fail validation, e.g.:

```
PATCH http://localhost:8001/apis/operator.kyma-project.io/v1beta2/namespaces/kyma-system/kymas/default
Content-Type: application/json-patch+json
Accept: application/json

[
  {
    "op": "add",
    "path": "/spec/modules/1",
    "value": {
      "customResourcePolicy": "CreateAndDelete",
      "name": "template-operator"
    }
  }
]
```

Will lead to:

```json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Kyma.operator.kyma-project.io \"default\" is invalid: spec.modules[1]: Duplicate value: map[string]interface {}{\"name\":\"template-operator\"}",
  "reason": "Invalid",
  "details": {
    "name": "default",
    "group": "operator.kyma-project.io",
    "kind": "Kyma",
    "causes": [
      {
        "reason": "FieldValueDuplicate",
        "message": "Duplicate value: map[string]interface {}{\"name\":\"template-operator\"}",
        "field": "spec.modules[1]"
      }
    ]
  },
  "code": 422
}
```

However, the behavior is the same as described in the Edit above: if the resource is already invalid, the patch goes through.
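
For completeness, a sketch of comparing both API versions directly; the kubectl proxy setup on localhost:8001 and the inline patch body are illustrative, while the paths follow the requests shown above:

```bash
# Sketch: send the identical JSON patch to both served API versions
# through `kubectl proxy`.
kubectl proxy --port=8001 &
sleep 2

PATCH='[{"op":"add","path":"/spec/modules/1","value":{"customResourcePolicy":"CreateAndDelete","name":"template-operator"}}]'

# v1beta2: rejected with HTTP 422 / FieldValueDuplicate (response shown above).
curl -sS -X PATCH \
  -H 'Content-Type: application/json-patch+json' \
  -d "$PATCH" \
  http://localhost:8001/apis/operator.kyma-project.io/v1beta2/namespaces/kyma-system/kymas/default

# v1beta1: expected to accept the duplicate, since that version's schema
# does not carry the uniqueness validation (see the comments above).
curl -sS -X PATCH \
  -H 'Content-Type: application/json-patch+json' \
  -d "$PATCH" \
  http://localhost:8001/apis/operator.kyma-project.io/v1beta1/namespaces/kyma-system/kymas/default
```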

c-pius commented 8 months ago

The following topics are left to be clarified/discussed:

1. Do we need to ask stakeholders to migrate to v1beta2? ✅
2. Do we need to "back-port" the validation to v1beta1? ❌
3. Do we need to consider the "edit on an already invalid resource" case? ❌ Not for now; we keep it in the back of our heads.
4. Do we need to handle the "alternative ways of naming the module" case? ✅ => https://github.com/kyma-project/lifecycle-manager/issues/1276

janmedrek commented 8 months ago

@c-pius I wouldn't back-port any changes to v1beta1, as we want to migrate away from that version anyway.

janmedrek commented 7 months ago

The Busola config will be updated in https://github.com/kyma-project/kyma-dashboard/pull/272. Big kudos to @c-pius for the investigation!

janmedrek commented 7 months ago

I have already reminded the teams to migrate; v1beta2 was introduced almost 9 months ago. We communicated the new version back then, so I suppose this is a reasonable time frame to migrate dependencies.