kyma-project / lifecycle-manager

Controller that manages the lifecycle of Kyma Modules in your cluster.
http://kyma-project.io
Apache License 2.0

Lifecycle-manager cannot patch modified resource correctly #1545

Closed k15r closed 4 months ago

k15r commented 6 months ago

Description

In my example, a deployment managed by the lifecycle-manager was modified by a user directly using client-side apply. The original resource looked similar to:

eventing-manager-1.1.0.yaml ```yaml apiVersion: apps/v1 kind: Deployment metadata: labels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/created-by: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/managed-by: kustomize app.kubernetes.io/name: eventing-manager app.kubernetes.io/part-of: Kyma control-plane: eventing-manager name: eventing-manager namespace: kyma-system spec: replicas: 1 selector: matchLabels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager template: metadata: annotations: kubectl.kubernetes.io/default-container: manager traffic.sidecar.istio.io/excludeInboundPorts: "9443" labels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager spec: containers: - command: - /manager env: - name: NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: EVENTING_CR_NAME value: eventing - name: EVENTING_CR_NAMESPACE value: kyma-system - name: NATS_URL value: eventing-nats.kyma-system.svc.cluster.local - name: PUBLISHER_REQUESTS_CPU value: 10m - name: PUBLISHER_REQUESTS_MEMORY value: 64Mi - name: PUBLISHER_LIMITS_CPU value: 100m - name: PUBLISHER_LIMITS_MEMORY value: 128Mi - name: PUBLISHER_IMAGE value: europe-docker.pkg.dev/kyma-project/prod/eventing-publisher-proxy:1.0.1 - name: PUBLISHER_IMAGE_PULL_POLICY value: IfNotPresent - name: PUBLISHER_REPLICAS value: "1" - name: PUBLISHER_REQUEST_TIMEOUT value: 10s - name: DEFAULT_MAX_IN_FLIGHT_MESSAGES value: "10" - name: DEFAULT_DISPATCHER_RETRY_PERIOD value: 5m - name: DEFAULT_DISPATCHER_MAX_RETRIES value: "10" - name: APP_LOG_FORMAT value: json - name: APP_LOG_LEVEL value: info - name: JS_STREAM_NAME value: sap - name: JS_STREAM_SUBJECT_PREFIX value: kyma - name: JS_STREAM_STORAGE_TYPE value: file - name: JS_STREAM_REPLICAS value: "1" - name: JS_STREAM_DISCARD_POLICY value: new - name: JS_STREAM_RETENTION_POLICY value: interest - name: JS_CONSUMER_DELIVER_POLICY value: new - name: JS_STREAM_MAX_MSGS value: "-1" - name: JS_STREAM_MAX_BYTES value: 700Mi - name: WEBHOOK_SECRET_NAME value: eventing-manager-webhook-server-cert - name: MUTATING_WEBHOOK_NAME value: subscription-mutating-webhook-configuration - name: VALIDATING_WEBHOOK_NAME value: subscription-validating-webhook-configuration - name: EVENTING_WEBHOOK_AUTH_SECRET_NAME value: eventing-webhook-auth - name: EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE value: kyma-system image: europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.1.0 imagePullPolicy: Always livenessProbe: httpGet: path: /healthz port: 8081 initialDelaySeconds: 15 periodSeconds: 20 name: manager readinessProbe: httpGet: path: /readyz port: 8081 initialDelaySeconds: 5 periodSeconds: 10 resources: limits: cpu: 500m memory: 512Mi requests: cpu: 10m memory: 128Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL volumeMounts: - mountPath: /tmp/k8s-webhook-server/serving-certs name: cert readOnly: true priorityClassName: eventing-manager-priority-class securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 seccompProfile: type: RuntimeDefault serviceAccountName: eventing-manager terminationGracePeriodSeconds: 10 volumes: - name: cert secret: defaultMode: 420 secretName: eventing-manager-webhook-server-cert ```

With a new release of the eventing-manager, the deployment should look like this:

eventing-manager-1.2.0.yaml ```yaml apiVersion: apps/v1 kind: Deployment metadata: labels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/created-by: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/managed-by: kustomize app.kubernetes.io/name: eventing-manager app.kubernetes.io/part-of: Kyma control-plane: eventing-manager name: eventing-manager namespace: kyma-system spec: replicas: 1 selector: matchLabels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager template: metadata: annotations: kubectl.kubernetes.io/default-container: manager traffic.sidecar.istio.io/excludeInboundPorts: "9443" labels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager spec: containers: - command: - /manager env: - name: NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace - name: EVENTING_CR_NAME value: eventing - name: EVENTING_CR_NAMESPACE value: kyma-system - name: NATS_URL value: eventing-nats.kyma-system.svc.cluster.local - name: PUBLISHER_REQUESTS_CPU value: 10m - name: PUBLISHER_REQUESTS_MEMORY value: 64Mi - name: PUBLISHER_LIMITS_CPU value: 100m - name: PUBLISHER_LIMITS_MEMORY value: 128Mi - name: PUBLISHER_IMAGE value: europe-docker.pkg.dev/kyma-project/prod/eventing-publisher-proxy:1.0.1 - name: PUBLISHER_IMAGE_PULL_POLICY value: IfNotPresent - name: PUBLISHER_REPLICAS value: "1" - name: PUBLISHER_REQUEST_TIMEOUT value: 10s - name: DEFAULT_MAX_IN_FLIGHT_MESSAGES value: "10" - name: DEFAULT_DISPATCHER_RETRY_PERIOD value: 5m - name: DEFAULT_DISPATCHER_MAX_RETRIES value: "10" - name: APP_LOG_FORMAT value: json - name: APP_LOG_LEVEL value: info - name: JS_STREAM_NAME value: sap - name: JS_STREAM_SUBJECT_PREFIX value: kyma - name: JS_STREAM_STORAGE_TYPE value: file - name: JS_STREAM_REPLICAS value: "1" - name: JS_STREAM_DISCARD_POLICY value: new - name: JS_STREAM_RETENTION_POLICY value: interest - name: JS_CONSUMER_DELIVER_POLICY value: new - name: JS_STREAM_MAX_MSGS value: "-1" - name: JS_STREAM_MAX_BYTES value: 700Mi - name: EVENTING_WEBHOOK_AUTH_SECRET_NAME value: eventing-webhook-auth - name: EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE value: kyma-system image: europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.2.0 imagePullPolicy: Always livenessProbe: httpGet: path: /healthz port: 8081 initialDelaySeconds: 15 periodSeconds: 20 name: manager readinessProbe: httpGet: path: /readyz port: 8081 initialDelaySeconds: 5 periodSeconds: 10 resources: limits: cpu: 500m memory: 512Mi requests: cpu: 10m memory: 128Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL priorityClassName: eventing-manager-priority-class securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 seccompProfile: type: RuntimeDefault serviceAccountName: eventing-manager terminationGracePeriodSeconds: 10 ```

Apart from the image tag, the only difference between the two deployments is that the webhook-related environment variables and the entire cert mount section are removed, as the webhook secret is no longer needed:

1.1.0 -> 1.2.0:

91,96d90
<         - name: WEBHOOK_SECRET_NAME
<           value: eventing-manager-webhook-server-cert
<         - name: MUTATING_WEBHOOK_NAME
<           value: subscription-mutating-webhook-configuration
<         - name: VALIDATING_WEBHOOK_NAME
<           value: subscription-validating-webhook-configuration
101c95
<         image: europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.1.0
---
>         image: europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.2.0
128,131d121
<         volumeMounts:
<         - mountPath: /tmp/k8s-webhook-server/serving-certs
<           name: cert
<           readOnly: true
142,146d131
<       volumes:
<       - name: cert
<         secret:
<           defaultMode: 420
<           secretName: eventing-manager-webhook-server-cert

On one cluster the original 1.1.0 deployment was apparently modified using client-side apply, as can be seen in the managedFields section of the now broken deployment:

Managed field after applying 1.2.0 onto 1.1.0 on a cluster with client side applied 1.1.0 ```yaml managedFields: - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: f:operator.kyma-project.io/managed-by-reconciler-disclaimer: {} f:labels: f:app.kubernetes.io/component: {} f:app.kubernetes.io/created-by: {} f:app.kubernetes.io/instance: {} f:app.kubernetes.io/managed-by: {} f:app.kubernetes.io/name: {} f:app.kubernetes.io/part-of: {} f:control-plane: {} f:operator.kyma-project.io/managed-by: {} f:operator.kyma-project.io/watched-by: {} f:spec: f:replicas: {} f:selector: {} f:template: f:metadata: f:annotations: f:kubectl.kubernetes.io/default-container: {} f:traffic.sidecar.istio.io/excludeInboundPorts: {} f:labels: f:app.kubernetes.io/component: {} f:app.kubernetes.io/instance: {} f:app.kubernetes.io/name: {} f:control-plane: {} f:spec: f:containers: k:{"name":"manager"}: .: {} f:command: {} f:env: k:{"name":"APP_LOG_FORMAT"}: .: {} f:name: {} f:value: {} k:{"name":"APP_LOG_LEVEL"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_DISPATCHER_MAX_RETRIES"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_DISPATCHER_RETRY_PERIOD"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_MAX_IN_FLIGHT_MESSAGES"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_CR_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_CR_NAMESPACE"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE"}: .: {} f:name: {} f:value: {} k:{"name":"JS_CONSUMER_DELIVER_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_DISCARD_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_MAX_BYTES"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_MAX_MSGS"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_REPLICAS"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_RETENTION_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_STORAGE_TYPE"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_SUBJECT_PREFIX"}: .: {} f:name: {} f:value: {} k:{"name":"NAMESPACE"}: .: {} f:name: {} f:valueFrom: f:fieldRef: {} k:{"name":"NATS_URL"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_IMAGE"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_IMAGE_PULL_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_LIMITS_CPU"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_LIMITS_MEMORY"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REPLICAS"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUEST_TIMEOUT"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUESTS_CPU"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUESTS_MEMORY"}: .: {} f:name: {} f:value: {} f:image: {} f:imagePullPolicy: {} f:livenessProbe: f:httpGet: f:path: {} f:port: {} f:initialDelaySeconds: {} f:periodSeconds: {} f:name: {} f:readinessProbe: f:httpGet: f:path: {} f:port: {} f:initialDelaySeconds: {} f:periodSeconds: {} f:resources: f:limits: f:cpu: {} f:memory: {} f:requests: f:cpu: {} f:memory: {} f:securityContext: f:allowPrivilegeEscalation: {} f:capabilities: f:drop: {} f:priorityClassName: {} f:securityContext: f:fsGroup: {} f:runAsGroup: {} f:runAsNonRoot: {} f:runAsUser: {} f:seccompProfile: f:type: {} f:serviceAccountName: {} f:terminationGracePeriodSeconds: {} manager: declarative.kyma-project.io/applier operation: Apply time: "2024-05-13T12:22:45Z" - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: f:metadata: 
f:annotations: .: {} f:kubectl.kubernetes.io/last-applied-configuration: {} f:labels: .: {} f:app.kubernetes.io/created-by: {} f:app.kubernetes.io/instance: {} f:app.kubernetes.io/managed-by: {} f:app.kubernetes.io/name: {} f:app.kubernetes.io/part-of: {} f:control-plane: {} f:spec: f:progressDeadlineSeconds: {} f:replicas: {} f:revisionHistoryLimit: {} f:selector: {} f:strategy: f:rollingUpdate: .: {} f:maxSurge: {} f:maxUnavailable: {} f:type: {} f:template: f:metadata: f:annotations: .: {} f:kubectl.kubernetes.io/default-container: {} f:traffic.sidecar.istio.io/excludeInboundPorts: {} f:labels: .: {} f:app.kubernetes.io/component: {} f:app.kubernetes.io/instance: {} f:app.kubernetes.io/name: {} f:control-plane: {} f:spec: f:containers: k:{"name":"manager"}: .: {} f:command: {} f:env: .: {} k:{"name":"APP_LOG_FORMAT"}: .: {} f:name: {} f:value: {} k:{"name":"APP_LOG_LEVEL"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_DISPATCHER_MAX_RETRIES"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_DISPATCHER_RETRY_PERIOD"}: .: {} f:name: {} f:value: {} k:{"name":"DEFAULT_MAX_IN_FLIGHT_MESSAGES"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_CR_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_CR_NAMESPACE"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE"}: .: {} f:name: {} f:value: {} k:{"name":"JS_CONSUMER_DELIVER_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_DISCARD_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_MAX_BYTES"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_MAX_MSGS"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_REPLICAS"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_RETENTION_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_STORAGE_TYPE"}: .: {} f:name: {} f:value: {} k:{"name":"JS_STREAM_SUBJECT_PREFIX"}: .: {} f:name: {} f:value: {} k:{"name":"MUTATING_WEBHOOK_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"NAMESPACE"}: .: {} f:name: {} f:valueFrom: {} k:{"name":"NATS_URL"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_IMAGE"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_IMAGE_PULL_POLICY"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_LIMITS_CPU"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_LIMITS_MEMORY"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REPLICAS"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUEST_TIMEOUT"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUESTS_CPU"}: .: {} f:name: {} f:value: {} k:{"name":"PUBLISHER_REQUESTS_MEMORY"}: .: {} f:name: {} f:value: {} k:{"name":"VALIDATING_WEBHOOK_NAME"}: .: {} f:name: {} f:value: {} k:{"name":"WEBHOOK_SECRET_NAME"}: .: {} f:name: {} f:value: {} f:imagePullPolicy: {} f:livenessProbe: .: {} f:failureThreshold: {} f:httpGet: .: {} f:path: {} f:port: {} f:scheme: {} f:initialDelaySeconds: {} f:periodSeconds: {} f:successThreshold: {} f:timeoutSeconds: {} f:name: {} f:readinessProbe: .: {} f:failureThreshold: {} f:httpGet: .: {} f:path: {} f:port: {} f:scheme: {} f:initialDelaySeconds: {} f:periodSeconds: {} f:successThreshold: {} f:timeoutSeconds: {} f:resources: .: {} f:limits: .: {} f:cpu: {} f:memory: {} f:requests: .: {} f:cpu: {} f:memory: {} f:securityContext: .: {} f:allowPrivilegeEscalation: {} f:capabilities: .: {} f:drop: {} f:terminationMessagePath: {} f:terminationMessagePolicy: {} f:volumeMounts: .: {} 
k:{"mountPath":"/tmp/k8s-webhook-server/serving-certs"}: .: {} f:mountPath: {} f:name: {} f:readOnly: {} f:dnsPolicy: {} f:priorityClassName: {} f:restartPolicy: {} f:schedulerName: {} f:securityContext: .: {} f:fsGroup: {} f:runAsGroup: {} f:runAsNonRoot: {} f:runAsUser: {} f:seccompProfile: .: {} f:type: {} f:serviceAccount: {} f:serviceAccountName: {} f:terminationGracePeriodSeconds: {} f:volumes: .: {} k:{"name":"cert"}: .: {} f:name: {} f:secret: .: {} f:defaultMode: {} f:secretName: {} manager: kubectl-client-side-apply operation: Update time: "2024-03-22T15:55:54Z" - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: f:spec: f:template: f:metadata: f:annotations: f:istio-operator.kyma-project.io/restartedAt: {} manager: manager operation: Update time: "2024-05-13T09:36:55Z" - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: f:spec: f:template: f:metadata: f:annotations: f:kubectl.kubernetes.io/restartedAt: {} manager: k9s operation: Update time: "2024-05-13T14:35:05Z" - apiVersion: apps/v1 fieldsType: FieldsV1 fieldsV1: f:metadata: f:annotations: f:deployment.kubernetes.io/revision: {} f:status: f:availableReplicas: {} f:conditions: .: {} k:{"type":"Available"}: .: {} f:lastTransitionTime: {} f:lastUpdateTime: {} f:message: {} f:reason: {} f:status: {} f:type: {} k:{"type":"Progressing"}: .: {} f:lastTransitionTime: {} f:lastUpdateTime: {} f:message: {} f:reason: {} f:status: {} f:type: {} f:observedGeneration: {} f:readyReplicas: {} f:replicas: {} f:unavailableReplicas: {} f:updatedReplicas: {} manager: kube-controller-manager operation: Update subresource: status time: "2024-05-13T14:45:06Z" ```

This managedFields section results in a deployment that looks like this:

Broken 1.2.0 deployment ```yaml apiVersion: apps/v1 kind: Deployment metadata: annotations: deployment.kubernetes.io/revision: "7" kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"eventing-manager","app.kubernetes.io/created-by":"eventing-manager","app.kubernetes.io/instance":"eventing-manager","app.kubernetes.io/managed-by":"kustomize","app.kubernetes.io/name":"eventing-manager","app.kubernetes.io/part-of":"Kyma","control-plane":"eventing-manager"},"name":"eventing-manager","namespace":"kyma-system"},"spec":{"replicas":1,"selector":{"matchLabels":{"app.kubernetes.io/component":"eventing-manager","app.kubernetes.io/instance":"eventing-manager","app.kubernetes.io/name":"eventing-manager","control-plane":"eventing-manager"}},"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/default-container":"manager","traffic.sidecar.istio.io/excludeInboundPorts":"9443"},"labels":{"app.kubernetes.io/component":"eventing-manager","app.kubernetes.io/instance":"eventing-manager","app.kubernetes.io/name":"eventing-manager","control-plane":"eventing-manager"}},"spec":{"containers":[{"command":["/manager"],"env":[{"name":"NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"EVENTING_CR_NAME","value":"eventing"},{"name":"EVENTING_CR_NAMESPACE","value":"kyma-system"},{"name":"NATS_URL","value":"eventing-nats.kyma-system.svc.cluster.local"},{"name":"PUBLISHER_REQUESTS_CPU","value":"10m"},{"name":"PUBLISHER_REQUESTS_MEMORY","value":"64Mi"},{"name":"PUBLISHER_LIMITS_CPU","value":"100m"},{"name":"PUBLISHER_LIMITS_MEMORY","value":"128Mi"},{"name":"PUBLISHER_IMAGE","value":"europe-docker.pkg.dev/kyma-project/prod/eventing-publisher-proxy:1.0.1"},{"name":"PUBLISHER_IMAGE_PULL_POLICY","value":"IfNotPresent"},{"name":"PUBLISHER_REPLICAS","value":"1"},{"name":"PUBLISHER_REQUEST_TIMEOUT","value":"10s"},{"name":"DEFAULT_MAX_IN_FLIGHT_MESSAGES","value":"10"},{"name":"DEFAULT_DISPATCHER_RETRY_PERIOD","value":"5m"},{"name":"DEFAULT_DISPATCHER_MAX_RETRIES","value":"10"},{"name":"APP_LOG_FORMAT","value":"json"},{"name":"APP_LOG_LEVEL","value":"info"},{"name":"JS_STREAM_NAME","value":"sap"},{"name":"JS_STREAM_SUBJECT_PREFIX","value":"kyma"},{"name":"JS_STREAM_STORAGE_TYPE","value":"file"},{"name":"JS_STREAM_REPLICAS","value":"1"},{"name":"JS_STREAM_DISCARD_POLICY","value":"new"},{"name":"JS_STREAM_RETENTION_POLICY","value":"interest"},{"name":"JS_CONSUMER_DELIVER_POLICY","value":"new"},{"name":"JS_STREAM_MAX_MSGS","value":"-1"},{"name":"JS_STREAM_MAX_BYTES","value":"700Mi"},{"name":"WEBHOOK_SECRET_NAME","value":"eventing-manager-webhook-server-cert"},{"name":"MUTATING_WEBHOOK_NAME","value":"subscription-mutating-webhook-configuration"},{"name":"VALIDATING_WEBHOOK_NAME","value":"subscription-validating-webhook-configuration"},{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAME","value":"eventing-webhook-auth"},{"name":"EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE","value":"kyma-system"}],"image":"europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.1.0","imagePullPolicy":"Always","livenessProbe":{"httpGet":{"path":"/healthz","port":8081},"initialDelaySeconds":15,"periodSeconds":20},"name":"manager","readinessProbe":{"httpGet":{"path":"/readyz","port":8081},"initialDelaySeconds":5,"periodSeconds":10},"resources":{"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"10m","memory":"128Mi"}},"securityContext":{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]}},"volumeMou
nts":[{"mountPath":"/tmp/k8s-webhook-server/serving-certs","name":"cert","readOnly":true}]}],"priorityClassName":"eventing-manager-priority-class","securityContext":{"fsGroup":10001,"runAsGroup":10001,"runAsNonRoot":true,"runAsUser":10001,"seccompProfile":{"type":"RuntimeDefault"}},"serviceAccountName":"eventing-manager","terminationGracePeriodSeconds":10,"volumes":[{"name":"cert","secret":{"defaultMode":420,"secretName":"eventing-manager-webhook-server-cert"}}]}}}} operator.kyma-project.io/managed-by-reconciler-disclaimer: |- DO NOT EDIT - This resource is managed by Kyma. Any modifications are discarded and the resource is reverted to the original state. creationTimestamp: "2024-03-22T15:55:54Z" generation: 8 labels: app.kubernetes.io/component: b04d102a-1244-4a43-b8e5-45315d68007c-eventing-3765891325 app.kubernetes.io/created-by: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/managed-by: kustomize app.kubernetes.io/name: eventing-manager app.kubernetes.io/part-of: Kyma control-plane: eventing-manager operator.kyma-project.io/managed-by: declarative-v2 operator.kyma-project.io/watched-by: module-manager name: eventing-manager namespace: kyma-system resourceVersion: "67096177" uid: 02f65b00-30f7-4c64-9d1a-dd1281a66450 spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: annotations: istio-operator.kyma-project.io/restartedAt: "2024-05-13T09:36:55Z" kubectl.kubernetes.io/default-container: manager kubectl.kubernetes.io/restartedAt: "2024-05-13T16:35:05+02:00" traffic.sidecar.istio.io/excludeInboundPorts: "9443" creationTimestamp: null labels: app.kubernetes.io/component: eventing-manager app.kubernetes.io/instance: eventing-manager app.kubernetes.io/name: eventing-manager control-plane: eventing-manager spec: containers: - command: - /manager env: - name: NAMESPACE valueFrom: fieldRef: apiVersion: v1 fieldPath: metadata.namespace - name: EVENTING_CR_NAME value: eventing - name: EVENTING_CR_NAMESPACE value: kyma-system - name: NATS_URL value: eventing-nats.kyma-system.svc.cluster.local - name: PUBLISHER_REQUESTS_CPU value: 10m - name: PUBLISHER_REQUESTS_MEMORY value: 64Mi - name: PUBLISHER_LIMITS_CPU value: 100m - name: PUBLISHER_LIMITS_MEMORY value: 128Mi - name: PUBLISHER_IMAGE value: europe-docker.pkg.dev/kyma-project/prod/eventing-publisher-proxy:1.0.1 - name: PUBLISHER_IMAGE_PULL_POLICY value: IfNotPresent - name: PUBLISHER_REPLICAS value: "1" - name: PUBLISHER_REQUEST_TIMEOUT value: 10s - name: DEFAULT_MAX_IN_FLIGHT_MESSAGES value: "10" - name: DEFAULT_DISPATCHER_RETRY_PERIOD value: 5m - name: DEFAULT_DISPATCHER_MAX_RETRIES value: "10" - name: APP_LOG_FORMAT value: json - name: APP_LOG_LEVEL value: info - name: JS_STREAM_NAME value: sap - name: JS_STREAM_SUBJECT_PREFIX value: kyma - name: JS_STREAM_STORAGE_TYPE value: file - name: JS_STREAM_REPLICAS value: "1" - name: JS_STREAM_DISCARD_POLICY value: new - name: JS_STREAM_RETENTION_POLICY value: interest - name: JS_CONSUMER_DELIVER_POLICY value: new - name: JS_STREAM_MAX_MSGS value: "-1" - name: JS_STREAM_MAX_BYTES value: 700Mi - name: WEBHOOK_SECRET_NAME value: eventing-manager-webhook-server-cert - name: MUTATING_WEBHOOK_NAME value: subscription-mutating-webhook-configuration - name: 
VALIDATING_WEBHOOK_NAME value: subscription-validating-webhook-configuration - name: EVENTING_WEBHOOK_AUTH_SECRET_NAME value: eventing-webhook-auth - name: EVENTING_WEBHOOK_AUTH_SECRET_NAMESPACE value: kyma-system image: europe-docker.pkg.dev/kyma-project/prod/eventing-manager:1.2.0 imagePullPolicy: Always livenessProbe: failureThreshold: 3 httpGet: path: /healthz port: 8081 scheme: HTTP initialDelaySeconds: 15 periodSeconds: 20 successThreshold: 1 timeoutSeconds: 1 name: manager readinessProbe: failureThreshold: 3 httpGet: path: /readyz port: 8081 scheme: HTTP initialDelaySeconds: 5 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 1 resources: limits: cpu: 500m memory: 512Mi requests: cpu: 10m memory: 128Mi securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /tmp/k8s-webhook-server/serving-certs name: cert readOnly: true dnsPolicy: ClusterFirst priorityClassName: eventing-manager-priority-class restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 10001 runAsGroup: 10001 runAsNonRoot: true runAsUser: 10001 seccompProfile: type: RuntimeDefault serviceAccount: eventing-manager serviceAccountName: eventing-manager terminationGracePeriodSeconds: 10 volumes: - name: cert secret: defaultMode: 420 secretName: eventing-manager-webhook-server-cert status: availableReplicas: 1 conditions: - lastTransitionTime: "2024-05-13T11:10:43Z" lastUpdateTime: "2024-05-13T11:10:43Z" message: Deployment has minimum availability. reason: MinimumReplicasAvailable status: "True" type: Available - lastTransitionTime: "2024-05-13T14:45:06Z" lastUpdateTime: "2024-05-13T14:45:06Z" message: ReplicaSet "eventing-manager-6b47bd4df7" has timed out progressing. reason: ProgressDeadlineExceeded status: "False" type: Progressing observedGeneration: 8 readyReplicas: 1 replicas: 2 unavailableReplicas: 1 updatedReplicas: 1 ```

The important differences here are (desired 1.2.0 -> actual):

90a107,112
>         - name: WEBHOOK_SECRET_NAME
>           value: eventing-manager-webhook-server-cert
>         - name: MUTATING_WEBHOOK_NAME
>           value: subscription-mutating-webhook-configuration
>         - name: VALIDATING_WEBHOOK_NAME
>           value: subscription-validating-webhook-configuration
97a120
>           failureThreshold: 3
100a124
>             scheme: HTTP
102a127,128
>           successThreshold: 1
>           timeoutSeconds: 1
104a131
>           failureThreshold: 3
107a135
>             scheme: HTTP
109a138,139
>           successThreshold: 1
>           timeoutSeconds: 1
121a152,158
>         terminationMessagePath: /dev/termination-log
>         terminationMessagePolicy: File
>         volumeMounts:
>         - mountPath: /tmp/k8s-webhook-server/serving-certs
>           name: cert
>           readOnly: true
>       dnsPolicy: ClusterFirst
122a160,161
>       restartPolicy: Always
>       schedulerName: default-scheduler
129a169
>       serviceAccount: eventing-manager
131a172,176
>       volumes:
>       - name: cert
>         secret:
>           defaultMode: 420
>           secretName: eventing-manager-webhook-server-cert

The resulting deployment has the updated image (1.2.0), but it also keeps the webhook-cert mount, which should have been removed. As a result, this deployment cannot start anymore, as the webhook cert secret was already removed from the cluster.

Steps to reproduce

See above

Environment Type

Managed

Environment Info

Current SKR

Attachments

No response

janmedrek commented 6 months ago

We would need to do some research and check the details in a local environment.

c-pius commented 5 months ago

This is due to the patch semantics of server-side apply. We need to investigate how to get the additional fields deleted via a server-side apply.
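For context, server-side apply only removes a field when no other field manager owns it, and forcing ownership only affects fields that are present in the applied manifest. A minimal controller-runtime sketch of such an apply call (illustrative helper, not the actual KLM applier, which is quoted further below; the field owner string is the one visible in the managedFields dump above):

package main

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// applyDeployment server-side applies the desired state. Fields that are
// missing from `desired` but owned by another manager (e.g. the old
// volumeMounts) are NOT removed, even with ForceOwnership, because force
// only resolves conflicts on fields this manifest actually sets.
func applyDeployment(ctx context.Context, c client.Client, desired *appsv1.Deployment) error {
    return c.Patch(ctx, desired, client.Apply,
        client.ForceOwnership,
        client.FieldOwner("declarative.kyma-project.io/applier"),
    )
}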

To reproduce locally:

Prep a config map

# my-cfg.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-cfg
data:
  foo: bar

kubectl apply -f my-cfg.yaml

Prep a deployment using --server-side

# my-dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-dep
  name: my-dep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-dep
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: my-dep
    spec:
      containers:
      - image: nginx
        name: nginx
        resources: {}

kubectl apply -f my-dep.yaml --server-side

Observe that kubectl is the manager of all fields: kubectl get deploy/my-dep -o yaml --show-managed-fields

Edit the deployment using kubectl edit

Note: the same did not work out with server-side or client-side kubectl apply.

Edit the deployment kubectl edit deploy my-dep and add the following to the spec/container:

        # container
        volumeMounts:
          - name: config
            mountPath: /foo/config
            readOnly: true
      # spec
      volumes:
        - name: config
          configMap:
            name: my-cfg

Observe that kubectl remains the manager of the initial fields, while kubectl-edit is the manager of the new volumeMount and volumes fields.

Re-apply the initial config

kubectl apply -f my-dep.yaml --server-side

Observe that the volumeMount and volumes fields still exist

Re-applying with volumes and then initial config

Adding volumeMount and volume to the config and re-applying kubectl apply -f my-dep.yaml --server-side, then removing both from the config and re-applying kubectl apply -f my-dep.yaml --server-side:

This doesn't remove volumeMount and volume... It is also documented accordingly (see):

If you remove a field from a manifest and apply that manifest, Server-Side Apply checks if there are any other field managers that also own the field. If the field is not owned by any other field managers, it is either deleted from the live object or reset to its default value, if it has one. The same rule applies to associative list or map items.

=> check how to remove the other ownership
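One way to check who the other owners are is to inspect the managedFields metadata directly. A small sketch with the controller-runtime client (helper name and usage are illustrative):

package main

import (
    "context"
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// printFieldManagers lists every field manager of a Deployment, e.g.
// "declarative.kyma-project.io/applier (Apply)" vs.
// "kubectl-client-side-apply (Update)".
func printFieldManagers(ctx context.Context, c client.Client, key client.ObjectKey) error {
    var dep appsv1.Deployment
    if err := c.Get(ctx, key, &dep); err != nil {
        return err
    }
    for _, entry := range dep.GetManagedFields() {
        fmt.Printf("%s (%s)\n", entry.Manager, entry.Operation)
    }
    return nil
}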

c-pius commented 5 months ago

=> check how to remove the other ownership

See:

It is possible to strip all managedFields from an object by overwriting them using a patch (JSON Merge Patch, Strategic Merge Patch, JSON Patch), or through an update (HTTP PUT); in other words, through every write operation other than apply. This can be done by overwriting the managedFields field with an empty entry. Two examples are:

Clearing managed fields and re-applying:

kubectl patch deployment my-dep -p '{ "metadata": { "managedFields": [ {} ] } }' => cleared managed fields ✅

kubectl apply -f my-dep.yaml --server-side => removed volumeMount and volume

c-pius commented 5 months ago

Clearing managed fields and re-applying:

It seems like the same has been attempted in KLM already, see L120 (obj.SetManagedFields(nil)):

func (c *ConcurrentDefaultSSA) serverSideApplyResourceInfo(
    ctx context.Context,
    info *resource.Info,
) error {
    obj, isTyped := info.Object.(client.Object)
    if !isTyped {
        return fmt.Errorf(
            "%s is not a valid client-go object: %w", info.ObjectName(), ErrClientObjectConversionFailed,
        )
    }
    obj.SetManagedFields(nil)
    err := c.clnt.Patch(ctx, obj, client.Apply, client.ForceOwnership, c.owner)
    if err != nil {
        return fmt.Errorf(
            "patch for %s failed: %w", info.ObjectName(), c.suppressUnauthorized(err),
        )
    }

    return nil
}

However, I assume this is not working properly, as the above documentation clearly states that it must be an operation other than apply. Trying this with kubectl also yields: Error from server (BadRequest): metadata.managedFields must be nil

It seems like the same has been attempted in KLM already

EDIT: This may be a wrong assumption. Maybe managedFields was just cleared because it cannot be set with an SSA.

EDIT EDIT: Tried removing obj.SetManagedFields(nil), which does not make the request fail. So it may indeed be the case that this was set to try clearing the managed fields. If so, however, it was implemented incorrectly.
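For illustration, a sketch of the two-request variant discussed above (an assumption, not the current KLM implementation): clear managedFields with a non-apply patch first, then server-side apply the desired manifest, mirroring the kubectl sequence that removed the stale volumeMount/volumes. This is a sketch only; a later comment shows that a before-first-apply manager may still claim the fields afterwards.

package main

import (
    "context"

    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// clearManagedFieldsThenApply first strips all field managers with a JSON
// merge patch (a non-apply write, which is allowed to overwrite
// managedFields) and then server-side applies the desired state. With no
// competing managers left, fields missing from `desired` get removed.
func clearManagedFieldsThenApply(ctx context.Context, c client.Client, desired client.Object) error {
    clearPatch := client.RawPatch(types.MergePatchType, []byte(`{"metadata":{"managedFields":[{}]}}`))
    // Patch a copy so that the pristine desired state is what gets applied below.
    if err := c.Patch(ctx, desired.DeepCopyObject().(client.Object), clearPatch); err != nil {
        return err
    }
    return c.Patch(ctx, desired, client.Apply,
        client.ForceOwnership,
        client.FieldOwner("declarative.kyma-project.io/applier"),
    )
}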

c-pius commented 4 months ago

Reproducing the issue with KLM

  1. Setup a test cluster
  2. Enable template operator
  3. Edit the resources part of deploy/template-operator-controller-manager, e.g. to:
    limits:
      cpu: 500m
      ephemeral-storage: 1Gi # added
      memory: 256Mi # changed from 128Mi

We can then observe that KLM resets memory to 128Mi, as in the ModuleTemplate manifests ✅. ephemeral-storage remains set ❌.

Trying the fix of clearing managed fields

Given the situation above, tried to clear the managed fields by patching them as described above.

However, the ephemeral-storage field still persists ❌ .

Looking at the managed fields, we can observe that they have indeed been cleared. However, a manager named before-first-apply with operation: Update claimed all fields afterwards, which presumably prevents the removal of ephemeral-storage.

Couldn't find good documentation on before-first-apply, but GPT returned the following which seems reasonable:

The before-first-apply manager is a part of Kubernetes' Server-Side Apply (SSA) feature. It is automatically added by the Kubernetes API server when a resource is applied for the first time. The before-first-apply manager tracks the fields in the object that were set at the time of the first apply, allowing Kubernetes to distinguish between fields set by the user and fields set to their default values by the system.

In your case, when you patch the managedFields to clear them, the Kubernetes API server sees that as a new apply operation and sets the before-first-apply manager. This is why you see the before-first-apply manager claiming all fields of your deployment after running the patch command.

If you want to clear the managedFields without triggering the before-first-apply manager, you would need to use a different method that doesn't involve applying or patching the resource, such as replacing the resource entirely. However, this would also delete and recreate the resource, which might not be desirable in all cases.

c-pius commented 4 months ago

We haven't found a programmatic mitigation so far, and this ticket is time-boxed. Therefore, we need to propose the following:

The two scenarios we are aware of that may lead to this situation are:

a) the resource was previously managed with the old (client-side apply based) reconciler, so another field manager still owns its fields, and
b) a user modified the resource directly.

b) is considered a user error. If a user introduces a problem to a resource that we can't roll back automatically, the user needs to fix it.

a) is a problem we need to address. Since there is no automatic fix available on the KLM side yet, we would need to rely on the module teams (which have started with the old reconciler) to address this issue upfront, before releasing a new version that may run into it. However, addressing this issue upfront also seems to be tricky. Testing with KLM, it was for instance possible to force ownership by KLM by overwriting the field with a new value with the "force ownership" flag set, and afterwards applying the object again without the value. The same was not possible when using the same initial value, as the original owner persists.

An approach that works without modifying the current value: make KLM SSA the same object (same initial value of the field), remove the old manager by patching with kubectl patch deployment template-operator-controller-manager -n template-operator-system --type='json' -p='[{"op": "remove", "path": "/metadata/managedFields/1"}]' (index 1 was the other manager), and then remove the field from the manifest that KLM SSAs. This is also a non-ideal solution, as it requires manual fiddling with the resource in the SKR and operates index-based (JSON Patch does not seem to support array operations based on field values).
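A programmatic variant of the same idea (a hypothetical helper, not existing KLM code): look up the index of the unwanted manager entry by name first, then issue the index-based JSON Patch remove. The manager name to pass in (e.g. kubectl-client-side-apply or before-first-apply) depends on the cluster state, and the index may shift between the Get and the Patch, so this remains a best-effort workaround:

package main

import (
    "context"
    "fmt"

    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// removeFieldManager removes the first managedFields entry owned by the given
// manager via an index-based JSON Patch, since JSON Patch cannot address
// array items by field value.
func removeFieldManager(ctx context.Context, c client.Client, obj client.Object, manager string) error {
    if err := c.Get(ctx, client.ObjectKeyFromObject(obj), obj); err != nil {
        return err
    }
    for i, entry := range obj.GetManagedFields() {
        if entry.Manager == manager {
            patch := []byte(fmt.Sprintf(`[{"op": "remove", "path": "/metadata/managedFields/%d"}]`, i))
            return c.Patch(ctx, obj, client.RawPatch(types.JSONPatchType, patch))
        }
    }
    return nil // manager not present, nothing to do
}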

c-pius commented 4 months ago

Summary: