kyma-project / lifecycle-manager

Controller that manages the lifecycle of Kyma Modules in your cluster.
http://kyma-project.io
Apache License 2.0
10 stars 30 forks source link

Investigate start up delay of KLM #1836

Open c-pius opened 2 months ago

c-pius commented 2 months ago

Description

When starting KLM, we can see a delay between the Pod running, and KLM actually picking up work (starting the reconciliation). We need to investigate the reason for this delay and if we can prevented.

To Reproduce (observable on local deployment as well as in KCPs):

Get a new Pod of KLM created. Observe that directly after start up, we can see log entries like the ones below (starting servers, trying to acquire leader lease, checking stored versions, ...). However, it then sits there for about 3 minutes not doing anything. After about 2-3 mins, we then see reconcilation logs and things happening on the clusters.

One hint to look at is the acquisition of the leader lease. We can see a log that it is attempting to acquire the lease. The confirmation that it got the lease only comes in about 2-3 mins later.

kubectl logs deploy/klm-controller-manager -n kcp-system
{"level":"INFO","date":"2024-09-06T07:01:01.476464588Z","logger":"setup","caller":"cmd/main.go:104","msg":"starting Lifecycle-Manager version: from_dockerfile","context":{}}
{"level":"INFO","date":"2024-09-06T07:01:01.675367088Z","logger":"controller-runtime.builder","caller":"builder/webhook.go:186","msg":"skip registering a mutating webhook, object does not implement admission.Defaulter or WithDefaulter wasn't called","context":{"GVK":"operator.kyma-project.io/v1beta2, Kind=ModuleTemplate"}}
{"level":"INFO","date":"2024-09-06T07:01:01.675943505Z","logger":"controller-runtime.builder","caller":"builder/webhook.go:202","msg":"Registering a validating webhook","context":{"GVK":"operator.kyma-project.io/v1beta2, Kind=ModuleTemplate","path":"/validate-operator-kyma-project-io-v1beta2-moduletemplate"}}
{"level":"INFO","date":"2024-09-06T07:01:01.677310963Z","logger":"controller-runtime.webhook","caller":"webhook/server.go:183","msg":"Registering webhook","context":{"path":"/validate-operator-kyma-project-io-v1beta2-moduletemplate"}}
{"level":"INFO","date":"2024-09-06T07:01:01.67917138Z","logger":"controller-runtime.metrics","caller":"server/server.go:208","msg":"Starting metrics server","context":{}}
{"level":"INFO","date":"2024-09-06T07:01:01.679687047Z","logger":"controller-runtime.metrics","caller":"server/server.go:247","msg":"Serving metrics server","context":{"bindAddress":":8080","secure":false}}
{"level":"INFO","date":"2024-09-06T07:01:01.68032863Z","caller":"manager/server.go:83","msg":"starting server","context":{"name":"health probe","addr":"[::]:8081"}}
{"level":"INFO","date":"2024-09-06T07:01:01.681180005Z","logger":"controller-runtime.webhook","caller":"webhook/server.go:191","msg":"Starting webhook server","context":{}}
{"level":"INFO","date":"2024-09-06T07:01:01.684769838Z","logger":"controller-runtime.certwatcher","caller":"certwatcher/certwatcher.go:161","msg":"Updated current TLS certificate","context":{}}
{"level":"INFO","date":"2024-09-06T07:01:01.685888172Z","logger":"controller-runtime.webhook","caller":"webhook/server.go:242","msg":"Serving webhook server","context":{"host":"","port":9443}}
I0906 07:01:01.687028       1 leaderelection.go:254] attempting to acquire leader lease kcp-system/893110f7.kyma-project.io...
{"level":"LEVEL(-2)","date":"2024-09-06T07:01:01.689037588Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:21","msg":"Handling dropping stored versions for, map[Kyma:v1beta1 Manifest:v1beta1 ModuleTemplate:v1beta1 Watcher:v1beta1]","context":{}}
{"level":"INFO","date":"2024-09-06T07:01:01.692686922Z","logger":"controller-runtime.certwatcher","caller":"certwatcher/certwatcher.go:115","msg":"Starting certificate watcher","context":{}}
{"level":"LEVEL(-2)","date":"2024-09-06T07:01:01.693148797Z","logger":"setup","caller":"cmd/main.go:255","msg":"scheduled job for cleaning up metrics","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.169369714Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:33","msg":"Checking the storedVersions for Manifest crd","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.169440755Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:42","msg":"The new storedVersions are [v1beta2]","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.18142938Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:33","msg":"Checking the storedVersions for ModuleTemplate crd","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.18145963Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:42","msg":"The new storedVersions are [v1beta2]","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.186495547Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:33","msg":"Checking the storedVersions for Watcher crd","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.186523839Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:42","msg":"The new storedVersions are [v1beta2]","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.194104589Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:33","msg":"Checking the storedVersions for Kyma crd","context":{}}
{"level":"DEBUG","date":"2024-09-06T07:01:02.194134672Z","logger":"storage-version-migration","caller":"crd/storage_version_dropper.go:42","msg":"The new storedVersions are [v1beta2]","context":{}}

2-3 mins pause

I0906 07:04:05.144952       1 leaderelection.go:268] successfully acquired lease kcp-system/893110f7.kyma-project.io
{"level":"INFO","date":"2024-09-06T07:04:05.149399673Z","caller":"event/skr_events_listener.go:66","msg":"Listener is starting up...","context":{"Module":"Listener","Addr":":8082","ApiPath":"/v1/lifecycle-manager/event"}}
{"level":"INFO","date":"2024-09-06T07:04:05.149960173Z","caller":"event/skr_events_listener.go:66","msg":"Listener is starting up...","context":{"Module":"Listener","Addr":":8083","ApiPath":"/v1/module-manager/event"}}
{"level":"DEBUG","date":"2024-09-06T07:04:05.146138048Z","logger":"events","caller":"recorder/recorder.go:104","msg":"klm-controller-manager-6dd847c9cb-vwbvz_4e420c6d-2632-4927-8440-7b9a515f76e5 became leader","context":{"type":"Normal","object":{"kind":"Lease","namespace":"kcp-system","name":"893110f7.kyma-project.io","uid":"5de744b0-be1d-44d8-9084-d850f45d3379","apiVersion":"coordination.k8s.io/v1","resourceVersion":"17187"},"reason":"LeaderElection"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155427173Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"watcher","controllerGroup":"operator.kyma-project.io","controllerKind":"Watcher","source":"kind source: *v1beta2.Watcher"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155514173Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"mandatory-module-installation","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1beta2.Kyma"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155575965Z","caller":"controller/controller.go:183","msg":"Starting Controller","context":{"controller":"watcher","controllerGroup":"operator.kyma-project.io","controllerKind":"Watcher"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155598715Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"mandatory-module-installation","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1beta2.ModuleTemplate"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155634132Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"mandatory-module-installation","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1.Secret"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155653507Z","caller":"controller/controller.go:183","msg":"Starting Controller","context":{"controller":"mandatory-module-installation","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155961798Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"manifest","controllerGroup":"operator.kyma-project.io","controllerKind":"Manifest","source":"kind source: *v1beta2.Manifest"}}
{"level":"INFO","date":"2024-09-06T07:04:05.155998882Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"manifest","controllerGroup":"operator.kyma-project.io","controllerKind":"Manifest","source":"kind source: *v1.Secret"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157293882Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"manifest","controllerGroup":"operator.kyma-project.io","controllerKind":"Manifest","source":"channel source: 0xc001390620"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157430257Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"kyma","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1beta2.Kyma"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157487882Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"kyma","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1beta2.ModuleTemplate"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157508507Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"kyma","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1.Secret"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157526673Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"kyma","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"kind source: *v1beta2.Manifest"}}
{"level":"INFO","date":"2024-09-06T07:04:05.157535007Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"kyma","controllerGroup":"operator.kyma-project.io","controllerKind":"Kyma","source":"channel source: 0xc001390540"}}
{"level":"INFO","date":"2024-09-06T07:04:05.159199965Z","caller":"controller/controller.go:175","msg":"Starting EventSource","context":{"controller":"mandatory-module-deletion","controllerGroup":"operator.kyma-project.io","controllerKind":"ModuleTemplate","source":"kind source: *v1beta2.ModuleTemplate"}}
...

Investigation task timboxed for: 2d

Reasons

Speed up local testing, avoid disruptions in production (when rolling out changes like with the certs).

Acceptance Criteria

Feature Testing

No response

Testing approach

No response

Attachments

No response

c-pius commented 2 months ago

Also noticed that this delay only seems to happen when there is an existing KLM pod which is replaced by a new one. In a fresh cluster with fresh KLM deployment, the delay does not seem to happen.