k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Helm extension charts are not installed if controller exits during first attempt #4637

Open emosbaugh opened 2 weeks ago

emosbaugh commented 2 weeks ago


Platform

No response

Version

v1.29.5+k0s.0

Sysinfo

Output of `k0s sysinfo`:
Machine ID: "8e7e0fcbc82ee318fa1847250e5f843b994314ff9cca59cbb568b251e9fe2e4e" (from machine) (pass)
Total memory: 15.6 GiB (pass)
Disk space available for /var/lib/k0s: 21.6 GiB (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 6.5.0-1021-azure (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: exec: "modprobe": executable file not found in $PATH (warning)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: no kernel config found (warning)
  CONFIG_NAMESPACES: Namespaces support: no kernel config found (warning)
  CONFIG_NET: Networking support: no kernel config found (warning)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: no kernel config found (warning)
  CONFIG_PROC_FS: /proc file system support: no kernel config found (warning)

What happened?

The k0s controller crashed during the initial installation. When systemd restarted it, the controller did not install the Helm charts.

Unfortunately, since this error occurred in CI, I only have the artifacts our automation collected.

Steps to reproduce

No response

Expected behavior

When the controller restarts, the Helm chart installation should resume.

Actual behavior

Helm chart extensions were never installed.

Screenshots and logs

k0scontroller-logs.txt

charts.yaml.txt

Below are the troubleshoot.sh (https://troubleshoot.sh/) support bundles:

TestMultiNodeAirgapUpgradeSameK0s-support-bundle-host.tar.gz.zip

TestMultiNodeAirgapUpgradeSameK0s-support-bundle-cluster.tar.gz.zip

Additional context

Relevant stack trace:

Jun 14 14:52:11 node-16d20-00 k0s[944]: I0614 14:52:11.627739     944 leaderelection.go:250] attempting to acquire leader lease k0s-autopilot/k0s-autopilot-controller...
Jun 14 14:52:11 node-16d20-00 k0s[944]: time="2024-06-14 14:52:11" level=info msg="Got lease event = pending, reconfiguring controllers" component=autopilot
Jun 14 14:52:11 node-16d20-00 k0s[944]: time="2024-06-14 14:52:11" level=info msg="Stopping subcontrollers" component=autopilot leasemode=pending
Jun 14 14:52:11 node-16d20-00 k0s[944]: time="2024-06-14 14:52:11" level=info msg="Starting subcontrollers" component=autopilot leadermode=false
Jun 14 14:52:11 node-16d20-00 k0s[944]: time="2024-06-14 14:52:11" level=info msg="Starting controller-runtime subhandlers" component=autopilot leadermode=false
Jun 14 14:52:11 node-16d20-00 k0s[944]: fatal error: concurrent map read and map write
Jun 14 14:52:11 node-16d20-00 k0s[944]: goroutine 3684 [running]:
Jun 14 14:52:11 node-16d20-00 k0s[944]: k8s.io/apimachinery/pkg/runtime.(*Scheme).ObjectKinds(0xc0003b41c0, {0x3ffef98?, 0xc001a29860})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         k8s.io/apimachinery@v0.29.5/pkg/runtime/scheme.go:263 +0xbd
...
Jun 14 14:52:11 node-16d20-00 k0s[944]: goroutine 2722 [select]:
Jun 14 14:52:11 node-16d20-00 k0s[944]: k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext({0x4027e78, 0xc0003c4540}, {0x40116e0?, 0xc001a6c560}, 0x1, 0x0, 0x0?)
Jun 14 14:52:11 node-16d20-00 k0s[944]:         k8s.io/apimachinery@v0.29.5/pkg/util/wait/loop.go:66 +0x1e6
Jun 14 14:52:11 node-16d20-00 k0s[944]: k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel({0x4027e78, 0xc0003c4540}, 0xc001a6c520?, 0x2?, 0x2?)
Jun 14 14:52:11 node-16d20-00 k0s[944]:         k8s.io/apimachinery@v0.29.5/pkg/util/wait/poll.go:33 +0x56
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/kube.(*waiter).waitForResources(0xc002155e90, {0xc003142300, 0x8, 0x8})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/kube/wait.go:53 +0x195
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/kube.(*Client).WaitWithJobs(0xc0020d6ea0, {0xc003142300, 0x8, 0x8}, 0x8bb2c97000)
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/kube/client.go:311 +0x1fa
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/action.(*Install).performInstall(0xc0020e8000, 0xc00061acb0, {0x0?, 0x0?, 0x0?}, {0xc003142300, 0x8, 0x8})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/action/install.go:450 +0x207
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/action.(*Install).performInstallCtx.func1()
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/action/install.go:407 +0x3c
Jun 14 14:52:11 node-16d20-00 k0s[944]: created by helm.sh/helm/v3/pkg/action.(*Install).performInstallCtx in goroutine 1856
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/action/install.go:406 +0x14c
Jun 14 14:52:11 node-16d20-00 k0s[944]: goroutine 1856 [select]:
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/action.(*Install).performInstallCtx(0xc0020e8000, {0x4027dd0, 0xc0020d6000}, 0xc00061acb0, {0x0, 0x0, 0x0}, {0xc003142300, 0x8, 0x8})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/action/install.go:410 +0x1b4
Jun 14 14:52:11 node-16d20-00 k0s[944]: helm.sh/helm/v3/pkg/action.(*Install).RunWithContext(0xc0020e8000, {0x4027dd0, 0xc0020d6000}, 0x32?, 0x5?)
Jun 14 14:52:11 node-16d20-00 k0s[944]:         helm.sh/helm/v3@v3.14.2/pkg/action/install.go:392 +0x12ee
Jun 14 14:52:11 node-16d20-00 k0s[944]: github.com/k0sproject/k0s/pkg/helm.(*Commands).InstallChart(0xc0005c2600?, {0x4027dd0, 0xc0020d6000}, {0xc001ecbf40, 0x32}, {0xc001e41f0a, 0x5}, {0xc001e41f00, 0x7}, {0xc001e41ee9, ...}, ...)
Jun 14 14:52:11 node-16d20-00 k0s[944]:         github.com/k0sproject/k0s/pkg/helm/helm.go:253 +0x436
Jun 14 14:52:11 node-16d20-00 k0s[944]: github.com/k0sproject/k0s/pkg/component/controller.(*ChartReconciler).updateOrInstallChart(_, {_, _}, {{{0x2ef6edb, 0x5}, {0xc000dcaca0, 0x1a}}, {{0xc002425368, 0x17}, {0x0, ...}, ...}, ...})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         github.com/k0sproject/k0s/pkg/component/controller/extensions_controller.go:305 +0x3f7
Jun 14 14:52:11 node-16d20-00 k0s[944]: github.com/k0sproject/k0s/pkg/component/controller.(*ChartReconciler).Reconcile(0xc001746150, {0x4027dd0, 0xc0020d6000}, {{{0xc001e41ef0, 0xb}, {0xc002425368, 0x17}}})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         github.com/k0sproject/k0s/pkg/component/controller/extensions_controller.go:241 +0x370
Jun 14 14:52:11 node-16d20-00 k0s[944]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x4030348?, {0x4027dd0?, 0xc0020d6000?}, {{{0xc001e41ef0?, 0xb?}, {0xc002425368?, 0x0?}}})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:119 +0xb7
Jun 14 14:52:11 node-16d20-00 k0s[944]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc001bed7c0, {0x4027e08, 0xc00091db30}, {0x34a5320?, 0xc0016e3a60?})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:316 +0x3cc
Jun 14 14:52:11 node-16d20-00 k0s[944]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc001bed7c0, {0x4027e08, 0xc00091db30})
Jun 14 14:52:11 node-16d20-00 k0s[944]:         sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:266 +0x1af
Jun 14 14:52:11 node-16d20-00 k0s[944]: sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
Jun 14 14:52:11 node-16d20-00 k0s[944]:         sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:227 +0x79
Jun 14 14:52:11 node-16d20-00 k0s[944]: created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 1693
Jun 14 14:52:11 node-16d20-00 k0s[944]:         sigs.k8s.io/controller-runtime@v0.17.0/pkg/internal/controller/controller.go:223 +0x565
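
The fatal error above is raised by the Go runtime when one goroutine reads a map while another writes to it: here the reader is `runtime.(*Scheme).ObjectKinds` (the writing goroutine is not in the excerpt). The `runtime.Scheme` documentation notes that schemes are only thread-safe once registration is complete. As a minimal, stdlib-only illustration of the general technique, the sketch below guards a shared registry with a `sync.RWMutex`. The `typeRegistry` type is hypothetical; it is not the apimachinery `Scheme` and not necessarily how k0s avoids the race.

```go
package main

import (
	"fmt"
	"sync"
)

// typeRegistry is a hypothetical stand-in for a shared type registry such as
// a runtime.Scheme. It maps a Go type name to a group/version/kind string.
type typeRegistry struct {
	mu    sync.RWMutex
	kinds map[string]string
}

func newTypeRegistry() *typeRegistry {
	return &typeRegistry{kinds: make(map[string]string)}
}

// Register adds a type under the write lock, so it never overlaps a read.
func (r *typeRegistry) Register(goType, gvk string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.kinds[goType] = gvk
}

// KindFor looks up a type under the read lock; many readers may run at once.
func (r *typeRegistry) KindFor(goType string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	gvk, ok := r.kinds[goType]
	return gvk, ok
}

// Len returns the number of registered types.
func (r *typeRegistry) Len() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.kinds)
}

func main() {
	reg := newTypeRegistry()
	var wg sync.WaitGroup

	// Writers register types while readers are running, similar to a
	// subcontroller adding types to a shared registry during startup.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				name := fmt.Sprintf("T%d_%d", i, j)
				reg.Register(name, "example.io/v1, Kind="+name)
			}
		}(i)
	}

	// Readers look types up concurrently. Without the mutex, this mix of
	// reads and writes can abort the process with
	// "fatal error: concurrent map read and map write".
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				reg.KindFor(fmt.Sprintf("T%d_%d", i, j))
			}
		}(i)
	}

	wg.Wait()
	_, ok := reg.KindFor("T3_999")
	fmt.Println(reg.Len(), ok)
}
```

Run as-is, it prints `4000 true`. Removing the locking turns it back into the kind of unsynchronized access that the runtime can abort with the error shown in the trace (the runtime's detection is best-effort, so a given run may or may not crash).
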

juanluisvaladas commented 5 days ago

Hi, just a quick update: I've successfully reproduced this and I'm currently working on it.

emosbaugh commented 5 days ago

Thanks!!!

juanluisvaladas commented 4 days ago

Note to self:

The issue is that existing charts never reach the function InstallOrUpgradeChart, because the controller doesn't receive an event for objects that already exist.

https://github.com/k0sproject/k0s/blob/main/pkg/component/controller/extensions_controller.go#L282

I'm pretty sure we need to patch the ExtensionsController.Start() function: https://github.com/k0sproject/k0s/blob/main/pkg/component/controller/extensions_controller.go#L411

Not quite sure how, though.
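
The idea in the note can be sketched in isolation. If a controller reacts only to change events, objects that already existed before a restart are never reconciled. A common remedy is for the start routine to list the current objects and enqueue each one before handling new events, making reconciliation level-triggered rather than edge-triggered. The stdlib-only Go sketch below is entirely hypothetical: `Chart`, `chartController`, and its in-memory store stand in for the real ExtensionsController and Chart resources, and this is not necessarily how k0s resolves the issue.

```go
package main

import (
	"fmt"
	"sort"
)

// Chart is a hypothetical, minimal stand-in for a Chart extension object.
type Chart struct {
	Name      string
	Installed bool
}

// chartController is a hypothetical controller that installs charts from a
// work queue. It is not the k0s ExtensionsController.
type chartController struct {
	store map[string]*Chart // objects that already exist, e.g. after a restart
	queue []string          // names of charts waiting to be reconciled
}

func newChartController(existing []*Chart) *chartController {
	c := &chartController{store: make(map[string]*Chart)}
	for _, ch := range existing {
		c.store[ch.Name] = ch
	}
	return c
}

func (c *chartController) enqueue(name string) {
	c.queue = append(c.queue, name)
}

// reconcile installs a chart if it is not installed yet. It is idempotent,
// so enqueueing the same name more than once is harmless.
func (c *chartController) reconcile(name string) {
	ch, ok := c.store[name]
	if !ok || ch.Installed {
		return
	}
	ch.Installed = true // placeholder for the real Helm install/upgrade call
}

// Start seeds the queue with every existing object before handling new
// events, so charts created before a crash are reconciled after a restart.
func (c *chartController) Start() {
	names := make([]string, 0, len(c.store))
	for name := range c.store {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for the example
	for _, name := range names {
		c.enqueue(name)
	}
	// A real controller would keep receiving events after this point;
	// this sketch only drains the seeded items.
	for len(c.queue) > 0 {
		name := c.queue[0]
		c.queue = c.queue[1:]
		c.reconcile(name)
	}
}

func main() {
	// Two hypothetical charts existed before the crash; neither was installed.
	c := newChartController([]*Chart{{Name: "chart-a"}, {Name: "chart-b"}})
	c.Start()
	for _, name := range []string{"chart-a", "chart-b"} {
		fmt.Println(name, c.store[name].Installed)
	}
}
```

This prints `chart-a true` and `chart-b true`. Because `reconcile` is idempotent, seeding the queue on every start is safe even for charts that were already installed before the restart.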