k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

Broken `ClusterConfig` with dynamic config results in wiping system services #4721

Open jnummelin opened 2 months ago

jnummelin commented 2 months ago

Platform

No response

Version

No response


What happened?

If the dynamic ClusterConfig ends up broken, e.g. with both externalAddress and NLLB enabled, an additional controller can wipe out system services such as kube-router.

In this case, the second controller to join had empty stacks for all system services. At some point it had, supposedly, been the leader for a short while and therefore applied some of those stacks. Because the stacks were empty, kube-router and some other stacks were removed completely.
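To illustrate why applying an empty stack is destructive, here is a minimal sketch of an applier with prune semantics. It is not k0s code: `applyStack`, the in-memory `cluster` map and the resource names are hypothetical stand-ins, and the sketch assumes the applier deletes anything that is present in the cluster but missing from the desired set.

```go
package main

import (
	"fmt"
	"sort"
)

// applyStack is a hypothetical stand-in for a stack applier with prune
// semantics: every desired resource is created, and every resource that
// exists in the cluster but is absent from desired is removed.
func applyStack(cluster map[string]bool, desired []string) []string {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
		cluster[name] = true
	}

	var removed []string
	for name := range cluster {
		if !want[name] {
			delete(cluster, name) // deleting during range is safe in Go
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	// Illustrative resource names only, not the real kube-router manifests.
	cluster := map[string]bool{
		"DaemonSet/kube-router": true,
		"ConfigMap/kube-router": true,
	}

	// A controller whose stack rendered to nothing applies an empty set.
	removed := applyStack(cluster, nil)
	fmt.Println("removed:", removed)
	fmt.Println("left in cluster:", len(cluster))
}
```

Under these assumptions, a single apply by a briefly elected leader is enough to remove everything the stack previously owned.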

Steps to reproduce

  1. Start a controller with --dynamic-config and some config.
  2. Once the controller is up, create an invalid ClusterConfig: invalid from k0s' point of view, but valid (enough) to be accepted by the API.
  3. Boot up a second controller with --dynamic-config.
  4. Depending on timing and leader elections, the second controller can wipe out the system stacks.

Expected behavior

When we receive an invalid dynamic config, we need to stop reconciling it completely so that we don't risk breaking an already functioning cluster.

Actual behavior

No response

Screenshots and logs

No response

Additional context

No response

github-actions[bot] commented 3 weeks ago

The issue is marked as stale since no activity has been recorded in 30 days