kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Stricter kubeadm validation (config and runtime checks) #11710

Closed VannTen closed 1 week ago

VannTen commented 1 week ago

What type of PR is this? /kind cleanup

What this PR does / why we need it: This makes kubespray stricter regarding kubeadm config and errors in 2 ways:

  1. validate the kubeadm config files on templating
  2. don't ignore preflight errors by default. Instead we expose a variable to let users ignore certain errors if needed.

We need this to catch badly formatted config files. Such formatting errors are otherwise non-fatal, which could lead to settings not being applied at all.
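As a rough illustration of validating on templating, Ansible's `template` module accepts a `validate` command that runs against a temporary copy of the rendered file before it is moved into place. The sketch below is not kubespray's actual task: the template name, destination path, and kubeadm binary location are hypothetical, and it assumes a kubeadm version that ships the `kubeadm config validate` subcommand.

```yaml
# Minimal sketch, not the task from this PR.
# Assumptions: file names and paths are illustrative; kubeadm is new enough
# to provide "kubeadm config validate".
- hosts: all
  become: true
  tasks:
    - name: Create kubeadm client config (validated on templating)
      ansible.builtin.template:
        src: kubeadm-client.conf.j2                 # hypothetical template name
        dest: /etc/kubernetes/kubeadm-client.conf   # hypothetical destination
        mode: "0640"
        # %s is replaced with the path of a temporary copy of the rendered
        # file. If the command exits non-zero, the task fails and the
        # destination file is left untouched.
        validate: /usr/local/bin/kubeadm config validate --config %s
```

With a check like this in place, a config containing an unknown field fails the play at templating time instead of being accepted with the setting ignored.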

Also includes some cleanups, and fixes for issues introduced with v1beta4 support in #11674 (which were not caught precisely because that validation was missing).

Special notes for your reviewer: Apparently, we have been ignoring kubeadm errors since the introduction of kubeadm support in #1631 (commit 6da20e2)

Does this PR introduce a user-facing change?:

action required
`kubeadm_ignore_preflight_errors` is introduced to ignore specific preflight checks from kubeadm. The previous behavior was effectively `all`, so some errors might surface during upgrade, in which case users should add the ones they choose to ignore to that variable.
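
For illustration, an inventory override might look like the sketch below. It assumes the variable takes a list of kubeadm preflight check names (the format is not stated above), and the file path is hypothetical. `NumCPU` and `Mem` are check names that kubeadm's `--ignore-preflight-errors` flag accepts; pick whichever checks actually fail in your environment.

```yaml
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml  (hypothetical path)
# Assumption: the variable is a list of kubeadm preflight check names.
kubeadm_ignore_preflight_errors:
  - NumCPU   # tolerate nodes with fewer CPUs than kubeadm recommends
  - Mem      # tolerate nodes with less memory than kubeadm recommends

# To restore the previous "ignore everything" behavior (not recommended),
# kubeadm itself accepts the special value "all":
# kubeadm_ignore_preflight_errors:
#   - all
```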

VannTen commented 1 week ago

/ok-to-test

VannTen commented 1 week ago

Well, the validation works:

TASK [kubernetes/kubeadm : Create kubeadm client config] ***********************
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/kubernetes/kubeadm/tasks/main.yml:74
fatal: [test-vm-gv46p]: FAILED! => {"changed": false, "checksum": "562a882839726dd23f5de3de823925aa2f7ca942", "exit_status": 1, "msg": "failed to validate", "stderr": "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"\nTo see the stack trace of this error execute with --v=5 or higher\n", "stderr_lines": ["error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}
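
For context on the error above: as far as I understand the kubeadm v1beta4 API, the discovery timeout moved out of `discovery` into a top-level `timeouts` block, which is why strict decoding rejects `discovery.timeout`. The fragment below is a sketch of a `JoinConfiguration` in that shape, with placeholder endpoint and token values. It is not the template from this PR.

```yaml
# Sketch of a v1beta4 JoinConfiguration; endpoint and token are placeholders.
apiVersion: kubeadm.k8s.io/v1beta4
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "10.0.0.1:6443"   # placeholder
    token: "abcdef.0123456789abcdef"     # placeholder
    unsafeSkipCAVerification: true
  # timeout: 5m0s   # v1beta3 location; rejected as an unknown field in v1beta4
timeouts:
  discovery: 5m0s   # v1beta4 location of the discovery timeout
```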

VannTen commented 1 week ago

I'm not sure why the vagrant job is still failing; the v1beta4 config should be fixed there as well :thinking:

I had missed one.

{
  "changed": false,
  "checksum": "387f5d5b7358fd92b61070db17c06edd4e56827e",
  "exit_status": 1,
  "msg": "failed to validate",
  "stderr": "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"\nTo see the stack trace of this error execute with --v=5 or higher\n",
  "stderr_lines": [
    "error unmarshaling configuration schema.GroupVersionKind{Group:\"kubeadm.k8s.io\", Version:\"v1beta4\", Kind:\"JoinConfiguration\"}: strict decoding error: unknown field \"discovery.timeout\"",
    "To see the stack trace of this error execute with --v=5 or higher"
  ],
  "stdout": "",
  "stdout_lines": []
}

VannTen commented 1 week ago

/cc @tico88612

k8s-ci-robot commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MrFreezeex, tico88612, VannTen

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/kubespray/blob/master/OWNERS)~~ [VannTen]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.