adjust the kubeadm / kubelet skew policy

NorthFuture commented 1 year ago

edit by neolit123

action items:

[x] adjust all kubelet skew constants / checks for init/join/upgrade. https://github.com/kubernetes/kubernetes/pull/120825
[x] update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch. https://github.com/kubernetes/website/pull/43769
~- [ ] add the -f flag for upgrade node. this is a nice to have, we missed 1.29, so it can be added in 1.30. TODO~
~- [ ] add one new e2e test, that upgrades with the new kubelet skew without -f. (this is actually difficult because -f is required in our CI; it's used for kubeadm to allow upgrading to a pre-release / CI artifact) TODO~
~- [ ] in 1.32 remove --ignore-preflight-errors=KubeletVersion from the kinder kubelet skew jobs see this note for details TODO~

UPDATE not needed https://github.com/kubernetes/kubeadm/pull/2944#pullrequestreview-1726883980

how was 1.32 established (UPDATE: not needed)

kubeadm 1.29 is the first release that supports the new skew
kubeadm 1.29 supports deploying kubelet 1.29, 1.28, 1.27, 1.26
k8s (kubeadm) support window is 3x releases at a time
target kubeadm version to drop the preflight = when 1.28 goes out of support / 1.32 is released.
note: the kubeadm/CP skew is ignored in this case even if the kubelet/CP skew is the actual target of this change
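
The list of kubelet versions supported by kubeadm 1.29 follows from a maximum skew of three minor versions. A minimal Go sketch of that arithmetic, assuming a skew of 3; `maxKubeletSkew` and `supportedKubeletMinors` are hypothetical names used for illustration and are not kubeadm identifiers:

```go
package main

import "fmt"

// maxKubeletSkew is the assumed number of minor versions a kubelet may lag
// behind kubeadm under the new policy (3, per the list above).
const maxKubeletSkew = 3

// supportedKubeletMinors returns the kubelet minor versions that a given
// kubeadm minor version can deploy, newest first. Hypothetical helper.
func supportedKubeletMinors(kubeadmMinor int) []int {
	minors := make([]int, 0, maxKubeletSkew+1)
	for skew := 0; skew <= maxKubeletSkew; skew++ {
		if m := kubeadmMinor - skew; m >= 0 {
			minors = append(minors, m)
		}
	}
	return minors
}

func main() {
	// For kubeadm 1.29 this prints 1.29, 1.28, 1.27 and 1.26, one per line.
	for _, m := range supportedKubeletMinors(29) {
		fmt.Printf("1.%d\n", m)
	}
}
```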

FEATURE REQUEST:

with kubernetes 1.28 the skew policy for control plane components has been updated, and now you can have control plane components that are three versions ahead of the kubelets.

However, kubeadm still has an n-1 skew policy. Such a policy prevents skipping some kubelet upgrades of worker nodes, which could save a lot of time during upgrades in large clusters.

edit(neolit123): KEP LINK: Support Oldest Node And Newest Control Plane https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3935-oldest-node-newest-control-plane

Versions

kubeadm version (use kubeadm version): 1.28

Environment:

Kubernetes version (use kubectl version): 1.28
Cloud provider or hardware configuration: N/A
OS (e.g. from /etc/os-release): N/A
Kernel (e.g. uname -a): N/A
Container runtime (CRI) (e.g. containerd, cri-o): N/A
Container networking plugin (CNI) (e.g. Calico, Cilium): N/A
Others:

What happened?

during kubeadm upgrade apply the upgrade process is stopped with the following error

There are kubelets in this cluster that are too old that have these versions [v1.x.yy]

What you expected to happen?

To be able to upgrade the control plane to 1.28 even if the cluster has 1.26 or 1.25 kubelets.

neolit123 commented 1 year ago

@NorthFuture thanks for logging the issue.

Such a policy prevents skipping some kubelet upgrades of worker nodes, which could save a lot of time during upgrades in large clusters.

upgrade apply has -f where it forces the upgrade. but upgrade node for workers lacks the -f: https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-upgrade/

one quick workaround is to add the flag.

but another workaround is to just skip the kubelet-config phase with --skip-phases, download the kubelet config manually from kube-system/kubelet-config, and restart the kubelet (systemd) with the desired version. https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-upgrade/#cmd-upgrade-node

However, kubeadm still has an n-1 skew policy.

as noted on slack: https://kubernetes.slack.com/archives/C09NXKJKA/p1693814176564259

it's actually two policies:
kubeadm vs kubelet is n-1
kubeadm vs control plane is n-1

you can read more about the current state of the kubeadm skew here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#version-skew-policy

so technically if the user deploys kubeadm version == control-plane version and we extend the kubelet skew to n-2 we will align with the new policy.

neolit123 commented 1 year ago

@SataQiu @pacoxu @chendave

but upgrade node for workers lacks the -f ... one quick workaround is to add the flag.

should we add this flag and would it help?

so technically if the user deploys kubeadm version == control-plane version and we extend the kubelet skew to n-2 we will align with the new policy.

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

see the kubeadm-kinder-kubelet-x-on-y tests all the way to n-3 (e.g. k8s at 1.28, kubelet at 1.25), which tests were requested by SIG node at some point.

WDYT?

pacoxu commented 1 year ago

this seems to be valid for annual node upgrade. I need to check the KEP by Jordan tomorrow to confirm the details (that is, the valid kubelet skew).

NorthFuture commented 1 year ago

upgrade apply has -f where it forces the upgrade.

Just a consideration on the usage of this flag: while it's true that the upgrade can be forced, usually I'm not inclined to force some action when there's a blocking error (if someone put in place a blocking error, there might be a reason :smile: )

if kubeadm skew policy is extended, no -f flag is required correct?

neolit123 commented 1 year ago

upgrade apply has -f where it forces the upgrade.

Just a consideration on the usage of this flag: while it's true that the upgrade can be forced, usually I'm not inclined to force some action when there's a blocking error (if someone put in place a blocking error, there might be a reason 😄 )

if kubeadm skew policy is extended, no -f flag is required correct?

agreed that not blocking with a skew error is the preferred action. the flag is just missing for "upgrade node", which is more of a side issue.

SataQiu commented 1 year ago

~~We can add -f flag for upgrade node as a workaround in the short term, and make it consistent with upgrade apply.~~

I found that kubeadm upgrade node can work well without the -f flag because it doesn't check the kubelet version skew. The preflight will only execute RunRootCheckOnly and RunPullImagesCheck (when it is a control-plane node). https://github.com/kubernetes/kubernetes/blob/7e9fbc449ddccab5be18d7d5fa0a4158ec8227f2/cmd/kubeadm/app/cmd/phases/upgrade/node/preflight.go#L45-L75

But ideally, I think we'd better make kubeadm align with the skew strategy of Kubernetes. How about adjusting kubeadm upgrade apply to allow lower kubelet versions?

pacoxu commented 1 year ago

https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3935-oldest-node-newest-control-plane

There is an upgrade proposal to make annual node upgrade possible:

Begin: control plane and nodes on v1.40
Control plane upgrade: v1.40 → v1.41 → v1.42 → v1.43
Node upgrades: v1.40 → v1.43
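
The proposal above can be sketched as a small plan generator: the control plane moves one minor version at a time while the nodes stay put, and the nodes then jump directly to the final version. This is only an illustration in Go, assuming a maximum node/control-plane skew of three minor versions as described in the KEP; `maxSkew`, `step` and `annualUpgradePlan` are hypothetical names.

```go
package main

import "fmt"

// maxSkew is the assumed maximum number of minor versions that nodes may
// lag behind the control plane.
const maxSkew = 3

// step records the minor versions of the control plane and the nodes at
// one point of the upgrade.
type step struct {
	ControlPlane int
	Nodes        int
}

// annualUpgradePlan returns the cluster states when the control plane is
// upgraded one minor version at a time from `from` to `to`, and the nodes
// are upgraded once, at the end. Hypothetical helper.
func annualUpgradePlan(from, to int) ([]step, error) {
	if to < from {
		return nil, fmt.Errorf("target 1.%d is older than current 1.%d", to, from)
	}
	if to-from > maxSkew {
		return nil, fmt.Errorf("nodes at 1.%d would lag a 1.%d control plane by more than %d minors", from, to, maxSkew)
	}
	plan := []step{{ControlPlane: from, Nodes: from}}
	for cp := from + 1; cp <= to; cp++ {
		plan = append(plan, step{ControlPlane: cp, Nodes: from})
	}
	if to != from {
		plan = append(plan, step{ControlPlane: to, Nodes: to})
	}
	return plan, nil
}

func main() {
	plan, err := annualUpgradePlan(40, 43)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range plan {
		fmt.Printf("control plane v1.%d, nodes v1.%d\n", s.ControlPlane, s.Nodes)
	}
}
```

Asking for a jump larger than the assumed skew, for example from 40 to 44, returns an error instead of a plan.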

liggitt commented 1 year ago

but upgrade node for workers lacks the -f ... one quick workaround is to add the flag.

should we add this flag and would it help? ... but another workaround is to just skip the kubelet-config phase with --skip-phases

In the short-term, having --force, --skip-phases, and --ignore-preflight-errors supported across kubeadm commands would make kubeadm more consistent and would be really helpful, especially so consumers that just want to tolerate kubelet skew (until kubeadm expands to match core components supported skew) could narrowly skip that with --ignore-preflight-errors=KubeletVersion instead of skipping all checks with --force.

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

Making sure node folks are aware of tools (like kubeadm) wanting to manage nodes with consistent flags/config across supported versions would help inform / temper drastic changes they are considering. Even if new features add optional flags / config, being careful about new required flags / config would make this skewed support easier to maintain.

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

+100 on already having visibility to how well this works for some kubeadm commands (and it's been working well). Having visibility to whether it actually works to use kubeadm to upgrade a control-plane while nodes are at n-1 or n-2 (using --force or --ignore-preflight-errors, etc) would be a great next step, would help node folks see impact of kubelet changes early, and would help cluster-lifecycle judge the stability of this operation before committing to support it officially.

But ideally, I think we'd better make kubeadm align with the skew strategy of Kubernetes. How about adjusting kubeadm upgrade apply to allow lower kubelet versions?

This would be my ideal as well. For the folks that know kubeadm well:

is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?
Is version-specific node configuration possible today in kubeadm? Is it relatively clean to maintain?
Have you talked with node folks about plans for rolling out kubelet flag / config changes in a way that helps cluster admins configuring multiple node versions?

neolit123 commented 1 year ago

but another workaround is to just skip the kubelet-config phase with --skip-phases

In the short-term, having --force, --skip-phases, and --ignore-preflight-errors supported across kubeadm commands would make kubeadm more consistent and would be really helpful, especially so consumers that just want to tolerate kubelet skew (until kubeadm expands to match core components supported skew) could narrowly skip that with --ignore-preflight-errors=KubeletVersion instead of skipping all checks with --force.

+1

this however means we need to support it. if the kubelet starts making some drastic changes we will need to spend time maintaining this n-2 skew

Making sure node folks are aware of tools (like kubeadm) wanting to manage nodes with consistent flags/config across supported versions would help inform / temper drastic changes they are considering. Even if new features add optional flags / config, being careful about new required flags / config would make this skewed support easier to maintain.

our tests can hopefully catch such changes.

one good aspect is that the kubelet has not been changing much and we already have e2e tests for this unsupported by kubeadm skew: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm

+100 on already having visibility to how well this works for some kubeadm commands (and it's been working well). Having visibility to whether it actually works to use kubeadm to upgrade a control-plane while nodes are at n-1 or n-2 (using --force or --ignore-preflight-errors, etc) would be a great next step, would help node folks see impact of kubelet changes early, and would help cluster-lifecycle judge the stability of this operation before committing to support it officially.

yes, we are going to have to add an explicit e2e test for the n-2 kubelet upgrade.

is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?

i was thinking that likely flags/config changes can break us.

for example the --bootstrap-kubeconfig and --kubeconfig flags of the kubelet have been deprecated (?) and planned for removal, that was discussed in an issue somewhere.

the kubelet v1beta1 does not have the fields for these options yet.

as an example, and if this is still in the plans, the kubelet maintainers must carefully execute the removal of the flag and addition of the options in config. this will have a wider effect than kubeadm. for kubeadm we can certainly adapt somehow, but the change may not be so easy since the flags are hardcoded in systemd files, distributed in packages.

Is version-specific node configuration possible today in kubeadm? Is it relatively clean to maintain?

in the past we have done different kubelet flags/fields for different kubelet versions, which is just branching in the kubeadm code that manages the kubelet and it is kept for 1-2 releases with a TODO for a later cleanup.

there is no persistent node specific component configuration support per se on the API server side, but there is:

custom kubelet flags per this node stored in a systemd environment file
kubelet v1beta1 patches that persist on disk

Have you talked with node folks about plans for rolling out kubelet flag / config changes in a way that helps cluster admins configuring multiple node versions?

AFAIK, no. the latest work in this area was https://github.com/kubernetes/enhancements/issues/3983

IIRC, the rule in the kubelet is to always add a field in the config, the corresponding CLI flag may or may not be added (?).

not having the micro-versions or history in the kubeletconfiguration makes it a bit difficult for the admin to determine what version of the API has the new option Foo. they could still use a single file that has Foo even for kubelet versions that do not support the option, since the kubelet would not warn or error on unknown fields when parsing the API.

kubeadm will throw a warning for the unknown field depending on the kubelet public type it imported.
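
To illustrate the difference between a parser that silently ignores an unknown option such as Foo and one that reports it, here is a generic Go sketch using the standard `encoding/json` package. It is not how the kubelet or kubeadm actually decode KubeletConfiguration (they use Kubernetes' own decoding machinery); `exampleKubeletConfig`, `decodeLenient` and `decodeStrict` are hypothetical names, and the struct is a heavily reduced stand-in for a config version that does not know Foo.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// exampleKubeletConfig is a hypothetical, reduced stand-in for a
// KubeletConfiguration version that has no "foo" option.
type exampleKubeletConfig struct {
	Kind    string `json:"kind"`
	MaxPods int    `json:"maxPods"`
}

// decodeLenient ignores fields it does not know about.
func decodeLenient(data []byte) (exampleKubeletConfig, error) {
	var cfg exampleKubeletConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

// decodeStrict returns an error when the input has an unknown field.
func decodeStrict(data []byte) (exampleKubeletConfig, error) {
	var cfg exampleKubeletConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// A single shared file that already contains the newer option "foo".
	data := []byte(`{"kind":"KubeletConfiguration","maxPods":110,"foo":true}`)

	cfg, err := decodeLenient(data)
	fmt.Println("lenient:", cfg.MaxPods, err)

	_, err = decodeStrict(data)
	fmt.Println("strict:", err)
}
```

The lenient path accepts the shared file on older versions, while the strict path surfaces the unknown field, which a tool could downgrade to a warning.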

pacoxu commented 1 year ago

is the primary question whether kubelet command-line or config will require a change (either dropping support for some flag/field or requiring some new flag/field) that will make kubeadm start to have to do multiple version-specific kubelet configurations?

i was thinking that likely flags/config changes can break us.

for example the --bootstrap-kubeconfig and --kubeconfig flags of the kubelet have been deprecated (?) and planned for removal, that was discussed in an issue somewhere.

the kubelet v1beta1 does not have the fields for these options yet.

as an example, and if this is still in the plans, the kubelet maintainers must carefully execute the removal of the flag and addition of the options in config. this will have a wider effect than kubeadm. for kubeadm we can certainly adapt somehow, but the change may not be so easy since the flags are hardcoded in systemd files, distributed in packages.

Can we keep the kubelet configuration as is if the kubelet version is n-1 or n-2 which is not the same as the kubeadm version? Then kubelet will have no risk of being corrupted.

When we upgrade the kubelet and run the kubeadm upgrade node again, we can update the kubelet configuration at that time.

We only keep the control-plane version compatible kubelet configurations in the configmap kubelet-config.

neolit123 commented 1 year ago

Can we keep the kubelet configuration as is if the kubelet version is n-1 or n-2 which is not the same as the kubeadm version? Then kubelet will have no risk of being corrupted.

When we upgrade the kubelet and run the kubeadm upgrade node again, we can update the kubelet configuration at that time.

We only keep the control-plane version compatible kubelet configurations in the configmap kubelet-config.

we are going to continue managing a single kubeletconfiguration config map for now. but eventually if kubeletconfiguration v1 is released and v1beta1 deprecated and removed, we need to plan how to upgrade users and whether we need to store both v1 and v1beta1 in a config map, temporarily.

liggitt commented 1 year ago

Given how long v1beta1 kubelet config has been around, supporting it in parallel with any eventual v1 config file for several releases (maybe 4 so n-3 would work) would seem reasonable to me.

If kubeadm folks have an idea of how parallel v1beta1 and v1 config blobs could be provided as well, that's another possibility.

Sounds like there are no known issues that make this impossible, more uncertainty about kubelet config evolution and compatibility across multiple versions. I think sig-node would be amenable to making any config transitions as easy as possible for cluster admins (like kubeadm and others).

pacoxu commented 1 year ago

Summary of what may need an update later (if we change the kubelet skew policy to allow n-3 relative to kube-apiserver):

upgrade apply with some node with n-2/n-3 kubelet: detect too old kubelets https://github.com/kubernetes/kubernetes/blob/4eb6b3907a68514e1b2679b31d95d61f4559c181/cmd/kubeadm/app/phases/upgrade/policy.go#L170-L189

// newK8sVersion.Minor() > kubeletVersion.Minor()+MaximumAllowedMinorVersionKubeletSkew
// change MaximumAllowedMinorVersionKubeletSkew from 1 to 3.

// kubeadm upgrade apply v1.28.1
[upgrade/version] FATAL: the --version argument is invalid due to these errors:

    - There are kubelets in this cluster that are too old that have these versions [v1.26.0]

Can be bypassed if you pass the --force flag

workaround: kubeadm upgrade apply -f v1.28.1

join node kubelet n-2 : https://github.com/kubernetes/kubernetes/blob/4eb6b3907a68514e1b2679b31d95d61f4559c181/cmd/kubeadm/app/preflight/checks.go#L610-L616

change MinimumKubeletVersion from getSkewedKubernetesVersion(-1) to getSkewedKubernetesVersion(-3).

    [ERROR KubeletVersion]: Kubelet version "1.26.0" is lower than kubeadm can support. Please upgrade kubelet

workaround: --ignore-preflight-errors=KubeletVersion
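
The comparison quoted in the summary above can be sketched in a few lines of Go. This is not the kubeadm implementation: only the constant name `MaximumAllowedMinorVersionKubeletSkew` and the comparison come from the comment above, the value 3 is the proposed one, and `parseMinor` and `tooOldKubelets` are hypothetical helpers (kubeadm uses its own version library).

```go
package main

import "fmt"

// MaximumAllowedMinorVersionKubeletSkew mirrors the constant named above,
// set to the proposed value of 3.
const MaximumAllowedMinorVersionKubeletSkew = 3

// parseMinor extracts the minor version from a string such as "v1.26.0".
// Hypothetical helper for this sketch.
func parseMinor(v string) (int, error) {
	var major, minor, patch int
	if _, err := fmt.Sscanf(v, "v%d.%d.%d", &major, &minor, &patch); err != nil {
		return 0, fmt.Errorf("cannot parse version %q: %w", v, err)
	}
	return minor, nil
}

// tooOldKubelets returns the kubelet versions whose minor version lags the
// target version by more than the allowed skew.
func tooOldKubelets(target string, kubelets []string) ([]string, error) {
	newMinor, err := parseMinor(target)
	if err != nil {
		return nil, err
	}
	var tooOld []string
	for _, k := range kubelets {
		kubeletMinor, err := parseMinor(k)
		if err != nil {
			return nil, err
		}
		// Same shape as the quoted check:
		// newK8sVersion.Minor() > kubeletVersion.Minor()+MaximumAllowedMinorVersionKubeletSkew
		if newMinor > kubeletMinor+MaximumAllowedMinorVersionKubeletSkew {
			tooOld = append(tooOld, k)
		}
	}
	return tooOld, nil
}

func main() {
	old, err := tooOldKubelets("v1.28.1", []string{"v1.28.0", "v1.26.0", "v1.24.3"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With a skew of 3, only v1.24.3 is reported; v1.26.0 is accepted.
	fmt.Println(old)
}
```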

pacoxu commented 1 year ago

@neolit123 should we make it in v1.29? Are there any other action items before this?

neolit123 commented 1 year ago

@neolit123 should we make it in v1.29? Are there any other action items before this?

we can try for 1.29.

actions:

add the -f flag for upgrade node.
adjust all kubelet skew constants / checks for init/join/upgrade.
update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch.
add one new e2e test, that upgrades with the new kubelet skew without -f.

lalitc375 commented 11 months ago

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test ?

pacoxu commented 11 months ago

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test ?

There are already some e2e tests using ignorePreflightErrors:KubeletVersion as a workaround. https://github.com/kubernetes/kubeadm/pull/2944 updates the CI.

neolit123 commented 10 months ago

I can see that the skew policy has been updated. @pacoxu Are you also adding the e2e test ?

There are already some e2e tests using ignorePreflightErrors:KubeletVersion as a workaround. #2944 updates the CI.

updated the OP here, with some notes on what we decided to do with our existing kubelet skew jobs in terms of this preflight error: https://github.com/kubernetes/kubeadm/issues/2924#issue-1880296845

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 7 months ago

@neolit123 should we make it in v1.29? Are there any other action items before this?

we can try for 1.29.

actions:

add the -f flag for upgrade node.

we realized this is not needed, as ignoring the preflight error (optionally) is enough.

adjust all kubelet skew constants / checks for init/join/upgrade.

done

update the "create a cluster with kubeadm" page where our skew is documented. it must be for the release-1.29 k/website branch.

done

add one new e2e test, that upgrades with the new kubelet skew without -f.

we understood we cannot remove the -f for upgrade apply due to how -f is designed and also used for pre-release versions.

i consider the tasks here done on a best effort. on a best effort we also maintain the kubelet skew with the kubeadm code and e2e.
