kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

NodeSwap feature supports in kubeadm #2563

Open pacoxu opened 3 years ago

pacoxu commented 3 years ago

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

/kind feature

Versions

kubeadm version (use kubeadm version): NodeSwap is alpha in 1.22 and will be beta1 in 1.28 (still disabled by default).

What happened?

I tested NodeSwap on my nodes, and when I reinstalled my environment, I got an error related to swap.

    [ERROR Swap]: running with swap on is not supported. Please disable swap

I think it's time to start planning for Swap enabling support on the kubeadm side.

What you expected to happen?

There should be NodeSwap support in kubeadm init, so that the check can be skipped if the feature gate is enabled. Alternatively, in 1.23 we should skip the preflight check by default, as the feature will be beta.

How to reproduce it (as minimally and precisely as possible)?

swapon and run kubeadm init

Anything else we need to know?

More details in https://github.com/kubernetes/enhancements/issues/2400

/assign

pacoxu commented 3 years ago

/cc @ehashman

neolit123 commented 3 years ago

Thanks for logging the issue.

I think it makes sense to remove the preflight check in the release when the feature goes beta. Checking the kubelet args / config for the FG is doable but a bit messy.

neolit123 commented 3 years ago

Actually, since we support kubelet n-1 skew, it should probably be done one release after beta.

pacoxu commented 3 years ago

> Actually, since we support kubelet n-1 skew, it should probably be done one release after beta.

It makes sense. Hence, if it is beta in 1.23, kubeadm may add the support in 1.24+.

For users like me who want to try the alpha feature, isn't the swap preflight check too harsh? The workaround in 1.22 is to add the ignore flag.
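A minimal sketch of that workaround, assuming the check is ignored through kubeadm's `--ignore-preflight-errors` flag with the check name `Swap` (as in the `[ERROR Swap]` message above). The helper names `swap_is_active` and `kubeadm_init_args` are hypothetical, and reading `/proc/swaps` to detect active swap is an assumption of this sketch, not something kubeadm is claimed to do.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given /proc/swaps-style file lists a swap device.
# $1: path to a file in /proc/swaps format (defaults to /proc/swaps).
swap_is_active() {
    swaps_file="${1:-/proc/swaps}"
    # The first line is a header; any further non-empty line is an active swap area.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "$swaps_file"
}

# Hypothetical helper: print the extra kubeadm argument only when swap is active.
kubeadm_init_args() {
    if swap_is_active "$1"; then
        echo "--ignore-preflight-errors=Swap"
    fi
}

# Usage (not executed here):
#   kubeadm init $(kubeadm_init_args)
```

Ignoring the check only gets past kubeadm; the kubelet still has to be configured to tolerate swap separately.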

In my opinion, the check should at least be removed in 1.23, when the feature is beta.

pacoxu commented 3 years ago

Or we may change the check error to a warning message?

neolit123 commented 3 years ago

Ok, in 1.23 we can switch it to warning. Remove it in 1.24.

neolit123 commented 2 years ago

looks like this is shifted to Beta for 1.24 due to some failures in CI and missing support in runtimes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit# (see notes for 26 Oct)

pacoxu commented 2 years ago

😓

However, changing the swap check from an error to a warning is still valid.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 2 years ago

/remove-lifecycle stale

kep seems tracked for beta in 1.24: https://github.com/kubernetes/enhancements/issues/2400

neolit123 commented 2 years ago

update: looks like it was dropped from 1.24: https://github.com/kubernetes/enhancements/issues/2400#issuecomment-1068228077

pacoxu commented 2 years ago

> update: looks like it was dropped from 1.24:

Most PRs were ready early in the v1.24 cycle. However, the e2e tests passed too late for v1.24, and some related PRs are still in review. Hopefully it can be beta in v1.25.

pacoxu commented 2 years ago

No update in v1.25 for the swap feature, as Elana is out of office.

Sergey will take over the swap feature in later releases. No update in v1.26 so far.

Sergey added it to the v1.27 plan, and I will work on the swap cgroup v2 support part.

k8s-triage-robot commented 1 year ago

/lifecycle stale

chendave commented 1 year ago

/remove-lifecycle stale

pacoxu commented 1 year ago

https://github.com/kubernetes/kubernetes/pull/118764/ is promoting swap to beta in v1.28 which is in review now.

pacoxu commented 1 year ago

https://github.com/kubernetes/kubernetes/pull/118764 is merged. Swap is beta now. However, --fail-swap-on=false (or failSwapOn in the kubelet configuration) and swapBehavior must still be set manually.

https://github.com/kubernetes/kubernetes/blob/b4d793c4502eb0248bfd58cab65f310182a8847d/hack/local-up-cluster.sh#L837-L841

pacoxu commented 1 year ago

Swap Feature is beta1 in v1.28.

failSwapOn and swapBehavior should be set if the FG is set to true in kubeadm init.

I may submit a PR in 1.29 release cycle.

neolit123 commented 1 year ago

https://github.com/kubernetes/kubeadm/issues/2563#issuecomment-915184761

we can drop our preflight check in 1.28. we support n-1 kubelet but hopefully the user manages this specific skew/setup if they use the older kubelet. i don't think we should add a kubeadm FG for this.

neolit123 commented 1 year ago

> kubernetes/kubernetes#118764 is merged. Swap is beta now. However, --fail-swap-on=false (or failSwapOn in the kubelet configuration) and swapBehavior must still be set manually.
>
> https://github.com/kubernetes/kubernetes/blob/b4d793c4502eb0248bfd58cab65f310182a8847d/hack/local-up-cluster.sh#L837-L841

let's leave this manual. once the feature is GA we may need to update kubeadm docs to not mention these options, unless swap off is still recommended by default. the options will be no-op. IIUC

pacoxu commented 1 year ago

> we can drop our preflight check in 1.28. we support n-1 kubelet but hopefully the user manages this specific skew/setup if they use the older kubelet.

Do you mean we should drop the warning in v1.29?

> let's leave this manual. once the feature is GA we may need to update kubeadm docs to not mention these options, unless swap off is still recommended by default.

If so, todo items are dropping the warning and documenting it.

neolit123 commented 1 year ago

> > we can drop our preflight check in 1.28. we support n-1 kubelet but hopefully the user manages this specific skew/setup if they use the older kubelet.
>
> Do you mean we should drop the warning in v1.29?

i may be missing context. to my understanding it's beta in 1.28. the FG is on by default but users must still manually apply the failSwapOn=false? we can drop the warning preflight check in 1.28 if we are sure the feature will become GA... or better we can wait until .29 or later until it graduates.

what is your recommendation?

> > let's leave this manual. once the feature is GA we may need to update kubeadm docs to not mention these options, unless swap off is still recommended by default.
>
> If so, todo items are dropping the warning and documenting it.

we can remove the warning and clear our docs in terms of swap, but maybe keep the recommendation. for example, "swap is supported, but better keep it off".

pacoxu commented 1 year ago

It is still beta1, and we still have some tasks to complete before it reaches Beta and then GA. So we may remove the warning in 1.29 or even later.

@iholder101 will work on a blog about it.

sftim commented 1 year ago

> we can remove the warning and clear our docs in terms of swap, but maybe keep the recommendation. for example, "swap is supported, but better keep it off".

Please do make that change; v1.28 has already been released

sftim commented 1 year ago

We could also clarify that kubeadm doesn't yet support swap even though it's supported for manually-provisioned Linux nodes as beta.

pacoxu commented 1 year ago

Let me check next week. I will update it.

pacoxu commented 1 year ago

We have mentioned the steps to enable Swap with kubeadm in the blog: https://kubernetes.io/blog/2023/08/24/swap-linux-beta/#set-up-a-kubernetes-cluster-that-uses-swap-enabled-nodes.

I opened https://github.com/kubernetes/kubernetes/pull/120198 to update the warning. As the feature is still disabled by default, I prefer to keep the warning.

sftim commented 1 year ago

Any thoughts on changing https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ ? Currently it says that you MUST disable swap.

pacoxu commented 1 year ago

> Any thoughts on changing https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ ? Currently it says that you MUST disable swap.

I opened https://github.com/kubernetes/website/pull/42820 to explain more about swap configurations.

k8s-triage-robot commented 7 months ago

/lifecycle stale

neolit123 commented 7 months ago

/remove-lifecycle stale

i see beta for 1.30: https://github.com/kubernetes/enhancements/issues/2400#issuecomment-1912804417

devZer0 commented 7 months ago

> v1.28: swap is supported for cgroup v2 only

if swap is supported for cgroup v2, why does kubeadm init/join fail with obscure errors on debian 12 bookworm where cgroupv2 is active/enabled in containerd configuration ( SystemdCgroup = true ) ?

i only got a warning because of swap, and i never thought that something would fail miserably on init because swap was active (i had disabled it in /etc/fstab, but didn't run swapoff -a)

did cost me some hours today.

https://github.com/kubernetes/kubeadm/issues/3017

neolit123 commented 7 months ago

> did cost me some hours today.

the kubeadm setup docs mention swap and the new feature gate: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

to your comment in the other ticket:

> i'm 25yrs+ into linux/unix and i have NEVER seen something fail because swap being active, so this cost me a while to find out that this has blocked kubernetes installation/configuration

you can direct your annoyance to sig node, which is the group that maintains the kubelet component, where the feature has been missing since k8s epoch.

from the kubeadm docs:

> the NodeSwap feature gate of the kubelet is beta but disabled by default.

that's not a good sign - basically an indication that the feature is not very stable yet (normally k8s beta features are on by default), thus turning swap off is our (kubeadm) recommendation.

devZer0 commented 7 months ago

thanks for the pointer. i'm fine when this is documented / mentioned somewhere in the docs and i'm also totally fine that swap needs to be disabled.

but it would be absolutely helpful especially for newbies, when pre-flight check would give a better hint.

the existing warning gives a false impression/advice, imho. you may assume that it's not that harmful in non-production/test envs. and as said in the other ticket, i cannot remember that i have seen something fail to setup/init because swap was enabled. i have only seen the opposite, i.e. some installer complained that swap needs to be enabled, regardless if needed at installation time or not.

i would NEVER have expected that an active swap would be an installation/initialization blocker. i bet that 99 out of 100 linux admins would not expect this either.

" [WARNING Swap]: swap is enabled; production deployments should disable swap unless testing the NodeSwap feature gate of the kubelet"

neolit123 commented 7 months ago

FWIW, this ticket here is tracking the removal of the kubeadm preflight warning when NodeSwap becomes enabled by default. leaving the decision to @pacoxu whether the kubeadm warning should be updated for 1.30 and yes the wording can always be better, but if 1.30 enables the feature by default we are removing the preflight check entirely, from my understanding.

pacoxu commented 7 months ago

One thing we could do is return an error if the node uses cgroup v1 and has swap on. Would that be clearer?

devZer0 commented 7 months ago

yes, certainly.

but my system, which cannot init or join when swap is active, is on cgroup v2 (if i see this correctly) and i have also configured containerd appropriately

if i re-enable swap, drain the cluster node, reboot it and re-join, it reproducibly hangs at the stage "[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap..."


    root@kube3:~# cat /etc/containerd/config.toml|grep SystemdCgroup
                SystemdCgroup = true

    root@kube3:~# stat -fc %T /sys/fs/cgroup/
    cgroup2fs

    # uname -a
    Linux kube3 6.1.0-17-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) x86_64 GNU/Linux

    # cat /etc/debian_version
    12.4

    # mount|grep cg
    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)

    ## /usr/sbin/execsnoop-bpfcc -n runc
    PCOMM            PID     PPID    RET ARGS
    runc             2320    2301      0 /usr/sbin/runc --root /run/containerd/runc/k8s.io --log /run/containerd/io.containerd.runtime.v2.task/k8s.io/959e05c2381225e3672196925b13ddb58531b0f58f6ade9c727fb96066e711a0/log.json --log-format json --systemd-cgroup create --bundle /run/containerd/io.containerd.runtime.v2.task/k8s.io/959e05c2381225e3672196925b13ddb58531b0f58f6ade9c727fb96066e711a0 --pid-file /run/containerd/io.containerd.runtime.v2.task/k8s.io/959e05c2381225e3672196925b13ddb58531b0f58f6ade9c727fb96066e711a0/init.pid 959e05c2381225e3672196925b13ddb58531b0f58f6ade9c727fb96066e711a0

iholder101 commented 6 months ago

Hey all!

Some clarifications regarding the current status of NodeSwap in k8s:

IOW: in order to run k8s on a swap-enabled node, --fail-swap-on=false needs to be provided to the kubelet. In order to actually give containers swap access, swapBehavior needs to be set to LimitedSwap (which is currently the only supported swap behavior other than NoSwap).

Regarding cgroups: Only cgroup v2 is supported for swap. cgroup v1 can be used with NoSwap, which explicitly sets swap limit as 0 at the cgroup level, but cannot be used with LimitedSwap (see https://github.com/kubernetes/kubernetes/pull/123738).
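The cgroup rule above can be sketched as a small lookup. This assumes the cgroup version is detected from the filesystem type of `/sys/fs/cgroup` (the `stat -fc %T` check shown earlier in the thread, which prints `cgroup2fs` on cgroup v2); the function name `allowed_swap_behaviors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the swap behaviors usable for a given cgroup filesystem type.
# $1: filesystem type as printed by `stat -fc %T /sys/fs/cgroup/`
allowed_swap_behaviors() {
    case "$1" in
        # cgroup v2: both behaviors are available.
        cgroup2fs) echo "NoSwap LimitedSwap" ;;
        # cgroup v1 (typically reported as tmpfs) or anything unknown: NoSwap only.
        *)         echo "NoSwap" ;;
    esac
}

# Usage on a Linux node with GNU stat (not executed here):
#   allowed_swap_behaviors "$(stat -fc %T /sys/fs/cgroup/)"
```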

IMO it's safe to remove the error and not even replace it with a warning, since to actually use swap the admin would need to explicitly change the swap behavior, even if --fail-swap-on=false is provided to the kubelet.

Please let me know if I can provide more information regarding this.

neolit123 commented 6 months ago

> IOW: in order to run k8s on a swap-enabled node, --fail-swap-on=false needs to be provided to the kubelet.

to avoid further complaints from kubeadm users and additional logged tickets, i think we should keep the preflight check until the kubelet config is updated to not fail on swap by default.

is there a plan for that?

k8s-triage-robot commented 2 months ago

/lifecycle stale

k8s-triage-robot commented 1 month ago

/lifecycle rotten

iholder101 commented 3 weeks ago

> > IOW: in order to run k8s on a swap-enabled node, --fail-swap-on=false needs to be provided to the kubelet.
>
> to avoid further complaints from kubeadm users and additional logged tickets, i think we should keep the preflight check until the kubelet config is updated to not fail on swap by default.
>
> is there a plan for that?

Hey @neolit123! As written here, the summary is:

So --fail-swap-on=false is still necessary (and that's not going to change until swap goes GA), but the default behavior is NoSwap, which means swap is inaccessible to k8s workloads by default.

Can we make sure that the installation docs here https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/ are changed so that swap doesn't have to be turned off with kubeadm?

neolit123 commented 3 weeks ago

@iholder101 would you be able to help us by PR-ing the docs?

iholder101 commented 3 weeks ago

> @iholder101 would you be able to help us by PR-ing the docs?

Yeah I'd love to. I'll get to it shortly.

BTW, is there anything else required besides changing the docs?

iholder101 commented 3 weeks ago

On second look, I see that @pacoxu already updated the docs here: https://github.com/kubernetes/website/pull/42820.

@neolit123 @pacoxu So, is there anything missing?

neolit123 commented 3 weeks ago

leaving this to @pacoxu to answer. the state of the NodeSwap FG is still confusing to me, so i hope we are clear about it in the docs and the preflight checks.

pacoxu commented 3 weeks ago

My update to the website was too general at the time.

We should probably make it clearer how to enable swap and use it on the kubelet side.

In Beta2, the NodeSwap feature gate is on by default. However:

  • --fail-swap-on=false still needs to be provided to the kubelet.
  • The default "SwapBehavior" is NoSwap, which means containers do not have swap access.

This should be mentioned, or we can link to another page that explains the kubelet swap configuration in detail, including failSwapOn and swapBehavior, and even the system-reserved support.
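A minimal sketch of what such a configuration could look like with kubeadm, loosely following the blog post linked earlier in the thread. The helper name `write_swap_config` and the file name `kubeadm-swap.yaml` are hypothetical, and the `apiVersion` values are assumptions that may differ for a given release.

```shell
#!/bin/sh
# Hypothetical helper: write a kubeadm config file that lets the kubelet run with swap on.
# $1: output path (hypothetical default: ./kubeadm-swap.yaml)
write_swap_config() {
    out="${1:-./kubeadm-swap.yaml}"
    cat > "$out" <<'EOF'
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Do not refuse to start when swap is enabled on the node.
failSwapOn: false
memorySwap:
  # NoSwap is the default per this thread; LimitedSwap gives containers swap access (cgroup v2 only).
  swapBehavior: LimitedSwap
EOF
}

# Usage (not executed here):
#   write_swap_config ./kubeadm-swap.yaml
#   kubeadm init --config ./kubeadm-swap.yaml
```

Leaving out the `memorySwap` section keeps the default behavior described above, where the kubelet starts on a swap-enabled node but containers get no swap access.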

neolit123 commented 2 weeks ago

@iholder101 @pacoxu should https://github.com/kubernetes/website/pull/47710 have closed this k/kubeadm issue, or do we need to keep it open for longer?

pacoxu commented 2 weeks ago

/reopen

IIUC, we still need to remove the current preflight check warning in the future.

k8s-ci-robot commented 2 weeks ago

@pacoxu: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubeadm/issues/2563#issuecomment-2328141068):

> /reopen
>
> IIUC, we still need to remove the current preflight check warning in the future.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.