memorySwap: kubelet fails due to missing feature gate

hakoerber commented 2 years ago

Environment:

Cloud provider or hardware configuration: Hetzner VM
OS: CentOS 7
Version of Ansible: ansible 2.9.14
Version of Python: Python 3.8.10

Kubespray version: 52266406, latest master at time of opening this issue

Network plugin used: calico

I am using kubespray on a server with swap, so I have kubelet_fail_swap_on set to false. Since #8241, this setting also enables the alpha-stage memorySwap functionality of the kubelet. Unfortunately, the kubelet then fails (even on Kubernetes v1.23.1) because of a missing feature gate. The kubelet's stdout shows:

E0108 21:03:53.476082    5185 server.go:225] "Failed to validate kubelet configuration" err="invalid configuration: MemorySwap.SwapBehavior cannot be set when NodeSwap feature flag is disabled"

Note that the NodeSwap feature gate is disabled by default: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
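
For reference, the combination involved corresponds roughly to a KubeletConfiguration like the one below. This is a sketch, not the file kubespray actually renders; the LimitedSwap value in particular is an assumption chosen for illustration.

```yaml
# Sketch of the relevant KubeletConfiguration fields (not kubespray's rendered file).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Let the kubelet start on a node that has swap enabled.
failSwapOn: false
# Alpha-stage swap settings; LimitedSwap is an assumed value for illustration.
memorySwap:
  swapBehavior: LimitedSwap
# Without this entry, validation rejects memorySwap.swapBehavior
# with the error shown in the log line above.
featureGates:
  NodeSwap: true
```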

When I enable the feature gate explicitly, all is well:

kube_feature_gates:
- NodeSwap=true

I'm honestly not sure how best to continue here. Maybe a documentation change would be enough, stating that the NodeSwap feature gate must be enabled when using kubelet_fail_swap_on: false. Otherwise, I can think of the following:

1) Enable the NodeSwap feature gate implicitly when setting kubelet_fail_swap_on: false. I am not sure how this can be sanely handled in ansible.
2) Decouple the memorySwap kubelet configuration from kubelet_fail_swap_on, so it is possible to use swap without using the memorySwap functionality. This might be desirable when one does not want to use alpha-level features, but still use swap on the node itself. The new variable could be something like enable_k8s_node_swap_usage.
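
Option 1 could be sketched as an Ansible task along these lines. It is an untested illustration, not kubespray code. It assumes kube_feature_gates is a list of NAME=value strings, as in the snippet above, and that the task runs before the kubelet configuration is templated.

```yaml
# Hypothetical task: append the NodeSwap gate whenever swap is allowed.
- name: Enable NodeSwap feature gate when kubelet_fail_swap_on is false (sketch)
  set_fact:
    # union() keeps existing gates and avoids adding a duplicate entry.
    kube_feature_gates: "{{ (kube_feature_gates | default([])) | union(['NodeSwap=true']) }}"
  when: not (kubelet_fail_swap_on | default(true) | bool)
```

One caveat with this approach: if a user has explicitly set NodeSwap=false, the list would end up containing both entries, so a real implementation would need to handle that conflict.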

What do you think?

cristicalin commented 2 years ago

Note that this is an alpha feature and it may change.

You have an example of how to set it up in the CI job: https://github.com/kubernetes-sigs/kubespray/blob/master/tests/files/packet_fedora35-calico-swap-selinux.yml

As discussed in #8241, this will remain experimental support until the feature graduates to beta and we know what it will look like in a stable implementation, so that it can be documented.

hakoerber commented 2 years ago

> You have an example of how to set it up in the CI job: https://github.com/kubernetes-sigs/kubespray/blob/master/tests/files/packet_fedora35-calico-swap-selinux.yml

Yes, this is the configuration that I'm currently using and that works for me. The issue is that I cannot opt out of this feature without setting kubelet_fail_swap_on: true, which means that I cannot use swap at all on the node. So there is no way to use swap without also using this alpha feature.

This can be acceptable, but I think it would be clearer to decouple this feature from the kubelet_fail_swap_on setting.

cristicalin commented 2 years ago

Can you detail the use-case of allowing the node to have swap but not allowing the kubelet to track the usage?

Unless you request swap for your pods this feature should not have an impact.

hakoerber commented 2 years ago

I personally don't have one 😄

I was just surprised by the failure of the kubelet after the update, as kubelet_fail_swap_on now requires another setting (the feature gate) to work.

I guess this implicit coupling of settings should either:

* Be broken up (by having different settings for each)
* Be documented
* Be automatically applied, by enabling the feature gate automatically when enabling swap
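
As a middle ground between documenting and automating, the coupling could also be checked up front so that the failure surfaces before the kubelet is started. The task below is only a sketch under assumptions: it is not part of kubespray, and it assumes kube_feature_gates is a list of NAME=value strings.

```yaml
# Hypothetical pre-flight check, not an existing kubespray task.
- name: Stop if swap is allowed but the NodeSwap feature gate is missing (sketch)
  assert:
    that:
      - "'NodeSwap=true' in (kube_feature_gates | default([]))"
    fail_msg: "kubelet_fail_swap_on: false requires NodeSwap=true in kube_feature_gates"
  when: not (kubelet_fail_swap_on | default(true) | bool)
```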

oomichi commented 2 years ago

> I personally don't have one 😄
>
> I was just surprised by the failure of the kubelet after the update, as kubelet_fail_swap_on now requires another setting (the feature gate) to work.
>
> I guess this implicit coupling of settings should either:
> * Be broken up (by having different settings for each)
> * Be documented
> * Be automatically applied, by enabling the feature gate automatically when enabling swap

I feel that "Be documented" is necessary at least. "Be automatically applied, by enabling the feature gate automatically when enabling swap" is also reasonable from the user's viewpoint. But that creates a dependency on an alpha feature of Kubernetes, which can change. It would add a maintenance burden on the Kubespray side: keeping track of which Kubernetes versions need the alpha feature configuration and which need the beta one. I think that is why @cristicalin didn't automate the configuration.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

/lifecycle rotten

k8s-triage-robot commented 2 years ago

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

kubernetes-sigs / kubespray

memorySwap: kubelet fails due to missing feature gate #8392