kubernetes / system-validators

A set of system-oriented validators for kubeadm preflight checks.
Apache License 2.0

Support for cgroupv2 #12

Closed · KentaTada closed this 4 years ago

KentaTada commented 4 years ago

The preflight check will fail when clusters use cgroupv2 in the future, because /proc/cgroups is meaningless for v2.
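For illustration, here is a minimal Go sketch of the underlying distinction (this is not the PR's actual code): on a cgroup v2 host, /sys/fs/cgroup is a cgroup2 mount, and /proc/cgroups no longer describes the controllers that are actually usable.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// isCgroup2UnifiedMode reports whether /sys/fs/cgroup is a cgroup2
// (unified hierarchy) mount. On such hosts, /proc/cgroups cannot be
// used to enumerate the available controllers.
func isCgroup2UnifiedMode() (bool, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false, err
	}
	return st.Type == unix.CGROUP2_SUPER_MAGIC, nil
}

func main() {
	unified, err := isCgroup2UnifiedMode()
	if err != nil {
		fmt.Println("statfs failed:", err)
		return
	}
	fmt.Println("cgroup v2 unified mode:", unified)
}
```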

k8s-ci-robot commented 4 years ago

Welcome @KentaTada!

It looks like this is your first PR to kubernetes/system-validators 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/system-validators has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. :smiley:

k8s-ci-robot commented 4 years ago

Hi @KentaTada. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
KentaTada commented 4 years ago

/assign @timothysc

neolit123 commented 4 years ago

@kubernetes/sig-node-pr-reviews

/ok-to-test
/assign

neolit123 commented 4 years ago

@kubernetes/sig-cluster-lifecycle-pr-reviews

KentaTada commented 4 years ago

Thank you. I'm sorry, I mistakenly pushed my old commit; I re-pushed it. But I still need to address the remaining comments. I'll handle it.

KentaTada commented 4 years ago

@neolit123 PTAL Thank you!

neolit123 commented 4 years ago

Thanks for the update, @KentaTada. I will ping the owners of the kubelet (SIG Node) for a review in their Slack channel.

odinuge commented 4 years ago

The code looks good to me, but I'm not sure if we should do this yet (although this is quite generic). cgroupv2 is still (at least) a few releases away in k/k, and the support is tracked here: https://github.com/kubernetes/enhancements/pull/1370.

k/k does not currently support nodes with cgroupv2 enabled by default (mounted on the mount point this PR looks for), so this will in practice make nodes pass validation without being supported. The kubelet still requires all the cgroupv1 cgroups to be enabled.

So, I'll add a hold.

/hold

Another thought is that we should maybe start mocking the fs in the tests (by setting the base directory) to make it simpler to do e2e-like testing of this.
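A hypothetical sketch of that fs-mocking idea (the helper readV2Controllers and its base-directory parameter are assumptions for illustration, not this repo's actual API):

```go
package validators

import (
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// readV2Controllers is a stand-in helper for this sketch: it reads
// cgroup.controllers relative to a configurable base directory, so
// tests can point it at a fake filesystem instead of /sys/fs/cgroup.
func readV2Controllers(base string) ([]string, error) {
	data, err := os.ReadFile(filepath.Join(base, "cgroup.controllers"))
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(data)), nil
}

func TestCgroupV2ControllersFromFakeFS(t *testing.T) {
	base := t.TempDir() // stands in for /sys/fs/cgroup
	if err := os.WriteFile(filepath.Join(base, "cgroup.controllers"),
		[]byte("cpuset cpu io memory pids\n"), 0o644); err != nil {
		t.Fatal(err)
	}

	got, err := readV2Controllers(base)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if len(got) != 5 {
		t.Errorf("expected 5 controllers, got %d: %v", len(got), got)
	}
}
```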

neolit123 commented 4 years ago

I'm personally fine with merging this even if the actual support is anticipated in the near future.

odinuge commented 4 years ago

I'm personally fine with merging this even if the actual support is anticipated in the near future.

Not sure when cgroupv2 will be fully supported by k/k, though. It definitely depends on the definition of "near future", and it will certainly not be ready for 1.18.

The only issue is when someone runs the newest distros, like the newest version of Fedora, and uses kubeadm to bootstrap. All preflight checks will report OK, but the kubelet will keep crashing due to missing cgroups. Other than that, it will do no harm, I guess.

So I'll defer this to @derekwaynecarr.

neolit123 commented 4 years ago

The only issue is when someone runs the newest distros, like the newest version of Fedora, and uses kubeadm to bootstrap. All preflight checks will report OK, but the kubelet will keep crashing due to missing cgroups. Other than that, it will do no harm, I guess.

k/k can keep importing the old version of the system-validators repository, but this will prevent merging other fixes in the meantime, so it seems better to hold because of that.

A problem is contributor bandwidth too. If, in release N, @KentaTada is not available to update the PR, someone has to sign up to take over.

KentaTada commented 4 years ago

Thank you for your review!

This is just FYI, but I want to share our background for supporting cgroupv2 in Kubernetes. At Sony, we are building on-premise Kubernetes clusters for our rootless embedded container system, and our system needs cgroupv2. So, I found this issue when I integrated Kubernetes with our system.

I think it may be helpful for people like us, as well as Fedora users, if cgroupv2 support is prepared in mainline.

neolit123 commented 4 years ago

So, I found this issue when I integrated Kubernetes with our system.

Are you using kubeadm and seeing it fail a cgroup preflight check?

KentaTada commented 4 years ago

So, I found this issue when I integrated Kubernetes with our system.

Are you using kubeadm and seeing it fail a cgroup preflight check?

No. I'm sorry for the confusion, but I found this issue when I checked the source code before dynamic testing.

neolit123 commented 4 years ago

No. I'm sorry for the confusion, but I found this issue when I checked the source code before dynamic testing.

Do you mean that you are not seeing any problems yet when these system-validators are used on your Nodes - i.e., you saw that the verification is missing in the source code, but there is no real issue yet?

KentaTada commented 4 years ago

No. I'm sorry for the confusion, but I found this issue when I checked the source code before dynamic testing.

Do you mean that you are not seeing any problems yet when these system-validators are used on your Nodes - i.e., you saw that the verification is missing in the source code, but there is no real issue yet?

I have not experienced a real issue yet.

neolit123 commented 4 years ago

I have not experienced a real issue yet.

OK, thanks. Let's see if other folks from SIG Node have an opinion here. It seems better to hold the PR until cgroupv2 support in k8s is ready.

giuseppe commented 4 years ago

Let's see if other folks from SIG Node have an opinion here. It seems better to hold the PR until cgroupv2 support in k8s is ready.

I think these two could happen at the same time. I already have a PR for supporting cgroup v2 in the Kubelet, but I am still waiting for the KEP to be accepted.

dchen1107 commented 4 years ago

Thanks for sending the PR. cc @yguo0905

Can you hold that until cgroup v2 is properly supported by node (kubelet, container runtime, cadvisor, etc.)?

giuseppe commented 4 years ago

Thanks for sending the PR. cc @yguo0905

Can you hold that until cgroup v2 is properly supported by node (kubelet, container runtime, cadvisor, etc.)?

@dchen1107 The cgroup v2 PR for the Kubelet got merged, and Linux 5.6 was released (this is needed for the hugetlb controller). I'd expect everything to work on cgroup v2 at this point.
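As an aside, a small Go sketch of checking that kernel-version requirement (not part of this PR; the major.minor parsing here is deliberately simplified):

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// kernelAtLeast reports whether the running kernel's release is at
// least major.minor, e.g. the 5.6 needed for the cgroup v2 hugetlb
// controller mentioned above.
func kernelAtLeast(major, minor int) (bool, error) {
	var uts unix.Utsname
	if err := unix.Uname(&uts); err != nil {
		return false, err
	}
	release := unix.ByteSliceToString(uts.Release[:])
	var maj, min int
	if _, err := fmt.Sscanf(release, "%d.%d", &maj, &min); err != nil {
		return false, err
	}
	return maj > major || (maj == major && min >= minor), nil
}

func main() {
	ok, err := kernelAtLeast(5, 6)
	fmt.Println("kernel >= 5.6:", ok, err)
}
```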

neolit123 commented 4 years ago

I see that https://github.com/kubernetes/kubernetes/pull/85218 merged.

Looking for an LGTM from SIG Node so that we can merge this PR, create a new release in this repository, and vendor the new release in kubernetes/kubernetes.

KentaTada commented 4 years ago

@odinuge My understanding is that we need to modify the following to test both cgroup v1 and v2.

In addition to that, should we split getCgroupSubsystems into getCgroupv1Subsystems and getCgroupv2Subsystems to make testing easier?

If my understanding is correct, I'll prepare for it.
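A rough sketch of what that proposed split might look like (hypothetical shapes; the PR's actual functions may differ):

```go
package validators

import (
	"bufio"
	"os"
	"strings"
)

// getCgroupV1Subsystems parses /proc/cgroups, which only describes
// the v1 hierarchy.
func getCgroupV1Subsystems() ([]string, error) {
	f, err := os.Open("/proc/cgroups")
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var subsystems []string
	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip the header line
		}
		// The first column of each row is the subsystem name.
		subsystems = append(subsystems, strings.Fields(line)[0])
	}
	return subsystems, s.Err()
}

// getCgroupV2Subsystems lists the controllers advertised by the
// unified hierarchy's root cgroup.controllers file.
func getCgroupV2Subsystems() ([]string, error) {
	data, err := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")
	if err != nil {
		return nil, err
	}
	return strings.Fields(string(data)), nil
}
```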

odinuge commented 4 years ago

@odinuge My understanding is that we need to modify the following to test both cgroup v1 and v2.

- Add the new DefaultSysSpec for cgroup v2: https://github.com/kubernetes/system-validators/blob/master/validators/types_unix.go
- Modify kubeadm to use cgroup v1 or v2: https://github.com/kubernetes/kubernetes/blob/v1.19.0-alpha.1/cmd/kubeadm/app/preflight/checks.go#L546
- Modify the e2e test to use cgroup v1 or v2: https://github.com/kubernetes/kubernetes/blob/v1.19.0-alpha.1/test/e2e_node/e2e_node_suite_test.go#L133

In addition to that, should we split getCgroupSubsystems into getCgroupv1Subsystems and getCgroupv2Subsystems to make testing easier?

If my understanding is correct, I'll prepare for it.

Yes, that sounds like a plan! I guess @neolit123 may have some more input.

There will also be various differences when it comes to the kernel config flags. Here are some of my quick observations:

neolit123 commented 4 years ago

Add the new DefaultSysSpec for cgroup v2: https://github.com/kubernetes/system-validators/blob/master/validators/types_unix.go

This is not ideal. Instead of adding a new DefaultSysSpec structure, there should be a new CgroupV2Spec field here: https://github.com/kubernetes/system-validators/blob/f434adad8f2aeb4bee4a2f29fdfeefbac2d05bd9/validators/types_unix.go#L69

Modify kubeadm to use cgroup v1 or v2. Modify e2e test to use cgroup v1 or v2.

Not needed.
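To make the suggestion concrete, a sketch of the shape being proposed (the field name comes from the comment above; the real SysSpec in types_unix.go has more fields, and the merged code may differ):

```go
package validators

// SysSpec sketch: one spec carrying both cgroup lists, rather than
// a second DefaultSysSpec structure.
type SysSpec struct {
	OS           string
	Cgroups      []string // required cgroup v1 subsystems
	CgroupV2Spec []string // proposed: required cgroup v2 controllers
	// ...KernelSpec, RuntimeSpec, and other existing fields elided
}
```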

KentaTada commented 4 years ago

@odinuge @neolit123 Thank you for your help :satisfied: I think we could prepare kernel config flags for each cgroup version if we added a new DefaultSysSpec, but that would be difficult to maintain. For now, I will just add a new CgroupV2Spec field and not modify kubeadm or the e2e test.

KentaTada commented 4 years ago

@neolit123 I modified cgroup_validator.go

odinuge commented 4 years ago

This looks good to me.

The only thing is that the kernel config CONFIG_CGROUP_BPF=y is required when using cgroupv2 (it is used for controlling device access). We cannot make it required for everyone, since a lot of older kernels have been compiled without it. Or should we just not care about it?
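For reference, a standalone Go sketch of how one might probe for that flag (illustration only; it assumes the common /boot/config-&lt;release&gt; layout, and kernel-config validators typically also look at /proc/config.gz):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// hasCgroupBPF checks whether the running kernel was built with
// CONFIG_CGROUP_BPF=y, the flag discussed above for cgroup v2
// device access control.
func hasCgroupBPF() (bool, error) {
	release, err := exec.Command("uname", "-r").Output()
	if err != nil {
		return false, err
	}
	path := "/boot/config-" + strings.TrimSpace(string(release))
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		if s.Text() == "CONFIG_CGROUP_BPF=y" {
			return true, nil
		}
	}
	return false, s.Err()
}

func main() {
	ok, err := hasCgroupBPF()
	fmt.Println("CONFIG_CGROUP_BPF=y:", ok, err)
}
```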

odinuge commented 4 years ago

Other than CONFIG_CGROUP_BPF=y, this looks good to me now. I think we should get this into k/k as quickly as possible, so we can take care of that flag later. No one will compile their kernel without BPF for cgroups and use cgroup v2, so it shouldn't be a big problem yet.

/lgtm
/hold cancel

ping @neolit123

neolit123 commented 4 years ago

Other than CONFIG_CGROUP_BPF=y, this looks good to me now. I think we should get this into k/k as quickly as possible, so we can take care of that flag later. No one will compile their kernel without BPF for cgroups and use cgroup v2, so it shouldn't be a big problem yet.

Agreed, popular distro kernels should have CGROUP_BPF. If we face a single case without it, we can add the extra validation for v2. Although, I hope that the person doing triage on such a bug report will know what to ask for.

/approve

I will create a new release and send the PR for k/k to update the vendored validators.

k8s-ci-robot commented 4 years ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KentaTada, neolit123, odinuge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes/system-validators/blob/master/OWNERS)~~ [neolit123]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.