kubernetes / kubeadm

Aggregator for issues filed against kubeadm
Apache License 2.0

run control-plane as non-root #2473

Open neolit123 opened 3 years ago

neolit123 commented 3 years ago

KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane
k/e issue: https://github.com/kubernetes/enhancements/issues/2568

This KEP proposes that the control-plane in kubeadm be run as non-root. If containers run as root, an escape from a container may result in escalation to root on the host. CVE-2019-5736 is an example of a container escape vulnerability that can be mitigated by running containers/pods as non-root.

The kubeadm feature gate is called RootlessControlPlane.
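For readers who want to try the gate, here is a minimal sketch that renders a kubeadm configuration with it turned on. It assumes the `kubeadm.k8s.io/v1beta3` `ClusterConfiguration` kind and its `featureGates` map; the helper name `render_cluster_config` is made up for this illustration.

```python
def render_cluster_config(feature_gates):
    """Return kubeadm ClusterConfiguration YAML with the given feature gates.

    Assumption: apiVersion kubeadm.k8s.io/v1beta3; adjust for your kubeadm version.
    """
    lines = [
        "apiVersion: kubeadm.k8s.io/v1beta3",
        "kind: ClusterConfiguration",
        "featureGates:",
    ]
    for name, enabled in sorted(feature_gates.items()):
        # YAML booleans are lowercase, unlike Python's True/False.
        lines.append(f"  {name}: {'true' if enabled else 'false'}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cluster_config({"RootlessControlPlane": True}), end="")
```

The rendered text can be saved to a file and passed to `kubeadm init --config <file>`.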

ALPHA 1.22:

BETA x.yy: on hold until further notice. We are watching the user namespaces KEP: https://github.com/kubernetes/enhancements/pull/3065

neolit123 commented 3 years ago

/assign vinayakankugoyal

neolit123 commented 3 years ago

cc @vinayakankugoyal

vinayakankugoyal commented 3 years ago

/assign vinayakankugoyal

I don't think this can be assigned to me because I am not a kubernetes org member. But to anyone following this bug, I will be working on it.

vinayakankugoyal commented 3 years ago

Can we update the "add feature gate:" link in the description above to https://github.com/kubernetes/kubernetes/pull/102158 ?

neolit123 commented 3 years ago

@vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?

vinayakankugoyal commented 3 years ago

> @vinayakankugoyal https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/#how-to-use-a-csi-volume if the rootless kubeadm apiserver eventually becomes ON by default, would it break CSI driver users?

No, because it is the CSI driver that needs to run as a privileged pod, not the kube-apiserver. --allow-privileged=true allows privileged containers in the cluster; it does not make the kube-apiserver's own container privileged. (The same applies to the kubelet, but that is out of scope for this KEP anyway.)
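The distinction can be sketched by inspecting Pod manifests directly. In the snippet below, a container counts as privileged only if its own `securityContext.privileged` is true; the `--allow-privileged=true` flag on the kube-apiserver command line does not change that for the kube-apiserver itself. Both manifests are trimmed, hypothetical examples, and `privileged_containers` is a helper invented for this illustration.

```python
def privileged_containers(pod):
    """Return names of containers that request privileged mode in a Pod manifest (dict)."""
    names = []
    for container in pod.get("spec", {}).get("containers", []):
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            names.append(container["name"])
    return names


# Trimmed, hypothetical manifests for illustration only.
kube_apiserver = {
    "metadata": {"name": "kube-apiserver"},
    "spec": {"containers": [{
        "name": "kube-apiserver",
        # The flag permits privileged containers elsewhere in the cluster;
        # it is not a securityContext on this container.
        "command": ["kube-apiserver", "--allow-privileged=true"],
    }]},
}
csi_driver = {
    "metadata": {"name": "example-csi-node"},
    "spec": {"containers": [{
        "name": "example-csi-driver",
        "securityContext": {"privileged": True},
    }]},
}

if __name__ == "__main__":
    print(privileged_containers(kube_apiserver))
    print(privileged_containers(csi_driver))
```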

vinayakankugoyal commented 3 years ago

/assign vinayakankugoyal

vinayakankugoyal commented 3 years ago

Can we update the e2e section above with the PR: https://github.com/kubernetes/kubeadm/pull/2511

pacoxu commented 2 years ago

I have been investigating and testing this alpha feature recently. Is there anything that should be done in 1.23? @neolit123 @vinayakankugoyal

neolit123 commented 2 years ago

> I have been investigating and testing this alpha feature recently.

let us know if you find any bugs. my biggest concern is around supporting linux distros that are non-standard in terms of system files.

> Is there anything that should be done in 1.23?

the KEP was not updated for 1.23, with the premise to give the alpha one more release for users to test it. i didn't see anyone object to this plan.

pacoxu commented 2 years ago

It works well in my basic testing. I will keep running a non-root environment like this to see if any issues come up.

neolit123 commented 2 years ago

hey @vinayakankugoyal i saw your PR to turn on the FG by default: https://github.com/kubernetes/kubernetes/pull/106869

in the issue description here i've enumerated the steps for this to graduate to beta in 1.24. would you be able to work on these tasks in the next 4 months?

also note that starting next week i will be on PTO until early Jan 2022, so not sure how much i can review until then.

vinayakankugoyal commented 2 years ago

Hi @neolit123 (long time 😄). Thanks for updating the bug with the beta graduation work. I'll be able to work on these tasks in the next 4 months.

neolit123 commented 2 years ago

as noted earlier, there seems to be some activity on supporting user namespaces for pods in core k8s: https://github.com/kubernetes/enhancements/pull/3065 https://github.com/kubernetes/enhancements/pull/2101 (not sure which one is the KEP PR to watch, possibly the newer one).

> The goal of supporting user namespaces in Kubernetes is to be able to run processes in pods with a different user and group IDs than in the host. Specifically, a privileged process in the pod runs as an unprivileged process in the host. If such a process is able to break out of the container to the host, it'll have limited impact as it'll be running as an unprivileged user there.
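To make the quoted mapping concrete, here is a small sketch of how an ID inside a user namespace translates to an ID on the host. It follows the three-column layout of Linux `/proc/<pid>/uid_map` (start inside the namespace, start outside, length). The mapping values and function names are hypothetical and are not taken from the KEP.

```python
def parse_id_map(text):
    """Parse lines of 'inside_start outside_start length' (the uid_map/gid_map layout)."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_id(container_id, entries):
    """Translate an ID inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


# Hypothetical mapping: container IDs 0..65535 map to host IDs 100000..165535.
EXAMPLE_MAP = "0 100000 65536\n"

if __name__ == "__main__":
    entries = parse_id_map(EXAMPLE_MAP)
    print(to_host_id(0, entries))     # root inside the pod
    print(to_host_id(1000, entries))  # an ordinary user inside the pod
```

With such a mapping, UID 0 in the pod corresponds to an unprivileged, high-numbered UID on the host, which is why a container escape would have limited impact.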

@vinayakankugoyal what is your evaluation of the user namespaces KEP? do you see it as something that has end-goal overlap with the kubeadm RootlessControlPlane FG? you've mentioned that it would not support hostPath mounts for the Alpha. anything else to note about it?

user namespace support is a much desired change in k8s, and i consider what we have in kubeadm a bit of a hack that may bite us due to distro specific drift - we manually manage the user/groups to simplify the UX and so that users that want to not run the CP as root can get it automatically. possibly not a big issue, since distros seem standard WRT the system files for users/groups.

i think we need to evaluate whether we want to put a hold on the kubeadm feature moving to beta and instead wait for the user namespaces feature to go Beta, at which point we can start using it, set the right fields in the Pod spec and potentially remove the kubeadm feature.

but....the user namespaces KEP is still in review and there are some pending concerns and a lot of discussion there. as we discussed with @fabriziopandini in today's kubeadm meeting, we would have to evaluate if that KEP is not going to move forward in time. if it moves forward nicely we might want to start using it at some point. in the meantime users can use the kubeadm alpha feature. if it does not move forward in time, we are going to graduate the kubeadm feature.

if we move the kubeadm feature to Beta and eventually plan to remove it in favor of user namespaces, this is doable but means we are opting-in everyone and we have to maintain the feature for the Beta deprecation (e.g. 1 year).

pacoxu commented 2 years ago

> the user namespaces KEP is still in review and there are some pending concerns and a lot of discussions there. as we discussed with @fabriziopandini in today's kubeadm meeting

IMO, RootlessControlPlane in kubeadm and user namespace support in the kubelet overlap to some extent. They are just two ways to reach the same goal, and they do not conflict.

As RootlessControlPlane is alpha, could we re-design it to use user namespaces if possible? kubeadm could then fall back to the current solution if UserNamespace is not enabled on the master node.

We can promote RootlessControlPlane to beta in 1.24, as this is a good-enough solution in my mind. If RootlessControlPlane stays alpha, it will benefit fewer users because it is not the default behavior.

neolit123 commented 2 years ago

> As RootlessControlPlane is alpha, could we re-design it to use user namespaces if possible?

We can talk more about that. I have not seen similar FG redesigns in k8s, but it sounds doable if we redesign the alpha. If our FG is already beta, a redesign contradicts the beta definition, at least in my book.

> We can promote RootlessControlPlane to beta in 1.24, as this is a good-enough solution in my mind. If RootlessControlPlane stays alpha, it will benefit fewer users because it is not the default behavior.

The main problem with promoting FGs that we are not sure about to beta is that users start enabling them in production even if the FG is off by default. Then they start binding their infra to the implementation in weird ways, e.g. reusing the kubeadm managed uid/gids. Beta also means that the underlying design is stable and ready to mature.

I think if we are not sure about something, it is wiser to just wait...maybe one more release. I like where the conversations are going in the user namespace kep and i think they will solve the volume problems as well at some point. If we can, i think we should help drive that kep with what we can - pr reviews etc.

pacoxu commented 2 years ago

> Then they start binding their infra to the implementation in weird ways, e.g. reusing the kubeadm managed uid/gids. Beta also means that the underlying design is stable and ready to mature.

It seems that RootlessControlPlane is not a mature approach. I walked through the RootlessControlPlane KEP, and the risk described at https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2568-kubeadm-non-root-control-plane#risks-and-mitigations is still there.

> If we hard coded the UID and GID, we could end up in a scenario where those are in use by another process on the machine, which would expose some of the credentials accessible to the UID and GIDs to that process. So we plan to use adduser --system or using the appropriate ranges from /etc/login.defs instead of hard coding the UID and GID.
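The range-based alternative mentioned in the quoted KEP text can be sketched as follows. This is not kubeadm's implementation: it only illustrates reading `SYS_UID_MIN` and `SYS_UID_MAX` from login.defs style text and choosing an ID that no passwd entry already uses. The fallback range, the sample data and all function names are assumptions made for this example, and picking the lowest free ID is a simplification.

```python
def parse_login_defs(text):
    """Return KEY -> value for the non-comment lines of a login.defs style file."""
    settings = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2:
            settings[parts[0]] = parts[1]
    return settings


def used_uids(passwd_text):
    """Collect the UIDs (third colon-separated field) from passwd style text."""
    uids = set()
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            uids.add(int(fields[2]))
    return uids


def pick_system_uid(login_defs_text, passwd_text, fallback_range=(100, 999)):
    """Pick the lowest unused UID in [SYS_UID_MIN, SYS_UID_MAX].

    fallback_range is an assumption for this sketch, used when the keys are absent.
    """
    settings = parse_login_defs(login_defs_text)
    low = int(settings.get("SYS_UID_MIN", fallback_range[0]))
    high = int(settings.get("SYS_UID_MAX", fallback_range[1]))
    taken = used_uids(passwd_text)
    for uid in range(low, high + 1):
        if uid not in taken:
            return uid
    raise RuntimeError("no free system UID in the configured range")


# Hypothetical file contents for illustration only.
SAMPLE_LOGIN_DEFS = "# sample\nSYS_UID_MIN 201\nSYS_UID_MAX 999\n"
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "svc:x:201:201::/nonexistent:/usr/sbin/nologin\n"
)

if __name__ == "__main__":
    print(pick_system_uid(SAMPLE_LOGIN_DEFS, SAMPLE_PASSWD))
```

Because the ID is chosen per machine, two hosts may end up with different IDs for the same component, which relates to the concern raised earlier about users binding their infra to kubeadm managed uid/gids.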

If this is not a mature approach, it should stay alpha and be removed once UserNamespace (a better solution, right?) is out.

All the suggestions in my last comment assume that either approach is a good-enough solution for running non-root on nodes. If not, the redesign is not acceptable.

pacoxu commented 2 years ago

https://github.com/kubernetes/enhancements/pull/3065 is merged. 👍

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

neolit123 commented 2 years ago

/lifecycle frozen

neolit123 commented 2 years ago

our e2e for this feature started failing yesterday. i have no explanation for the time being. but i don't think it's a kubeadm problem, so maybe something in core? https://github.com/kubernetes/kubeadm/issues/2750

pacoxu commented 1 year ago

> our e2e for this feature started failing yesterday. i have no explanation for the time being. but i don't think it's a kubeadm problem, so maybe something in core? #2750

Yes

https://github.com/kubernetes/kubernetes/pull/113548 (merged) may fix it. (It is a revert of https://github.com/kubernetes/kubernetes/pull/113408, which was merged hours earlier.)

neolit123 commented 1 year ago

it looks like the job has been green for a while, so maybe something else fixed it. the failures were in late august. i completely forgot about this..

https://k8s-testgrid.appspot.com/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-rootless-latest

pacoxu commented 1 year ago

I opened the testgrid link (which you posted months ago) and found that it failed yesterday (😓).

> kubernetes/kubernetes#113548 may fix it. (It is a revert of kubernetes/kubernetes#113408, which was merged hours earlier.)

Yesterday's failure was caused by that, not the failures in August. 😄

sftim commented 1 year ago

Is this actually important-longterm? It's been a few years.

pacoxu commented 1 year ago

/remove-priority important-soon

> Is this actually important-longterm? It's been a few years.

This feature is an alternative to the user namespace feature. Since we prefer to use user namespaces to secure the control plane in the future, we decided not to promote this one to beta. But we should keep this FG until user namespaces (https://github.com/kubernetes/enhancements/issues/127) are beta.

pacoxu commented 5 months ago

https://github.com/kubernetes/enhancements/issues/127 User Namespace is beta in v1.30. We may start the deprecation of RootlessControlPlane in v1.31.

LyKos4 commented 3 days ago

Is this expected to be completed? For which pods is the user expected to change to non-root?

neolit123 commented 3 days ago

> Is this expected to be completed? For which pods is the user expected to change to non-root?

this feature is alpha and deprecated. please use UserNamespaces instead: https://github.com/kubernetes/enhancements/issues/127

once UserNamespaces becomes GA, kubeadm will enable it by default.
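For anyone looking at UserNamespaces today, the Pod-level opt-in is the `hostUsers` field in the Pod spec. The sketch below sets it on a generic, hypothetical manifest. It is only an illustration of the field and not how kubeadm will wire it into the control-plane static pods; it also assumes a cluster and container runtime with user namespace support enabled.

```python
import copy


def with_user_namespace(pod):
    """Return a copy of a Pod manifest (dict) that opts out of the host user namespace."""
    updated = copy.deepcopy(pod)
    updated.setdefault("spec", {})["hostUsers"] = False
    return updated


# Trimmed, hypothetical manifest for illustration only.
example_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example"},
    "spec": {"containers": [{"name": "app", "image": "registry.example/app:1.0"}]},
}

if __name__ == "__main__":
    print(with_user_namespace(example_pod)["spec"]["hostUsers"])
```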