siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Support for pod user namespaces #8554

Open Piccirello opened 5 months ago

Piccirello commented 5 months ago

This issue is to track Talos's support for user namespaces in Kubernetes pods. User namespaces allow for strict separation between the root user in pods and the root user on the host. From the docs: "A process running as root in a container can run as a different (non-root) user in the host; in other words, the process has full privileges for operations inside the user namespace, but is unprivileged for operations outside the namespace."
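For illustration, opting a pod out of the host user namespace is a single field on the pod spec. A minimal sketch (the pod name is arbitrary):

```yaml
# Sketch of a pod requesting its own user namespace.
# hostUsers defaults to true; setting it to false asks the runtime to
# run the pod in a separate user namespace (requires the
# UserNamespacesSupport feature gate and a compatible runtime).
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo   # hypothetical name
spec:
  hostUsers: false
  containers:
    - name: main
      image: busybox
      command: ["sleep", "infinity"]
```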

User namespaces require at least Linux 6.3, which it appears Talos v1.7.0 will ship. The Kubernetes docs also state that "containerd v1.7 is not compatible with the userns support in Kubernetes v1.27 to v1.29." That may mean waiting for containerd 2.0, though this is unclear to me.

When user namespaces are eventually supported, it would be worth mentioning as a feature in the Talos release's changelog.

sanmai-NL commented 4 months ago

How will this affect Talos Linux nodes running inside containers? And potentially, in user-namespaced/rootless containers?

Piccirello commented 4 months ago

The Kubernetes docs page linked above has been updated with more information. It now states more explicitly that containerd v2 is needed:

containerd v1.7 is not compatible with the userns support in Kubernetes v1.27 to v1.30. Kubernetes v1.25 and v1.26 used an earlier implementation that is compatible with containerd v1.7, in terms of userns support.

Piccirello commented 2 months ago

Based on #8766, #8777, and #8484, it appears Talos 1.8 will use containerd 2.0. That may mean that pod user namespaces will be supported in Talos 1.8.

smira commented 2 months ago

Yes, it should be ready for that; I believe there's nothing to be done on the Talos OS side itself to support it. If you have a good test case for user namespaces (e.g. something you can kubectl apply), we'd be happy to get it into the integration tests. Thanks!
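A minimal kubectl-applyable check might look like this (a sketch, not the project's actual integration test; pod name and image are arbitrary). The pod prints its uid_map: in the host user namespace this is `0 0 4294967295`, while in a pod user namespace the second column is a non-zero host offset.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-uidmap-check   # hypothetical name
spec:
  hostUsers: false
  restartPolicy: Never
  containers:
    - name: check
      image: busybox
      # Host user namespace prints "0 0 4294967295"; a pod user
      # namespace prints a shifted mapping such as "0 100000 65536".
      command: ["cat", "/proc/self/uid_map"]
```

After `kubectl apply`, the mapping can be read back with `kubectl logs userns-uidmap-check`.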

Piccirello commented 2 months ago

It appears that two feature gates need to be enabled: UserNamespacesSupport and UserNamespacesPodSecurityStandards. Both currently default to false, with the latter's effect described here.
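In Talos, the gates could presumably be turned on through extraArgs in the machine config. A hedged sketch (key paths assumed from the Talos machine config conventions; verify against the machine config reference before applying):

```yaml
# Sketch of a Talos machine config patch enabling both feature gates
# on the kubelet and the API server (assumed key paths).
machine:
  kubelet:
    extraArgs:
      feature-gates: UserNamespacesSupport=true,UserNamespacesPodSecurityStandards=true
cluster:
  apiServer:
    extraArgs:
      feature-gates: UserNamespacesSupport=true,UserNamespacesPodSecurityStandards=true
```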

I love the idea of including a test case. The KEP states that if the runtime doesn't support user namespaces, a deployment with hostUsers: false will fail to be created, and a sample Pod definition is provided for testing.

In my testing against Kubernetes v1.30.1 on Talos 1.7.3, that Pod definition deploys without issue, despite the Pod running in the host user namespace (confirmed with cat /proc/self/uid_map). I don't know why this works, though I suspect it's because the feature gate is disabled. If the feature gate were enabled, I would expect the deployment to fail due to the use of containerd v1.7.

I'm not sure this makes for a clear test case though, as it sounds like an error is only produced when the feature gate is enabled AND containerd < 2.0 is used. Ideally the test case would fail whenever the host user namespace is used (i.e. including when the feature gate is disabled).
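The uid_map check described above can be scripted so a test fails whenever the pod is still in the host user namespace, regardless of the feature-gate state. A sketch (the helper name is mine): the host user namespace maps uid 0 to uid 0 over the full 32-bit range, so any other mapping means a dedicated user namespace was set up.

```shell
#!/bin/sh
# Decide from a /proc/self/uid_map line whether a process is in the
# host user namespace. The host namespace's map is "0 0 4294967295".
is_host_userns() {
  # $1: first line of /proc/self/uid_map
  set -- $1   # split into: container-uid host-uid range
  [ "$1" = "0" ] && [ "$2" = "0" ] && [ "$3" = "4294967295" ]
}

# Example: inside the pod (or via kubectl exec <pod> -- cat /proc/self/uid_map),
# fail the test if the mapping is the host one.
if is_host_userns "$(cat /proc/self/uid_map)"; then
  echo "host user namespace"
else
  echo "pod user namespace"
fi
```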

frezbo commented 1 month ago

I think there's some bug with the kubelet: it never fails, and the mappings inside the pod are completely wrong. I would have expected the kubelet to fail to create the pod or throw an error, and that is not happening. Tested by adding the feature gate.

frezbo commented 1 month ago

More updates on this: the feature seems to give a false sense of security. If the feature gate is not enabled, a pod with hostUsers: false set will be happily scheduled and run on a node that does not meet any of the requirements for user namespaces. This seems weird, and it looks like a security issue in itself.

With the feature gate enabled and hostUsers: false set the pod fails to be scheduled with this error:

failed to mount rootfs component: no space left on device

which is indeed a red herring and might be some other issue masked by this error

frezbo commented 1 month ago

Created an issue in k8s https://github.com/kubernetes/kubernetes/issues/126484