Closed wibed closed 11 months ago
right after opening the issue i checked the logs again. one log entry earlier, i noticed the following line at its beginning:
time="2023-12-06T09:22:26Z" level=debug msg="admission request: &AdmissionRequest{UID:148e0d5b-1e5b-4d50-bbe7-0082e8f021ec,Kind:/v1, Kind=Pod,Resource:{ v1 pods},SubResource:,Name:,Namespace:ingress-nginx,Operation:CREATE,UserInfo:{system:serviceaccount:kube-system:replicaset-controller b022d175-2d67-4ab5-8393-7c9eebd041ed [system:serviceaccounts system:serviceaccounts:kube-system system:authenticated] map[]}
and the replicaset in question:
Warning FailedCreate 6m19s replicaset-controller Error creating: pods "cluster0-ingress-nginx-controller-7c767cc878-rf4hg" is forbidden: violates PodSecurity "baseline:latest": non-default capabilities (container "linkerd-init" must not include "NET_ADMIN", "NET_RAW" in securityContext.capabilities.add)
a related issue: i ended up adding privileges to the linkerd namespace, yet i can't do this for the whole mesh.
https://github.com/linkerd/linkerd2/issues/11667 https://github.com/linkerd/linkerd2/pull/6258 https://github.com/linkerd/linkerd2/issues/11319
how does this work out?
(container "linkerd-init" must not include "NET_ADMIN", "NET_RAW" in securityContext.capabilities.add)
vs
// Skip NET_RAW and NET_ADMIN as the init container requires them to setup iptables.
if drop == "NET_RAW" || drop == "NET_ADMIN" { continue }
@wibed hey, thanks for filing this. I'm a little bit confused about all of the output that you've posted. What is the concrete problem that you're facing?
My understanding is that you have an admission webhook that prevents pods from starting if they're configured with NET_ADMIN or NET_RAW capabilities? And you'd like to know how to get everything to work without those permissions and without explicitly allowing it on certain namespaces?
The init container requires those capabilities, and there is no way to drop them and still have linkerd work correctly. Redirection is a centrepiece of running proxied traffic.
i was under the impression that i could, and did, whitelist capabilities on top of the pss profiles provided by k8s, but this has proven to be false.
i am currently setting up the whole gatekeeper opa with custom policy bundles to make this happen. will report back once i have succeeded.
furthermore, i was under the impression that restricted pods would somehow be possible on top of linkerd-cni, but the demanded permissions stay the same.
linkerd-cni does not need to be meshed, and is an exception to the rule since it sets up the CNI plugin. It does not need to do any network IO and it should be installed before Linkerd.
As for running restricted pods, NET_* caps are unfortunately non-negotiable for the init container: they're required for iptables to load the required kernel modules and manipulate the firewall (which is a privileged operation), so there's no easy way out of it :( . With that being said, the CNI plugin runs on the host. Whatever invokes the plugin binary will have the required permissions to do everything I said above.
my bad. the cni running on the host is new information for me. i've noticed i have options to configure the cni provider on talos, yet i don't know how to do so.
yet the source of the problem is not the cni in this issue, i just tried it out to test if the permission requirement would change or not. i must have misconfigured it for the above reasons.
the linkerd-init container would be replaced by the linkerd-network-validator anyways, if the cni takes over.
i must have been thoroughly confused and did not distinguish between the drop and add requirements. the distinction shows up in these two snippets from this post:
(container "linkerd-init" must not include "NET_ADMIN", "NET_RAW" in securityContext.capabilities.add)
---
// Skip NET_RAW and NET_ADMIN as the init container requires them to setup iptables.
if drop == "NET_RAW" || drop == "NET_ADMIN" { continue }
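to make the add vs drop distinction concrete, here is a minimal Go sketch. it is not linkerd or kubernetes source code: baselineViolations and filterDrops are hypothetical helper names, and the drop list in main is made up for illustration. the allowed set is the capability list the Pod Security Standards document for the baseline profile. the first function mimics the admission check on securityContext.capabilities.add, the second mirrors the quoted injector snippet that only filters the drop list.

```go
package main

import "fmt"

// baselineAllowedAdd lists the capabilities the "baseline" Pod Security
// Standard permits in securityContext.capabilities.add.
var baselineAllowedAdd = map[string]bool{
	"AUDIT_WRITE": true, "CHOWN": true, "DAC_OVERRIDE": true, "FOWNER": true,
	"FSETID": true, "KILL": true, "MKNOD": true, "NET_BIND_SERVICE": true,
	"SETFCAP": true, "SETGID": true, "SETPCAP": true, "SETUID": true,
	"SYS_CHROOT": true,
}

// baselineViolations is a hypothetical helper that returns every added
// capability the baseline profile would reject.
func baselineViolations(add []string) []string {
	var rejected []string
	for _, c := range add {
		if !baselineAllowedAdd[c] {
			rejected = append(rejected, c)
		}
	}
	return rejected
}

// filterDrops is a hypothetical helper mirroring the quoted snippet: it
// removes NET_RAW and NET_ADMIN from a drop list so the init container
// keeps them. It never touches the add list, which is what the policy checks.
func filterDrops(drop []string) []string {
	var kept []string
	for _, d := range drop {
		if d == "NET_RAW" || d == "NET_ADMIN" {
			continue
		}
		kept = append(kept, d)
	}
	return kept
}

func main() {
	add := []string{"NET_ADMIN", "NET_RAW"}          // from the FailedCreate event
	drop := []string{"CHOWN", "NET_RAW", "NET_ADMIN"} // assumed example input

	fmt.Println("rejected by baseline:", baselineViolations(add))
	fmt.Println("drops kept for linkerd-init:", filterDrops(drop))
}
```

so the skip in the drop handling only ensures the two capabilities are not removed again, while the admission error is about them being added in the first place. the two do not contradict each other.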
as posted before i am still not aware of the role of gatekeeper and how it cooperates with the native pss. this is just to clarify a bit, will report back once i am more into the weeds
What is the issue?
tl;dr: the injected container "linkerd-init" must not include "NET_ADMIN", "NET_RAW" in securityContext.capabilities.add
How can it be reproduced?
Logs, error output, etc
the output log once the proxy-injector loglevel has been raised to debug: https://termbin.com/qomv
output of linkerd check -o short
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None