Phil1602 opened 2 months ago
@mochizuki875 what do you think about this?
@ardaguclu I think the same situation is happening here with ephemeral containers.
When privileged: true is set for a container running as the root user, the following capability set is applied. The key point is CapEff, which holds the capabilities actually used for permission checks.
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: privileged
  name: privileged
spec:
  containers:
  - image: busybox
    command: ["sh", "-c", "sleep infinity"]
    name: privileged
    securityContext:
      privileged: true
  terminationGracePeriodSeconds: 0
$ kubectl exec -it privileged -- /bin/sh
/ # whoami
root
/ # grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
On the other hand, when a non-root user is specified via runAsUser, even if privileged: true or specific capabilities are set, the container process does not get the corresponding effective capabilities (CapEff) and therefore lacks the required permissions.
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: runasuser-with-privileged
  name: runasuser-with-privileged
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - image: busybox
    command: ["sh", "-c", "sleep infinity"]
    name: runasuser-with-privileged
    securityContext:
      privileged: true
  terminationGracePeriodSeconds: 0
$ kubectl exec -it runasuser-with-privileged -- /bin/sh
~ $ whoami
whoami: unknown uid 1000
~ $ grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
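To illustrate the "specific capabilities" part of that behaviour, here is a minimal sketch (the pod name and the NET_ADMIN choice are just an example, not taken from this issue): even when a capability is added explicitly while the pod-level runAsUser: 1000 is kept, CapPrm/CapEff of the container process would still be expected to show up as all zeros.
apiVersion: v1
kind: Pod
metadata:
  name: runasuser-with-caps # example name
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - image: busybox
    command: ["sh", "-c", "sleep infinity"]
    name: runasuser-with-caps
    securityContext:
      capabilities:
        add: ["NET_ADMIN"] # shows up in CapBnd, but without ambient capability support CapEff stays 0 for UID 1000
  terminationGracePeriodSeconds: 0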
I have not checked the details yet, but this issue has been reported in #56374 and KEP #2763 has been proposed; however, it does not seem to be implemented yet.
So currently, I think the simplest workaround is to define runAsUser under the containers field.
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: test-pod
  name: test-pod
spec:
  # securityContext:
  #   runAsUser: 1000 # Override user != 0
  containers:
  - image: kennethreitz/httpbin
    name: test-pod
    securityContext:
      runAsUser: 1000 # Override user != 0
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
$ kubectl debug test-pod -it --image nicolaka/netshoot --profile=sysadmin -- zsh
Defaulting debug container name to debugger-dwt84.
If you don't see a command prompt, try pressing enter.
test-pod ~ whoami
root
test-pod ~ grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
Or, using a custom profile (profile-runas-root.yaml):
securityContext:
  runAsUser: 0
  privileged: true
$ kubectl debug test-pod -it --image=busybox --custom=profile-runas-root.yaml -- /bin/sh
Defaulting debug container name to debugger-mjp6g.
If you don't see a command prompt, try pressing enter.
/ # grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
/ # exit
$ kubectl get pod test-pod -o=jsonpath='{.spec.ephemeralContainers[0].securityContext}' | jq .
{
"privileged": true,
"runAsUser": 0
}
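For the tcpdump use case from the original report, a narrower custom profile along the same lines could also be enough. This is only a sketch (the file name and the capability list are my assumption, not something tested in this thread); it would be passed via the same --custom flag as above.
# profile-netadmin-root.yaml (hypothetical file name)
securityContext:
  runAsUser: 0
  capabilities:
    add: ["NET_ADMIN", "NET_RAW"] # packet capture typically needs NET_RAW; NET_ADMIN mirrors what the netadmin profile is meant to provide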
Another solution which I came up with is to set runAsUser: 0 on the ephemeral container when --profile=sysadmin or --profile=netadmin is specified.
However, I don't know whether it's appropriate...
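In terms of the resulting spec, that idea would change the securityContext applied to the injected ephemeral container roughly as follows (a sketch based on the behaviour described in this issue, not on the actual kubectl profile code):
# what the sysadmin profile effectively applies today (the user is still inherited from the pod-level securityContext)
securityContext:
  privileged: true
---
# with the proposed change, the profile would also pin the user
securityContext:
  privileged: true
  runAsUser: 0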
Thanks a lot for your extensive investigation @mochizuki875, it is really helpful.
I think we should wait until KEP https://github.com/kubernetes/enhancements/issues/2763 is revived; the suggested workaround to overcome this issue is using custom profiling ^^.
/triage accepted
/priority backlog
@ardaguclu Thank you for your review. I agree with that, and I think it's one of the cases where a custom profile works well👍
> So currently, I think the simplest workaround is to define runAsUser under the containers field.
There are workarounds, but surely a user (of kubectl) should not have to do all that simply to get a privileged ephemeral container to run as UID 0 (or any other UID). The running pods and their runAsUser might originate from some upstream Helm chart, and a workaround is only good if it can be found easily. We don't want to discourage and scare people away from unprivileged and UID != 0 containers just because they are one or two steps harder to debug ;-)
> Thank you for your review. I agree with that, and I think it's one of the cases where a custom profile works well👍
> Another solution which I came up with is to set runAsUser: 0 on the ephemeral container when --profile=sysadmin or --profile=netadmin is specified. However, I don't know whether it's appropriate...
If the ephemeral container does not have anything else in its spec, that is totally reasonable. And since kubectl already patches the pod based on distinct CLI options to dynamically create a debug container, it seems all the more reasonable to simply add this aspect to the spec patch that kubectl creates and controls anyway.
The workaround using custom profiles is fine for me, but as @frittentheke already said: I totally agree that the client-side sysadmin profile indicates it creates an ephemeral container capable of doing operations that require those capabilities. Even though the root cause might be that privileged does not work as expected, one would probably expect to be root when setting sysadmin.
Maybe we could at least implement a client-side warning if the Pod spec contains runAsUser, to inform the user that its sysadmin ephemeral container does not run as root?
Thank you @frittentheke and @Phil1602 for your valuable comments.
> Maybe we could at least implement a client-side warning if the Pod spec contains runAsUser, to inform the user that its sysadmin ephemeral container does not run as root?
That's exactly what I was thinking about.
OK, I'll do that and create a PR.
/assign
@mochizuki875 I think we can recommend the custom profile configuration we discussed ^^ in this warning message.
@ardaguclu Yes, I was just thinking about the same idea! I would appreciate your help when we work out the message later.
What happened: I wanted to create an ephemeral container with the sysadmin (or netadmin) profile to be able to capture traffic using tcpdump, using the following command:
The ephemeral container is set to privileged: true as expected, but the pod-level securityContext forces the ephemeral container to run as user 1000, which is IMO unwanted behavior for an ephemeral container with the sysadmin profile set.
What you expected to happen: I would expect my ephemeral container with sysadmin to be able to capture traffic in any case. On the container-level securityContext I would not only expect privileged: true but also runAsUser: 0, to avoid such user override collisions from the pod level. Otherwise: a parameter to override the user for the ephemeral container would help in that regard as well.
How to reproduce it (as minimally and precisely as possible): create an ephemeral container on a pod whose pod-level securityContext sets runAsUser, with the sysadmin profile set.
Anything else we need to know?:
Environment:
- Versions (use kubectl version):