kubernetes / kubectl


kubectl debug: profile "sysadmin" does not work as expected when uid != 0 is specified #1650

Open Phil1602 opened 2 months ago

Phil1602 commented 2 months ago

What happened: I wanted to create an ephemeral container with the sysadmin (or netadmin) profile to be able to capture traffic with tcpdump, using the following command:

kubectl debug test-pod -it --image nicolaka/netshoot --profile=sysadmin -- zsh

Defaulting debug container name to debugger-wj6qq.
If you don't see a command prompt, try pressing enter.

test-pod% whoami
whoami: unknown uid 1000

test-pod% tcpdump
tcpdump: eth0: You don't have permission to perform this capture on that device
(socket: Operation not permitted)

The ephemeral container is set to privileged: true as expected, but the pod-level securityContext forces the ephemeral container to run as user 1000, which is, in my opinion, unwanted behavior for an ephemeral container with the sysadmin profile set.

What you expected to happen: I would expect my ephemeral container with sysadmin to be able to capture traffic in any case.

In the container-level securityContext, I would expect not only privileged: true but also runAsUser: 0, to avoid such user overrides from the pod level. Alternatively, a parameter to override the user for the ephemeral container would help in that regard as well.

How to reproduce it (as minimally and precisely as possible):

  1. Create a Pod with the following securityContext
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: test-pod
  name: test-pod
spec:
  securityContext:
    runAsUser: 1000 # Override user != 0
  containers:
  - image: kennethreitz/httpbin
    name: test-pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  2. Attach an ephemeral container via the debug command with the sysadmin profile set
kubectl debug test-pod -it --image nicolaka/netshoot --profile=sysadmin -- zsh

Defaulting debug container name to debugger-wj6qq.
If you don't see a command prompt, try pressing enter.

test-pod% whoami
whoami: unknown uid 1000

test-pod% tcpdump
tcpdump: eth0: You don't have permission to perform this capture on that device
(socket: Operation not permitted)
test-pod% 


ardaguclu commented 2 months ago

@mochizuki875 what do you think about this?

mochizuki875 commented 2 months ago

@ardaguclu I think the same situation is happening here with the ephemeral container.

When privileged: true is set for a container running as the root user, the following capability sets are applied. The key point is CapEff, which holds the capabilities actually used for permission checks:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: privileged
  name: privileged
spec:
  containers:
  - image: busybox
    command: ["sh", "-c", "sleep infinity"]
    name: privileged
    securityContext:
      privileged: true
  terminationGracePeriodSeconds: 0
$ kubectl exec -it privileged -- /bin/sh
/ # whoami
root
/ # grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

On the other hand, when a non-root user is specified in runAsUser, even if privileged: true or specific capabilities are set, the container process does not get the appropriate effective capabilities (CapEff) and therefore lacks the required permissions.

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: runasuser-with-privileged
  name: runasuser-with-privileged
spec:
  securityContext:
    runAsUser: 1000
  containers:
  - image: busybox
    command: ["sh", "-c", "sleep infinity"]
    name: runasuser-with-privileged
    securityContext:
      privileged: true
  terminationGracePeriodSeconds: 0
$ kubectl exec -it runasuser-with-privileged -- /bin/sh
~ $ whoami
whoami: unknown uid 1000
~ $ grep Cap /proc/1/status
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
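
To make the two listings above concrete, here is a minimal sketch (mine, not part of kubectl) that decodes the hex CapEff mask from /proc/&lt;pid&gt;/status. The capability bit numbers come from linux/capability.h; CAP_NET_RAW is what tcpdump needs to open a capture socket.

```python
# Decode a CapEff bitmask from /proc/<pid>/status and test single bits.
# Bit numbers per linux/capability.h.
CAP_NET_ADMIN = 12  # interface/routing configuration
CAP_NET_RAW = 13    # raw sockets (required by tcpdump)
CAP_SYS_ADMIN = 21  # broad administrative operations

def has_cap(cap_eff_hex: str, cap_bit: int) -> bool:
    """Return True if the given capability bit is set in the CapEff mask."""
    return bool(int(cap_eff_hex, 16) >> cap_bit & 1)

# Privileged container running as root (first listing): all bits set.
print(has_cap("000001ffffffffff", CAP_NET_RAW))   # True
# Privileged container forced to uid 1000 (second listing): empty mask.
print(has_cap("0000000000000000", CAP_NET_RAW))   # False
```

This matches the symptom in the report: with CapEff all zeros, tcpdump fails with "Operation not permitted" even though the container is privileged.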

I have not checked the details yet, but this issue has been reported in #56374, and KEP #2763 has been proposed. However, it does not seem to be implemented yet.

So currently, I think the simplest workaround is to define runAsUser under the containers field:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: test-pod
  name: test-pod
spec:
  # securityContext:
  #   runAsUser: 1000 # Override user != 0
  containers:
  - image: kennethreitz/httpbin
    name: test-pod
    securityContext:
      runAsUser: 1000 # Override user != 0
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
$ kubectl debug test-pod -it --image nicolaka/netshoot --profile=sysadmin -- zsh
Defaulting debug container name to debugger-dwt84.
If you don't see a command prompt, try pressing enter.
test-pod  ~  whoami
root
test-pod  ~  grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

or to use a custom profile:

profile-runas-root.yaml

securityContext:
  runAsUser: 0
  privileged: true
$ kubectl debug test-pod -it --image=busybox --custom=profile-runas-root.yaml -- /bin/sh
Defaulting debug container name to debugger-mjp6g.
If you don't see a command prompt, try pressing enter.
/ # grep Cap /proc/$$/status
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000

/ # exit

$ kubectl get pod test-pod -o=jsonpath='{.spec.ephemeralContainers[0].securityContext}' | jq .
{
  "privileged": true,
  "runAsUser": 0
}

Another solution I came up with is to set runAsUser: 0 on the ephemeral container when --profile=sysadmin or --profile=netadmin is specified. However, I don't know whether that is appropriate...
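
For context on why both workarounds above behave as they do: in Kubernetes, a securityContext field set on the container takes precedence over the same field at the pod level. A minimal sketch of that precedence rule (illustration only, not kubelet code):

```python
# Sketch of the effective runAsUser resolution: the container-level value,
# when present, wins over the pod-level one; note that 0 (root) is a valid
# explicit value and must not be treated as "unset".
def effective_run_as_user(pod_sc: dict, container_sc: dict):
    if "runAsUser" in container_sc:
        return container_sc["runAsUser"]
    return pod_sc.get("runAsUser")  # None means the image default applies

pod_sc = {"runAsUser": 1000}
print(effective_run_as_user(pod_sc, {}))                # 1000 (inherited)
print(effective_run_as_user(pod_sc, {"runAsUser": 0}))  # 0 (overridden)
```

This is why a custom profile carrying runAsUser: 0 (or a profile-applied default, as proposed above) would defeat the pod-level override, while privileged: true alone does not.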

ardaguclu commented 2 months ago

Thanks a lot for your extensive investigation, @mochizuki875; it is really helpful.

I think we should wait until KEP https://github.com/kubernetes/enhancements/issues/2763 is revived; until then, the suggested workaround to overcome this issue is to use a custom profile, as shown above.

/triage accepted
/priority backlog

mochizuki875 commented 2 months ago

@ardaguclu Thank you for the review. I agree with that, and I think this is one of the cases where a custom profile works well 👍

frittentheke commented 2 months ago

> So currently, I think the simplest workaround is to define runAsUser under the containers field.

There are workarounds, but surely a user of kubectl should not have to do all that simply to get a privileged ephemeral container running as UID 0 (or any other UID). The running pods and their runAsUser settings might originate from an upstream Helm chart, and a workaround is only useful if it can be found easily. We don't want to discourage and scare people away from unprivileged, non-root containers just because they are a step or two harder to debug ;-)

> Thank you for your reviewing. I agree with that, and I think it's one of the cases where custom profile works well 👍

> Another solution which I come up with is to set runAsUser: 0 to ephemeral container when --profile=sysadmin or --profile=netadmin is specified. However, I don't know it's appropriate...

If the ephemeral container does not have anything else in its spec, that is totally reasonable. And since kubectl already patches the pod based on distinct CLI options to create the debug container dynamically, it seems all the more reasonable to simply add this aspect to the spec patch kubectl generates anyway.

Phil1602 commented 2 months ago

The workaround using custom profiles is fine for me, but I agree with what @frittentheke already said.

I totally agree that the client-side sysadmin profile suggests it creates an ephemeral container capable of performing operations with the expected capabilities set. Even though the root cause might be that privileged does not work as expected, one would probably expect to be root when setting sysadmin.

Maybe we could at least implement a client-side warning if the Pod spec contains runAsUser, to inform the user that its sysadmin ephemeral container does not run as root?
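
A rough sketch of what such a check could look like (hypothetical, not actual kubectl code; the function name and message wording are my own):

```python
# Hypothetical client-side check: warn when a sysadmin/netadmin debug
# container would inherit a non-root pod-level runAsUser.
def debug_profile_warning(pod_spec: dict, profile: str):
    run_as_user = pod_spec.get("securityContext", {}).get("runAsUser")
    if profile in ("sysadmin", "netadmin") and run_as_user not in (None, 0):
        return (f"warning: pod-level runAsUser={run_as_user} applies to the "
                f"debug container; the {profile} profile will not run as root")
    return None  # no warning needed

spec = {"securityContext": {"runAsUser": 1000}}
print(debug_profile_warning(spec, "sysadmin"))
```

The warning fires only when a pod-level runAsUser would actually demote the debug container, so the common root-by-default case stays quiet.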

ardaguclu commented 2 months ago

Thank you @frittentheke and @Phil1602 for dropping your valuable comments.

> Maybe we could at least implement a client-side warning if the Pod spec contains runAsUser, to inform the user that its sysadmin ephemeral container does not run as root?

That's exactly what I was thinking about.

mochizuki875 commented 2 months ago

> That's exactly what I was thinking about.

OK, I'll do that and create a PR.

/assign

ardaguclu commented 2 months ago

@mochizuki875 I think we can recommend the custom profile configuration we discussed above in this warning message.

mochizuki875 commented 2 months ago

@ardaguclu Yes, I was just thinking about the same idea! I would appreciate your help when drafting the message later.