kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0
13.5k stars 1.56k forks source link

The "SeccompDefault" feature gate is not working as documented #2609

Closed denist-huma closed 2 years ago

denist-huma commented 2 years ago

What happened:

As I understood from the https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads, I expect Kind to patch my pod with the RuntimeDefault seccomp profile. I see that it is not the case. Let's test with a workload from docs:

cat > default-pod.yaml << EOF     
apiVersion: v1
kind: Pod
metadata:
  name: default-pod
  labels:
    app: default-pod
spec:
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some more syscalls!"
EOF

kubectl -f default-pod.yaml apply

$ k get pod/default-pod -o=jsonpath={.spec.securityContext}
{}

What you expected to happen:

$ k get pod/default-pod -o=jsonpath={.spec.securityContext}
{"seccompProfile": {"type": "RuntimeDefault"}}

to output a JSON from

seccompProfile:
      type: RuntimeDefault

How to reproduce it (as minimally and precisely as possible):

prepare Kind

cat > kind-1.22.yaml << EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.22.4@sha256:ca3587e6e545a96c07bf82e2c46503d9ef86fc704f44c17577fca7bcabf5f978
featureGates:
  "SeccompDefault": true
EOF

kind create cluster --config kind-1.22.yaml

Anything else we need to know?:

Environment:

Server: Containers: 7 Running: 2 Paused: 0 Stopped: 5 Images: 171 Server Version: 20.10.12 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true userxattr: false Logging Driver: json-file Cgroup Driver: cgroupfs Cgroup Version: 1 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc Default Runtime: runc Init Binary: docker-init containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d runc version: v1.0.2-0-g52b36a2 init version: de40ad0 Security Options: apparmor seccomp Profile: default Kernel Version: 5.11.0-37-generic Operating System: Ubuntu 20.04.3 LTS OSType: linux Architecture: x86_64 CPUs: 4 Total Memory: 15.41GiB Name: L560 ID: VBQU:DI5A:UJCA:J4JE:HGAV:OOT5:6OO5:QIEO:LEYP:37RR:QSWH:4STC Docker Root Dir: /var/lib/docker Debug Mode: false Username: silaradost Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false


- OS (e.g. from `/etc/os-release`):

```sh
$ sudo cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
aojea commented 2 years ago

the docs says

SeccompDefault is an optional kubelet feature gate as well as corresponding --seccomp-default command line flag. Both have to be enabled simultaneously to use the feature.

do you have the command line flag enabled on the kubelet?

BenTheElder commented 2 years ago

cc @saschagrunert I think maybe the docs have the feature-gate but not the flag, maybe we should add that.|

I haven't had a chance to work with this yet.

saschagrunert commented 2 years ago

@BenTheElder a yeah, the kind config needs to have the kubelet flag as well :+1:

@denist-huma we will not mutate the existing security context (or the deprecated annotations) with that alpha feature. Reason is to be able to turn it off again without causing any harm to running workloads. It's just a kubelet scoped feature which changes the profile right before passing it to the container runtime. crictl inspect … should give you the runtime spec, which contains the full seccomp profile for verification.

Note: We may change that when graduating the feature to make it more visible to end users. There were also plans to show the seccomp profile in the kubectl describe output.

BenTheElder commented 2 years ago

(FYI: you can call crictl like docker exec kind-control-plane crictl inspect)

should we file a k/website issue / PR for the kubelet flag?

saschagrunert commented 2 years ago

(FYI: you can call crictl like docker exec kind-control-plane crictl inspect)

should we file a k/website issue / PR for the kubelet flag?

Yes, I take care of it today.

saschagrunert commented 2 years ago

See https://github.com/kubernetes/website/pull/31534

denist-huma commented 2 years ago

Thanks @saschagrunert https://github.com/kubernetes/website/pull/31534 worked for me.

$ docker exec -it kind-worker bash -c \
>     'crictl inspect $(crictl ps --name=test-container -q) | jq .info.runtimeSpec.linux.seccomp'
Full output, Spoiler warning ```yaml { "defaultAction": "SCMP_ACT_ERRNO", "architectures": [ "SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32" ], "syscalls": [ { "names": [ "accept", "accept4", "access", "adjtimex", "alarm", "bind", "brk", "capget", "capset", "chdir", "chmod", "chown", "chown32", "clock_adjtime", "clock_adjtime64", "clock_getres", "clock_getres_time64", "clock_gettime", "clock_gettime64", "clock_nanosleep", "clock_nanosleep_time64", "close", "close_range", "connect", "copy_file_range", "creat", "dup", "dup2", "dup3", "epoll_create", "epoll_create1", "epoll_ctl", "epoll_ctl_old", "epoll_pwait", "epoll_pwait2", "epoll_wait", "epoll_wait_old", "eventfd", "eventfd2", "execve", "execveat", "exit", "exit_group", "faccessat", "faccessat2", "fadvise64", "fadvise64_64", "fallocate", "fanotify_mark", "fchdir", "fchmod", "fchmodat", "fchown", "fchown32", "fchownat", "fcntl", "fcntl64", "fdatasync", "fgetxattr", "flistxattr", "flock", "fork", "fremovexattr", "fsetxattr", "fstat", "fstat64", "fstatat64", "fstatfs", "fstatfs64", "fsync", "ftruncate", "ftruncate64", "futex", "futex_time64", "futimesat", "getcpu", "getcwd", "getdents", "getdents64", "getegid", "getegid32", "geteuid", "geteuid32", "getgid", "getgid32", "getgroups", "getgroups32", "getitimer", "getpeername", "getpgid", "getpgrp", "getpid", "getppid", "getpriority", "getrandom", "getresgid", "getresgid32", "getresuid", "getresuid32", "getrlimit", "get_robust_list", "getrusage", "getsid", "getsockname", "getsockopt", "get_thread_area", "gettid", "gettimeofday", "getuid", "getuid32", "getxattr", "inotify_add_watch", "inotify_init", "inotify_init1", "inotify_rm_watch", "io_cancel", "ioctl", "io_destroy", "io_getevents", "io_pgetevents", "io_pgetevents_time64", "ioprio_get", "ioprio_set", "io_setup", "io_submit", "io_uring_enter", "io_uring_register", "io_uring_setup", "ipc", "kill", "lchown", "lchown32", "lgetxattr", "link", "linkat", "listen", "listxattr", "llistxattr", "_llseek", "lremovexattr", "lseek", "lsetxattr", "lstat", "lstat64", "madvise", "membarrier", "memfd_create", "mincore", "mkdir", "mkdirat", "mknod", "mknodat", "mlock", "mlock2", "mlockall", "mmap", "mmap2", "mprotect", "mq_getsetattr", "mq_notify", "mq_open", "mq_timedreceive", "mq_timedreceive_time64", "mq_timedsend", "mq_timedsend_time64", "mq_unlink", "mremap", "msgctl", "msgget", "msgrcv", "msgsnd", "msync", "munlock", "munlockall", "munmap", "nanosleep", "newfstatat", "_newselect", "open", "openat", "openat2", "pause", "pidfd_open", "pidfd_send_signal", "pipe", "pipe2", "poll", "ppoll", "ppoll_time64", "prctl", "pread64", "preadv", "preadv2", "prlimit64", "pselect6", "pselect6_time64", "pwrite64", "pwritev", "pwritev2", "read", "readahead", "readlink", "readlinkat", "readv", "recv", "recvfrom", "recvmmsg", "recvmmsg_time64", "recvmsg", "remap_file_pages", "removexattr", "rename", "renameat", "renameat2", "restart_syscall", "rmdir", "rseq", "rt_sigaction", "rt_sigpending", "rt_sigprocmask", "rt_sigqueueinfo", "rt_sigreturn", "rt_sigsuspend", "rt_sigtimedwait", "rt_sigtimedwait_time64", "rt_tgsigqueueinfo", "sched_getaffinity", "sched_getattr", "sched_getparam", "sched_get_priority_max", "sched_get_priority_min", "sched_getscheduler", "sched_rr_get_interval", "sched_rr_get_interval_time64", "sched_setaffinity", "sched_setattr", "sched_setparam", "sched_setscheduler", "sched_yield", "seccomp", "select", "semctl", "semget", "semop", "semtimedop", "semtimedop_time64", "send", "sendfile", "sendfile64", "sendmmsg", "sendmsg", "sendto", "setfsgid", "setfsgid32", "setfsuid", "setfsuid32", "setgid", "setgid32", "setgroups", "setgroups32", "setitimer", "setpgid", "setpriority", "setregid", "setregid32", "setresgid", "setresgid32", "setresuid", "setresuid32", "setreuid", "setreuid32", "setrlimit", "set_robust_list", "setsid", "setsockopt", "set_thread_area", "set_tid_address", "setuid", "setuid32", "setxattr", "shmat", "shmctl", "shmdt", "shmget", "shutdown", "sigaltstack", "signalfd", "signalfd4", "sigprocmask", "sigreturn", "socket", "socketcall", "socketpair", "splice", "stat", "stat64", "statfs", "statfs64", "statx", "symlink", "symlinkat", "sync", "sync_file_range", "syncfs", "sysinfo", "tee", "tgkill", "time", "timer_create", "timer_delete", "timer_getoverrun", "timer_gettime", "timer_gettime64", "timer_settime", "timer_settime64", "timerfd_create", "timerfd_gettime", "timerfd_gettime64", "timerfd_settime", "timerfd_settime64", "times", "tkill", "truncate", "truncate64", "ugetrlimit", "umask", "uname", "unlink", "unlinkat", "utime", "utimensat", "utimensat_time64", "utimes", "vfork", "vmsplice", "wait4", "waitid", "waitpid", "write", "writev" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 0, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 8, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131072, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 131080, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "personality" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 4294967295, "op": "SCMP_CMP_EQ" } ] }, { "names": [ "arch_prctl", "modify_ldt" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "chroot" ], "action": "SCMP_ACT_ALLOW" }, { "names": [ "clone" ], "action": "SCMP_ACT_ALLOW", "args": [ { "index": 0, "value": 2114060288, "op": "SCMP_CMP_MASKED_EQ" } ] }, { "names": [ "clone3" ], "action": "SCMP_ACT_ERRNO", "errnoRet": 38 } ] } ```
saschagrunert commented 2 years ago

@denist-huma the profile should be under .info.runtimeSpec.linux.seccomp, like described there: https://deploy-preview-31534--kubernetes-io-main-staging.netlify.app/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads

BenTheElder commented 2 years ago

Thanks!