kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0

cgroups: Incorrect cgroup setup with crio #1515

Closed: mcastelino closed this issue 3 years ago

mcastelino commented 5 years ago


When running a simple workload such as

apiVersion: v1
kind: Pod
metadata:
  name: guar-2kc
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: busybee
    image: busybox
    resources:
      limits:
        cpu: 2
        memory: "400Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]
  - name: busybum
    image: busybox
    resources:
      limits:
        cpu: 3
        memory: "200Mi"
    command: ["md5sum"]
    args: ["/dev/urandom"]

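we find that the task placement is incorrect. For example, the sandbox's qemu (PID 24602) and kata-proxy (PID 24607) processes show up in the sandbox's crio-conmon cgroup: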

kata$for i in `ls pod*/**/tasks`; do echo $i && for j in `cat $i`; do ps auxw | grep $j;done; done;
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956/tasks
root     13041  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24985
root     24985  0.0  0.2 1004980 19352 ?       Sl   21:13   0:00 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/sbs/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/proxy.sock -container 2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956 -exec-id 2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956
root     13043  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24986
root     13045  0.0  0.0   6360   976 pts/0    S+   21:40   0:00 grep 24987
root     13047  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24988
root     13049  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24989
root     13051  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24990
root     13053  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24992
root     13055  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24993
root     13057  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24994
root     13059  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24995
root     13061  0.0  0.0   6360   980 pts/0    S+   21:40   0:00 grep 24996
root     13063  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24997
root     13065  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24998
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be/tasks
root     13068  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 25183
root     25183  0.0  0.2 858668 21992 ?        Sl   21:13   0:00 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/sbs/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/proxy.sock -container 5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be -exec-id 5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be
root     13070  0.0  0.0   6360   972 pts/0    S+   21:40   0:00 grep 25185
root     13072  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 25186
root     13074  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 25187
root     13076  0.0  0.0   6360   904 pts/0    S+   21:40   0:00 grep 25188
root     13078  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 25189
root     13080  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 25190
root     13082  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 25191
root     13084  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 25192
root     13086  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 25193
root     13088  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 25194
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-conmon-2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956/tasks
root     13091  0.0  0.0   6360   976 pts/0    S+   21:40   0:00 grep 24964
root     24964  0.0  0.0  78328  2008 ?        Ssl  21:13   0:00 /usr/libexec/crio/conmon --syslog -c 2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956 -u 2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956 -r /opt/kata/bin/kata-qemu -b /var/run/containers/storage/overlay-containers/2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956/userdata -p /var/run/containers/storage/overlay-containers/2cc1e6e2ae40b7c94dac72d68c1fff6b6d9e8058f8e26c4bd5e03ac9318b3956/userdata/pidfile -l /var/log/pods/default_guar-2kc_5884dc6c-5b0c-11e9-90bc-525400cfa589/busybee/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level error
root     13093  0.0  0.0   6360   900 pts/0    S+   21:40   0:00 grep 24966
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-conmon-5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be/tasks
root     13096  0.0  0.0   6360   980 pts/0    S+   21:40   0:00 grep 25090
root     25090  0.0  0.0  78328  2008 ?        Ssl  21:13   0:00 /usr/libexec/crio/conmon --syslog -c 5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be -u 5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be -r /opt/kata/bin/kata-qemu -b /var/run/containers/storage/overlay-containers/5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be/userdata -p /var/run/containers/storage/overlay-containers/5be201403ea55bb4d5cb8de2904bfb7f4251a5fafce45886ae639841fd2833be/userdata/pidfile -l /var/log/pods/default_guar-2kc_5884dc6c-5b0c-11e9-90bc-525400cfa589/busybum/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level error
root     13098  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 25092
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-conmon-f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/tasks
root      2846  0.0  0.0  78328  2020 ?        Ssl  21:06   0:00 /usr/libexec/crio/conmon --syslog -c 0df3ec69da862320d2b8947aa2481f92de3274f1fb3cffd8594a26d1e6627b35 -u 0df3ec69da862320d2b8947aa2481f92de3274f1fb3cffd8594a26d1e6627b35 -r /usr/bin/runc -b /var/run/containers/storage/overlay-containers/0df3ec69da862320d2b8947aa2481f92de3274f1fb3cffd8594a26d1e6627b35/userdata -p /var/run/containers/storage/overlay-containers/0df3ec69da862320d2b8947aa2481f92de3274f1fb3cffd8594a26d1e6627b35/userdata/pidfile -l /var/log/pods/kube-system_etcd-clr-01_af3e4a507ec0af8c2233ee5bf0783073/0df3ec69da862320d2b8947aa2481f92de3274f1fb3cffd8594a26d1e6627b35.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level error
root      3034  0.0  0.0  78328  2020 ?        Ssl  21:06   0:00 /usr/libexec/crio/conmon --syslog -c 96eb687b2c4467fb893ef300c6ff3cf66a57ef92d0c464f4e71bf4e4718a31ce -u 96eb687b2c4467fb893ef300c6ff3cf66a57ef92d0c464f4e71bf4e4718a31ce -r /usr/bin/runc -b /var/run/containers/storage/overlay-containers/96eb687b2c4467fb893ef300c6ff3cf66a57ef92d0c464f4e71bf4e4718a31ce/userdata -p /var/run/containers/storage/overlay-containers/96eb687b2c4467fb893ef300c6ff3cf66a57ef92d0c464f4e71bf4e4718a31ce/userdata/pidfile -l /var/log/pods/kube-system_etcd-clr-01_af3e4a507ec0af8c2233ee5bf0783073/etcd/0.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level error
root     13101  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 7830
root     13103  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 19505
root     13105  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24584
root     24584  0.0  0.0  78328   172 ?        Ssl  21:13   0:00 /usr/libexec/crio/conmon --syslog -c f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a -u f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a -r /opt/kata/bin/kata-qemu -b /var/run/containers/storage/overlay-containers/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/userdata -p /var/run/containers/storage/overlay-containers/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/userdata/pidfile -l /var/log/pods/default_guar-2kc_5884dc6c-5b0c-11e9-90bc-525400cfa589/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a.log --exit-dir /var/run/crio/exits --socket-dir-path /var/run/crio --log-level error
root     13107  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24586
root     13109  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24602
root     24602  100  2.7 3590552 226136 ?      Sl   21:13  26:33 /opt/kata/bin/qemu-system-x86_64 -name sandbox-f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a -uuid ada3582a-9766-4030-82e7-95427d95ad17 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host,pmu=off -qmp unix:/run/vc/vm/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/qmp.sock,server,nowait -m 2048M,slots=10,maxmem=8992M -device pci-bridge,bus=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= -device virtio-serial-pci,disable-modern=true,id=serial0,romfile= -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/opt/kata/share/kata-containers/kata-containers-image_clearlinux_1.6.1_agent_992b4987a32.img,size=134217728 -device virtio-scsi-pci,id=scsi0,disable-modern=true,romfile= -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng,rng=rng0,romfile= -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 -chardev socket,id=charch0,path=/run/vc/vm/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/kata.sock,server,nowait -device virtio-9p-pci,disable-modern=true,fsdev=extra-9p-kataShared,mount_tag=kataShared,romfile= -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 -device driver=virtio-net-pci,netdev=network-0,mac=b2:78:0b:80:8b:a2,disable-modern=true,mq=on,vectors=4,romfile= -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic -daemonize -kernel /opt/kata/share/kata-containers/vmlinuz-4.19.28-31 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=8 init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket systemd.mask=systemd-journald.service systemd.mask=systemd-journald.socket systemd.mask=systemd-journal-flush.service systemd.mask=systemd-udevd.service systemd.mask=systemd-udevd.socket systemd.mask=systemd-udev-trigger.service systemd.mask=systemd-timesyncd.service systemd.mask=systemd-update-utmp.service systemd.mask=systemd-tmpfiles-setup.service systemd.mask=systemd-tmpfiles-cleanup.service systemd.mask=systemd-tmpfiles-cleanup.timer systemd.mask=tmp.mount -pidfile /run/vc/vm/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/pid -smp 1,cores=1,threads=1,sockets=1,maxcpus=8
root     24604  0.0  0.0      0     0 ?        S    21:13   0:00 [vhost-24602]
root     24606  0.0  0.0      0     0 ?        S    21:13   0:00 [kvm-pit/24602]
root     13111  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 24603
root     13113  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24604
root     24604  0.0  0.0      0     0 ?        S    21:13   0:00 [vhost-24602]
root     13115  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24607
root     24607  0.0  0.1 1215688 15420 ?       Sl   21:13   0:01 /opt/kata/libexec/kata-containers/kata-proxy -listen-socket unix:///run/vc/sbs/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/proxy.sock -mux-socket /run/vc/vm/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/kata.sock -sandbox f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a
root     13117  0.0  0.0   6360   904 pts/0    S+   21:40   0:00 grep 24608
root     13119  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24609
root     13121  0.0  0.0   6360   968 pts/0    S+   21:40   0:00 grep 24610
root     13123  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24611
root     13125  0.0  0.0   6360   984 pts/0    S+   21:40   0:00 grep 24612
root     13127  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24613
root     13129  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24614
root     13131  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24615
root     13133  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24616
root     13135  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 25025
root     13137  0.0  0.0   6360   908 pts/0    S+   21:40   0:00 grep 27555
root     13139  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 27556
root     13141  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 31294
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/crio-f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/tasks
root     13144  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24605
root     13146  0.0  0.0   6360   984 pts/0    S+   21:40   0:00 grep 24639
root     24639  0.0  0.2 858668 22160 ?        Sl   21:13   0:00 /opt/kata/libexec/kata-containers/kata-shim -agent unix:///run/vc/sbs/f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a/proxy.sock -container f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a -exec-id f105771f71cfaeed86175bc2bc10f9925c75d12b749231716dfe9f86e640ff0a
root     13148  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24641
root     13150  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24642
root     13152  0.0  0.0   6360   972 pts/0    S+   21:40   0:00 grep 24644
root     13154  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24645
root     13156  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24646
root     13158  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24648
root     13160  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 24649
root     13162  0.0  0.0   6360   976 pts/0    S+   21:40   0:00 grep 24650
root     13164  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 24651
root     13166  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 24652
root     13168  0.0  0.0   6360   856 pts/0    S+   21:40   0:00 grep 24979
root     13170  0.0  0.0   6360   920 pts/0    S+   21:40   0:00 grep 24980
root     13172  0.0  0.0   6360   916 pts/0    S+   21:40   0:00 grep 25105
root     13174  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 25106
root     13176  0.0  0.0   6360   852 pts/0    S+   21:40   0:00 grep 25107
pod5884dc6c-5b0c-11e9-90bc-525400cfa589/tasks
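The loop above greps `ps auxw` output for each ID, and this adds noise to the listing in two ways.

- `grep` matches substrings. For example, `grep 24602` also prints `[vhost-24602]` (PID 24604) and `[kvm-pit/24602]` (PID 24606). Likewise, `grep 7830` pulls in the etcd conmon lines only because `7830` occurs inside the pod UID `af3e4a507ec0af8c2233ee5bf0783073` in their log paths.
- Most IDs match nothing but the `grep` itself. They are presumably threads: a cgroup v1 `tasks` file lists thread IDs, while `ps auxw` prints one line per process.

Below is a minimal sketch of a less noisy listing. It assumes a cgroup v1 hierarchy and is meant to run from the same controller directory as the loop above. It reads `cgroup.procs`, which holds process IDs, and asks `ps` for exactly one PID at a time. The function name `list_cgroup_procs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every process (not thread) in each cgroup
# below a directory. Assumes a cgroup v1 hierarchy, e.g. the controller
# directory that contains the pod<uid> cgroups.
list_cgroup_procs() {
  local root=${1:-.}
  local procs pid
  # cgroup.procs lists thread-group IDs (processes); tasks lists thread IDs.
  while IFS= read -r procs; do
    echo "${procs}"
    while IFS= read -r pid; do
      [ -n "${pid}" ] || continue
      # -p selects exactly this PID: no grep self-matches and no
      # substring matches such as 24602 vs [vhost-24602].
      ps -o pid=,args= -p "${pid}" || echo "${pid} (exited)"
    done < "${procs}"
  done < <(find "${root}" -name cgroup.procs | sort)
}
```

To use it, run `list_cgroup_procs pod5884dc6c-5b0c-11e9-90bc-525400cfa589` from that directory. It prints each `cgroup.procs` path followed by its member processes. To check whether the unmatched IDs in a `tasks` file are threads, `ps -L -o pid=,lwp=,comm= -p 24985` lists the threads (LWP column) of that kata-shim process.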

For more gory details, see https://gist.github.com/mcastelino/e975cd26958554b4c46c7168067b66b0

Environment

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"archive", BuildDate:"2019-03-29T16:29:07Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.13.1
RuntimeApiVersion:  v1alpha1
kata-runtime  : 1.6.1
   commit   : 8efc5718813224722f87ad119edcf9753fd6147d
   OCI specs: 1.0.1-dev
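When reading the cgroup files, it helps to know which values the pod spec above should produce. Below is a minimal sketch of that arithmetic. It assumes kubelet's cgroup v1 conventions as we understand them for this Kubernetes version:

- Requests default to the limits, so the pod is Guaranteed.
- The CFS period is the default 100000 µs.
- `cpu.shares` is millicores × 1024 / 1000.
- The pod-level cgroup receives the sum of its containers' values.

The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring kubelet's cgroup v1 conversions
# (assumed: default 100000 us CFS period, requests == limits).

# cpu.cfs_quota_us for a CPU limit given in millicores.
cfs_quota_us() { local milli=$1 period=${2:-100000}; echo $(( milli * period / 1000 )); }

# cpu.shares for a CPU request given in millicores.
cpu_shares() { echo $(( $1 * 1024 / 1000 )); }

# memory.limit_in_bytes for a memory limit given in Mi.
mi_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Read "name millicores Mi" lines on stdin; print the expected values per
# container, then for the pod-level cgroup (the sum over containers).
expected_limits() {
  local total_milli=0 total_mi=0 name milli mi
  while read -r name milli mi; do
    printf '%s quota=%s shares=%s mem=%s\n' "${name}" \
      "$(cfs_quota_us "${milli}")" "$(cpu_shares "${milli}")" "$(mi_to_bytes "${mi}")"
    total_milli=$(( total_milli + milli ))
    total_mi=$(( total_mi + mi ))
  done
  printf 'pod quota=%s shares=%s mem=%s\n' \
    "$(cfs_quota_us "${total_milli}")" "$(cpu_shares "${total_milli}")" "$(mi_to_bytes "${total_mi}")"
}
```

As a usage example, run `printf 'busybee 2000 400\nbusybum 3000 200\n' | expected_limits`. It prints a quota of 200000 and 300000 for the two containers. For the pod it prints `pod quota=500000 shares=5120 mem=629145600`.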
mcastelino commented 5 years ago

/cc @devimc @jcvenegas @egernst @bergwolf

jcvenegas commented 5 years ago

I'll try to reproduce this locally. Probably the runtime is spawned in this cgroup; let me confirm.
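One way to check this hypothesis is to compare the cgroup that the kernel reports for qemu with the one reported for the sandbox's conmon. A child process inherits its parent's cgroups at fork time. The conmon command lines above invoke the runtime (`-r /opt/kata/bin/kata-qemu`). If conmon's runtime call ends up starting qemu and kata-proxy, those processes would stay in conmon's cgroup unless something moves them.

The sketch below uses a hypothetical `cgroup_path` helper. It parses `/proc/<pid>/cgroup`, where each line has the form `hierarchy-ID:controller-list:path`, and prints the path for one controller. An empty controller name selects the cgroup v2 unified entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup path of PID for one controller,
# read from /proc/<pid>/cgroup ("hierarchy-ID:controller-list:path").
# An empty controller name selects the cgroup v2 unified entry ("0::path").
cgroup_path() {
  local pid=$1 ctrl=${2:-}
  awk -v c="${ctrl}" '
    {
      # Keep everything after the second colon; paths may contain colons.
      path = $0
      sub(/^[^:]*:[^:]*:/, "", path)
      split($0, f, ":")
      if (c == "" && f[1] == "0" && f[2] == "") { print path; exit }
      # Controller lists are comma-separated, e.g. "cpu,cpuacct".
      n = split(f[2], list, ",")
      for (i = 1; i <= n; i++) if (list[i] == c) { print path; exit }
    }' "/proc/${pid}/cgroup"
}
```

Take the PIDs from the listing above: qemu is 24602 and the sandbox conmon is 24584. If `cgroup_path 24602 cpu` and `cgroup_path 24584 cpu` print the same `crio-conmon-...` path, qemu was left in conmon's cgroup rather than being moved into a sandbox cgroup.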