kata-containers / runtime

Kata Containers version 1.x runtime (for version 2.x see https://github.com/kata-containers/kata-containers).
https://katacontainers.io/
Apache License 2.0
2.1k stars 375 forks source link

Failed to check if grpc server is working: context deadline exceeded: unknown when using vsocks #576

Closed chavafg closed 6 years ago

chavafg commented 6 years ago

I enabled a new Job that runs the tests using vsock, but I have seen this error frequently on different tests from docker and cri-containerd:

docker test failure:

docker kill killing container 
  with '26'(virtual timer expired) signal
  /tmp/jenkins/workspace/kata-containers-tests-vsock-fedora-27-PR/go/src/github.com/kata-containers/tests/vendor/github.com/onsi/ginkgo/extensions/table/table_entry.go:46
Running command '/usr/bin/docker [docker run --runtime kata-runtime --name NzW80u0qGirlMmcfV4zegNrZLoKxLt -dt busybox sh -c trap "exit 26" 26; echo TRAP_RUNNING; while :; do sleep 1; done]'
command failed error 'exit status 125'
[docker run --runtime kata-runtime --name NzW80u0qGirlMmcfV4zegNrZLoKxLt -dt busybox sh -c trap "exit 26" 26; echo TRAP_RUNNING; while :; do sleep 1; done]
Timeout: 60 seconds
Exit Code: 125
Stdout: 086543aa8dcc805affbc306c04035b10d8745ca8983b6758be08c7c5dd9f4b26

Stderr: docker: Error response from daemon: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown.

cri-containerd test failure:

• Failure in Spec Setup (BeforeEach) [9.285 seconds]
[k8s.io] Container
/tmp/jenkins/workspace/kata-containers-tests-vsock-fedora-27-PR/go/src/github.com/kubernetes-incubator/cri-tools/pkg/framework/framework.go:72
  runtime should support basic operations on container
  /tmp/jenkins/workspace/kata-containers-tests-vsock-fedora-27-PR/go/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:68
    runtime should support stopping container [Conformance] [BeforeEach]
    /tmp/jenkins/workspace/kata-containers-tests-vsock-fedora-27-PR/go/src/github.com/kubernetes-incubator/cri-tools/pkg/validate/container.go:100

    failed to create PodSandbox: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown
    Expected error:
        <*status.statusError | 0xc4204a8900>: {
            Code: 2,
            Message: "failed to start sandbox container: failed to create containerd task: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown",
            Details: nil,
        }
        rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: Failed to check if grpc server is working: context deadline exceeded: unknown
    not to have occurred

Failed jobs: http://jenkins.katacontainers.io/job/kata-containers-tests-vsock-fedora-27-PR/18/consoleFull http://jenkins.katacontainers.io/job/kata-containers-tests-vsock-fedora-27-PR/16/consoleFull http://jenkins.katacontainers.io/job/kata-containers-tests-vsock-fedora-27-PR/12/consoleFull http://jenkins.katacontainers.io/job/kata-containers-tests-vsock-fedora-27-PR/13/consoleFull

chavafg commented 6 years ago

This can easily be reproduced on a VM. I used a VM with 4 vCPUs and 8GB of memory and Fedora 27.

Here is a log that shows container 1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b failing:

Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.48933268Z" level=info msg="vsock supported" arch=amd64 command=create name=kata-runtime pid=21107 source=runtime
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.489702604Z" level=info msg="VSOCK supported, configure to not use proxy" arch=amd64 command=create name=kata-runtime pid=21107 source=runtime
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.490002408Z" level=info arch=amd64 arguments="\"create --bundle /var/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1c6dba8cff209ca23d420d60c7
f724e6023497746c3845bd482b06f60396261b --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b/init.pid 1c6dba8cff209ca23d420d60c7f724e6023497746c384
5bd482b06f60396261b\"" command=create commit=4220a7a9be8f37a3bb685bb7809239d5c84b20dd name=kata-runtime pid=21107 source=runtime version=1.2.0
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.490414339Z" level=debug msg="converting /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b0
6f60396261b/config.json" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=oci
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.492449072Z" level=debug arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b default-kernel-parameters="init=/u
sr/lib/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket" name=kata-runtime pid=21107 source=runtime
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.493343315Z" level=debug msg="container rootfs: /var/lib/docker/overlay2/822bc07aea693cc3cfc2cfab02c98c49f8a8425579503ad812438d71c029c466/merged" arch=amd64 com
mand=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=oci
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.493626545Z" level=info msg="shm-size detected: 67108864" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b n
ame=kata-runtime pid=21107 source=virtcontainers subsystem=oci
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.494173937Z" level=debug msg="Creating bridges" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-r
untime pid=21107 source=virtcontainers subsystem=qemu
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.494395315Z" level=debug msg="Creating UUID" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runt
ime pid=21107 source=virtcontainers subsystem=qemu
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.494840089Z" level=debug msg="agent: Using vsock VM socket endpoint" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f
60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=kata_agent
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.495088597Z" level=debug msg="Could not retrieve anything from storage" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b
06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=kata_agent
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.495290723Z" level=warning msg="fetch sandbox device failed" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261
b error="open /run/vc/sbs/1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b/devices.json: no such file or directory" name=kata-runtime pid=21107 sandbox=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b sa
ndboxid=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b source=virtcontainers subsystem=sandbox
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.566703693Z" level=info msg="Attaching endpoint" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b endpoint-t
ype=virtual name=kata-runtime pid=21107 source=virtcontainers subsystem=network
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.568857977Z" level=info msg="Starting VM" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime
 pid=21107 sandbox=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b source=virtcontainers subsystem=sandbox
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.568949604Z" level=debug arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b default-kernel-parameters="tsc=rel
iable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootfla
gs=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug" name=kata-runtime pid=21107 source=virtcontainers subsystem=qemu
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.569173201Z" level=info msg="Adding extra file [0xc4201feb00 0xc420328150 0xc420328158 0xc420328160 0xc420328168 0xc420328170 0xc420328178 0xc420328180 0xc42032
8188 0xc420328110 0xc420328118 0xc420328120 0xc420328128 0xc420328130 0xc420328138 0xc420328140 0xc420328148]" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=2
1107 source=virtcontainers subsystem=qmp
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.569249527Z" level=info msg="launching qemu with: [-name sandbox-1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b -uuid b14b9aa2-87ea-43fb-9121-
927bd3afd055 -machine pc,accel=kvm,kernel_irqchip,nvdimm -cpu host,pmu=off -qmp unix:/run/vc/vm/1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b/qmp.sock,server,nowait -m 2048M,slots=2,maxmem=7996M -device pci-bridge,bu
s=pci.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2 -device virtio-serial-pci,disable-modern=true,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/1c6dba8cff209ca23d420d60c
7f724e6023497746c3845bd482b06f60396261b/console.sock,server,nowait -device nvdimm,id=nv0,memdev=mem0 -object memory-backend-file,id=mem0,mem-path=/usr/share/kata-containers/kata-containers-image_clearlinux_agent_7b458b1.img,size=536870
912 -device virtio-scsi-pci,id=scsi0,disable-modern=true -device vhost-vsock-pci,disable-modern=true,vhostfd=3,id=vsock-1658177274,guest-cid=1658177274 -device virtio-9p-pci,disable-modern=true,fsdev=extra-9p-kataShared,mount_tag=kataS
hared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b,security_model=none -netdev tap,id=network-0,vhost=on,vhostfds=4:5:6:7:8:9:10:11,fds=
12:13:14:15:16:17:18:19 -device driver=virtio-net-pci,netdev=network-0,mac=02:42:ac:11:00:02,disable-modern=true,mq=on,vectors=18 -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefault
s -nographic -daemonize -kernel /usr/share/kata-containers/vmlinuz-4.14.51-7 -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 co
nsole=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro rw rootfstype=ext4 debug systemd.show_status=true systemd.log_level=debug panic=1 nr_cpus=4 init=/usr/lib
/systemd/systemd systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket agent.log=debug -smp 1,cores=1,threads=1,sockets=1,maxcpus=4]" arch=amd64 command=create container=1c6dba8c
ff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.638652207Z" level=info msg="{\"QMP\": {\"version\": {\"qemu\": {\"micro\": 0, \"minor\": 11, \"major\": 2}, \"package\": \"\"}, \"capabilities\": []}}" arch=am
d64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.639126268Z" level=info msg="QMP details" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime
 pid=21107 qmp-capabilities= qmp-major-version=2 qmp-micro-version=0 qmp-minor-version=11 source=virtcontainers subsystem=qemu
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.639229682Z" level=info msg="{\"execute\":\"qmp_capabilities\"}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f6039
6261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.639516527Z" level=info msg="{\"return\": {}}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-ru
ntime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.639782264Z" level=info msg="VM started" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime
pid=21107 sandbox=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b source=virtcontainers subsystem=sandbox
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.639846907Z" level=info msg="No proxy started because of no-proxy implementation" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c
3845bd482b06f60396261b name=kata-runtime pid=21107 sandbox=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b source=virtcontainers subsystem=sandbox
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.64001337Z" level=info msg="proxy started" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtim
e pid=21107 proxy-pid=0 proxy-url="vsock://1658177274:1024" sandbox=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b source=virtcontainers subsystem=kata_agent
Aug 10 22:47:48 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:48.640070981Z" level=info msg="New client" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime
pid=21107 source=virtcontainers subsystem=kata_agent url="vsock://1658177274:1024"
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.640304228Z" level=info msg="Stopping Sandbox" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-ru
ntime pid=21107 source=virtcontainers subsystem=qemu
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.640826501Z" level=info msg="{\"QMP\": {\"version\": {\"qemu\": {\"micro\": 0, \"minor\": 11, \"major\": 2}, \"package\": \"\"}, \"capabilities\": []}}" arch=am
d64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.641207862Z" level=info msg="{\"execute\":\"qmp_capabilities\"}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f6039
6261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.641593664Z" level=info msg="{\"return\": {}}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-ru
ntime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.641686377Z" level=info msg="{\"execute\":\"quit\"}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=k
ata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.64191526Z" level=info msg="{\"return\": {}}" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-run
time pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.641944025Z" level=info msg="{\"timestamp\": {\"seconds\": 1533941273, \"microseconds\": 641913}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}" arch=a
md64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=virtcontainers subsystem=qmp
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.642054317Z" level=info msg="Detaching endpoint" arch=amd64 command=create container=1c6dba8cff209ca23d420d60c7f724e6023497746c3845bd482b06f60396261b endpoint-t
ype=virtual name=kata-runtime pid=21107 source=virtcontainers subsystem=network
Aug 10 22:47:53 fedora27-raw kata-runtime[21107]: time="2018-08-10T22:47:53.659534228Z" level=error msg="Failed to check if grpc server is working: context deadline exceeded" arch=amd64 command=create container=1c6dba8cff209ca23d420d60
c7f724e6023497746c3845bd482b06f60396261b name=kata-runtime pid=21107 source=runtime
Aug 10 22:47:54 fedora27-raw kata-runtime[21208]: time="2018-08-10T22:47:54.263856015Z" level=info msg="loaded configuration" arch=amd64 command=create file=/usr/share/defaults/kata-containers/configuration.toml format=TOML name=kata-r
untime pid=21208 source=runtime

BTW, I used the same VM configured to NOT use vsocks and all tests passed. So I think this is related to the vsock implementation.

/cc @devimc @jcvenegas

nitkon commented 6 years ago

@chavafg : I am facing exactly the same issue on my ppc64le system. How do I configure Kata not to use vsock , so that I bypass the issue for now?

# sudo docker run -ti --runtime kata-runtime busybox sh
docker: Error response from daemon: oci runtime error: Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
nitkon commented 6 years ago

But my host kernel is Linux llmtul01b 4.4.0-119-generic #143-Ubuntu SMP Mon Apr 2 16:08:02 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux and afaik below kernel 4.8 virtio-serial is used and above 4.8 vsock is used. Then why am I seeing this issue?

jodh-intel commented 6 years ago

@nitkon - have you enabled vsock support in your config file?

nitkon commented 6 years ago

Hi @jodh-intel , my host kernel is 4.4 , so vsock support wont be there right? However, I have vsock in toml file, as u suggested but the result is the same.

sudo docker run -ti --runtime kata-runtime busybox sh
docker: Error response from daemon: oci runtime error: Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
nitkon commented 6 years ago

Anyway, we can extend the deadline so that we bypass it at least locally on our system?