hyperhq / runv

Hypervisor-based Runtime for OCI
Apache License 2.0
826 stars 129 forks source link

failed to start latest runv with docker #559

Open WeiZhang555 opened 7 years ago

WeiZhang555 commented 7 years ago

I've tried the latest runv after refactored with docker, and failed to start a container.

Related information: docker version: docker 1.11.2 with additional internal patches. runv version: 50d2ec7e5ceba4ac48b3dfbedb4ee56dd68329d2 (glog is replaced with logrus) hyperstart: 5c2e7092481b23c6702413831fa0251c318535c5 (using kernel+hyper-initrd.img from this commit)

Reproduce steps:

  1. docker run -ti --runtime runv busybox sh docker: Error response from daemon: containerd: container not started.

Error logs:

https://gist.github.com/WeiZhang555/590fceed273ae8bcdca21f4a95e5d110

I see this message a lot, but not sure if this is the root cause:

time="2017-08-17T15:58:12+08:00" level=debug msg="got a message {"error": {"class": "GenericError", "desc": "Invalid JSON syntax"}}" 
WeiZhang555 commented 7 years ago

More useful(maybe) information:

I print more information of "msg" and "cmd" around "qmp_capabilities" qmp command:

time="2017-08-17T20:15:55+08:00" level=debug msg="========> msg: map[QMP:map[version:map[package: qemu:map[micro:0 minor:6 major:2]] capabilities:[]]]"
time="2017-08-17T20:15:56+08:00" level=debug msg="========> cmd "{\"execute\":\"qmp_capabilities\"}", len 30, sent 30"

code modification:

diff --git a/hypervisor/qemu/qmp_handler.go b/hypervisor/qemu/qmp_handler.go
index 94e0290..a5ed166 100644
--- a/hypervisor/qemu/qmp_handler.go
+++ b/hypervisor/qemu/qmp_handler.go
@@ -179,6 +179,7 @@ func qmpInitializer(ctx *hypervisor.VmContext) {
        logrus.Debug("begin qmp init...")

        err = decoder.Decode(&msg)
+       logrus.Debugf("========> msg: %v", msg)
        if err != nil {
                logrus.Errorf("get qmp welcome failed: %v", err)
                qc.qmp <- qmpFail(err.Error(), nil)
@@ -193,7 +194,8 @@ func qmpInitializer(ctx *hypervisor.VmContext) {
                qc.qmp <- qmpFail(err.Error(), nil)
                return
        }
-       _, err = conn.Write(cmd)
+       n, err := conn.Write(cmd)
+       logrus.Debugf("========> cmd %q, len %d, sent %d", cmd, len(cmd), n)
        if err != nil {
                logrus.Errorf("qmp_capabilities send failed: %v", err)
                qc.qmp <- qmpFail(err.Error(), nil)
bergwolf commented 7 years ago

FYI, I tried docker-ce (17.06.1-ce) with runv master and it works.

$docker version
Client:
 Version:      17.06.1-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   874a737
 Built:        Thu Aug 17 22:53:49 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.1-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   874a737
 Built:        Thu Aug 17 23:01:50 2017
 OS/Arch:      linux/amd64
 Experimental: false
$cat /etc/docker/daemon.json
{
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ],
        "default-runtime": "runv",
        "runtimes": {
                "runv": {
                        "path": "runv"
                }
        }
}
$ps aux|grep runv
root     101348  0.0  0.1 216780  4996 ?        Sl   18:27   0:00 docker-containerd-shim 85d2b21b2d2ba22eb7ab687d93dd79512796210b0b12427f29c379e8b3d65a50 /var/run/docker/libcontainerd/85d2b21b2d2ba22eb7ab687d93dd79512796210b0b12427f29c379e8b3d65a50 runv
root     101362  0.6  0.5 346556 19404 ?        Ssl  18:27   0:01 runv --root /run/runv --log_dir /var/log/hyper proxy --vmid vm-VVEwFXMkOW --hyperstart-ctl-sock unix:///var/run/hyper/vm-VVEwFXMkOW/hyper.sock --hyperstart-stream-sock unix:///var/run/hyper/vm-VVEwFXMkOW/tty.sock --proxy-hyperstart /var/run/hyper/vm-VVEwFXMkOW/hyperstartgrpc.sock --watch-vm-console /var/run/hyper/vm-VVEwFXMkOW/console.sock --watch-hyperstart
root     101377  0.0  0.5 360832 19912 pts/15   Ssl+ 18:27   0:00 runv --root /run/runv --log_dir /var/log/hyper/shim-85d2b21b2d2ba22eb7ab687d93dd79512796210b0b12427f29c379e8b3d65a50 shim --container 85d2b21b2d2ba22eb7ab687d93dd79512796210b0b12427f29c379e8b3d65a50 --process init --proxy-stdio --proxy-exit-code --proxy-signal --proxy-winsize
$docker run -it --rm busybox
/ # ls
bin   dev   etc   home  lib   proc  root  sys   tmp   usr   var
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 52:54:62:1c:33:3b brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:62ff:fe1c:333b/64 scope link
       valid_lft forever preferred_lft forever
/ # ping www.baidu.com
PING www.baidu.com (61.135.169.125): 56 data bytes
64 bytes from 61.135.169.125: seq=0 ttl=127 time=58.665 ms
64 bytes from 61.135.169.125: seq=1 ttl=127 time=58.076 ms
^C
--- www.baidu.com ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 58.076/58.370/58.665 ms
WeiZhang555 commented 7 years ago

Weird, I'm using a docker 1.11.2 with additional necessary patches, still can't start. I'm trying to find more useful information

WeiZhang555 commented 7 years ago

It's weird, I tried to replace all glog with logrus(hlog is also using logrus now), then everything works, if I keep using glog, then the container still can't start. Maybe there's some race problem when mix use of logrus(in docker) and glog(in runv)?

We decide to move on based on current status, we'll try to investigate more once got time.