hyperhq / runv

Hypervisor-based Runtime for OCI
Apache License 2.0

Bug report: leaking container while doing pressure test #366

Open WeiZhang555 opened 7 years ago

WeiZhang555 commented 7 years ago
$ docker rm -f ea3bcc75e63a
Error response from daemon: Could not kill running container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e, cannot remove - Cannot kill container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e: rpc error: code = 2 desc = "The container ea3bcc75e63adb0ad33fe59fc733af6cac9f6f484dabd33b3e798af3f458250e or the process init is not found"

c.run(p) starts the VM and container in a goroutine but never returns the error, so if the VM fails to start, runv-containerd still sends a success response to docker. Docker then treats this as a running container when it is not. That's why docker can't kill it or remove it any more: it's trying to kill nothing!

When I tried to dig deeper, I found that sometimes https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L35 hangs. I can't find the real cause of the hang. PLEASE find it, it's really vital!

One extra thing:

https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L71 should return an error rather than nil. I'll send a patch for this.

Note: I was testing on our internal version, which diverges a little from the latest upstream version, but I believe the problem is still there.

Crazykev commented 7 years ago

When I try to dig deeper, I found that sometimes https://github.com/hyperhq/runv/blob/master/supervisor/container.go#L35 will hang

I found this too when integrating with cri-o. It seems runv hangs when startup fails before the container can start.

laijs commented 7 years ago

The related code was changed a little. Could you check it again, please?

WeiZhang555 commented 7 years ago

I'll try tomorrow :-)