hyperhq / runv

Hypervisor-based Runtime for OCI
Apache License 2.0

containerd mode State() call doesn't handle stale container responses properly for docker #377

Closed wrouesnel closed 7 years ago

wrouesnel commented 7 years ago

If a docker daemon connects to runv in containerd mode and issues an initial State call, the docker daemon hangs indefinitely waiting for a correct container response.

This seems to happen because, when restarted, containerd mode drops all known state about containers (the supervisor is deleted) and never reloads or verifies it.

So when https://github.com/hyperhq/runv/blob/master/containerd/api/grpc/server/server.go#L120 is executed, the response consists only of a machine stanza and no response is provided for the requested containerID.

It looks like the short fix for this would be to return an error if the container ID in StateRequest can't be found (docker is okay receiving errors in this stanza). A more complete fix would be to reload state.json so that docker can request a root unmount if runv is killed unexpectedly. The full fix would be to attempt to recover orphaned virtual machine containers (but that might be way more trouble than it's worth; reloading and killing them might be more practical).

Related to #369 and probably #366 as well.

gao-feng commented 7 years ago

@wrouesnel Thanks for your investigation! I confirmed this bug. Returning an error as a short fix looks good to me :) I can take this over if you don't have time.

gnawux commented 7 years ago

@gao-feng what's the status of this issue?

gao-feng commented 7 years ago

@wrouesnel It has been a long time. I checked this problem again and found that docker tries to find the container named in the StateRequest and returns an error if it cannot find the required container. https://github.com/moby/moby/blob/master/libcontainerd/client_linux.go#L271

Can you help me figure out the problem now? I don't remember the details anymore... Thanks!

gao-feng commented 7 years ago

Cannot reproduce this problem after restarting runv-containerd. Closing it now.