Closed wrouesnel closed 7 years ago
@wrouesnel Thanks for your investigation! I confirmed this bug. Returning an error as a short fix looks good to me :) I can take this over if you don't have time.
@gao-feng what's the status of this issue?
@wrouesnel It's been a long time. I checked this problem again and found that docker tries to look up the container in StateRequest, and returns an error if it cannot find the required container. https://github.com/moby/moby/blob/master/libcontainerd/client_linux.go#L271
Can you help me figure out the problem now? I don't remember the details anymore... Thanks!
I cannot reproduce this problem after restarting runv-containerd. Closing it now.
If a Docker daemon connects to runv in containerd mode and issues an initial state call, the daemon will hang indefinitely waiting for a correct container response.
This seems to be because when restarted, containerd mode drops all known state about containers since the supervisor is deleted, and never reloads and verifies it.
So when https://github.com/hyperhq/runv/blob/master/containerd/api/grpc/server/server.go#L120 is executed, the response consists only of a machine stanza and no response is provided for the requested containerID.
It looks like the short fix for this would be to return an error if the container ID in StateRequest can't be found (docker is okay receiving errors in this stanza). A more complete fix would be to reload state.json to allow docker to request root unmount if runv is killed unexpectedly. The full fix would be to attempt to recover orphaned virtual machine containers (but that might be way more trouble than it's worth; reloading and killing them might be more practical).
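For illustration, the short fix could look roughly like the sketch below. This is not the actual runv code: `Supervisor`, `Container`, `StateResponse`, and `ErrContainerNotFound` are hypothetical stand-ins for the real supervisor and gRPC types, and the in-memory map only models the state that gets lost on restart. The point is just that a request for an unknown container ID returns an error instead of a response with no matching container.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrContainerNotFound is a hypothetical sentinel error for this sketch.
var ErrContainerNotFound = errors.New("container not found")

// Container and StateResponse are simplified stand-ins for the real API types.
type Container struct {
	ID     string
	Status string
}

type StateResponse struct {
	Containers []Container
}

// Supervisor models the in-memory container state that is dropped on restart.
type Supervisor struct {
	containers map[string]Container
}

// State returns every known container when id is empty. When a specific id is
// requested and is unknown, it returns an error rather than an empty response,
// so the caller does not wait for a container entry that never arrives.
func (s *Supervisor) State(id string) (*StateResponse, error) {
	if id == "" {
		resp := &StateResponse{}
		for _, c := range s.containers {
			resp.Containers = append(resp.Containers, c)
		}
		return resp, nil
	}
	c, ok := s.containers[id]
	if !ok {
		return nil, fmt.Errorf("state request for %q: %w", id, ErrContainerNotFound)
	}
	return &StateResponse{Containers: []Container{c}}, nil
}

func main() {
	// A freshly restarted supervisor knows about no containers.
	s := &Supervisor{containers: map[string]Container{}}
	if _, err := s.State("abc123"); err != nil {
		fmt.Println("error:", err)
	}
}
```

Wrapping the sentinel with `%w` lets a caller distinguish "unknown container" from other failures via `errors.Is`, which would matter if the state were later reloaded from state.json and lookups could fail for other reasons.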
Related to #369 and probably #366 as well.