hyperhq / runv

Hypervisor-based Runtime for OCI
Apache License 2.0
828 stars 129 forks source link

Handle exec being called before a container is initialized correctly. #379

Closed wrouesnel closed 7 years ago

wrouesnel commented 7 years ago

Running a command like the following:

cid=$(./docker run -tid busybox sh) && \
    echo $cid && ./docker exec -ti $cid echo hello

would block indefinitely because we would return success to docker, but fail because the exec is attempted before the container is started up.

This leads to the docker daemon blocking indefinitely waiting for an exec command it believes was successful via containerd.

This patch solves the problem by blocking the containerd responses until the VM startup has returned and moved to process wait (or else returning an error if it fails at this stage). This resolves deadlocks into docker and allows the above command to succeed.

Closes #279

wrouesnel commented 7 years ago

This is a far simpler patch that takes advantage of the new hyperstart / runv changes. I've confirmed without it #279 is still an issue, with it, #279 is fixed.

laijs commented 7 years ago

c.run(p) is called in the supervisor RWMutext, I would mind if it is blocked too long, otherwise I would not use go routine to create container. Is there any other solution?

wrouesnel commented 7 years ago

Seems unavoidable to me - we can't return from the supervisor until we know the process has actually launched. Under all normal circumstances this is quick - and certainly much better behavior then now where you block indefinitely if it fails.

laijs commented 7 years ago

you can add a 'started chan error' argument to hp.createContainer() and c.run() and "doing receive from the chan" after the lock is unlocked.

WDYT?

wrouesnel commented 7 years ago

I looked into doing this, and it feels like a very leaky abstraction at the Supervisor level. I'm going to look into splitting the global supervisor lock into a finer-grained pod/container lock so concurrent requests can start/addProcesses to containers.

laijs commented 7 years ago

@wrouesnel runv cli is being improved continuously and had just been refactored #537 . As a result, it seams that the changes in this pr is outdated.

Close this PR. If the problem mentioned still exists, a new PR could be created for it.

Thank you for your contribution.