Closed wrouesnel closed 7 years ago
This is a far simpler patch that takes advantage of the new hyperstart / runv changes. I've confirmed without it #279 is still an issue, with it, #279 is fixed.
c.run(p) is called in the supervisor RWMutext, I would mind if it is blocked too long, otherwise I would not use go routine to create container. Is there any other solution?
Seems unavoidable to me - we can't return from the supervisor until we know the process has actually launched. Under all normal circumstances this is quick - and certainly much better behavior then now where you block indefinitely if it fails.
you can add a 'started chan error' argument to hp.createContainer() and c.run() and "doing receive from the chan" after the lock is unlocked.
WDYT?
I looked into doing this, and it feels like a very leaky abstraction at the Supervisor level. I'm going to look into splitting the global supervisor lock into a finer-grained pod/container lock so concurrent requests can start/addProcesses to containers.
@wrouesnel runv cli is being improved continuously and had just been refactored #537 . As a result, it seams that the changes in this pr is outdated.
Close this PR. If the problem mentioned still exists, a new PR could be created for it.
Thank you for your contribution.
Running a command like the following:
would block indefinitely because we would return success to docker, but fail because the exec is attempted before the container is started up.
This leads to the docker daemon blocking indefinitely waiting for an exec command it believes was successful via containerd.
This patch solves the problem by blocking the containerd responses until the VM startup has returned and moved to process wait (or else returning an error if it fails at this stage). This resolves deadlocks into docker and allows the above command to succeed.
Closes #279