google / gvisor-containerd-shim

containerd shim for gVisor
https://gvisor.dev
Apache License 2.0

Waiting on a container after it exited returns an internal error #48

Open adunham-stripe opened 4 years ago

adunham-stripe commented 4 years ago

After the changes in #28, the call to runtime.Wait now uses context.Background. We've observed this causing failures when attempting to Wait on a very short-lived container: by the time Wait is called, the container is no longer around to wait on, and the exit status is reported as internalErrorCode (128).

There does not appear to be a mechanism to detect this race condition through the shim: no error type is returned that we could use to detect it, and it's not possible to distinguish between an exit status of 128 returned legitimately by the wait operation and one substituted by the shim.

Any suggestions on how to distinguish between these two cases ("real" 128 exit status vs. "race condition and can't wait" 128 exit status)?