Using container.WaitStop with a negative timeout can lead to deadlocks
when the container's waitChan is never closed or written to. The
waitChan is closed only in seemingly unrelated code paths. However, we
only reach WaitStop when the container was still running during the
second (redundant) kill; a rare case which, according to my theory,
causes the deadlock.
The picture has changed in the meantime. The "seemingly unrelated" code paths point to containerd having trouble reaping zombies. I need to dig a bit deeper.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1751422
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
@rhatdan @giuseppe @TomSweeneyRedHat PTAL