Closed wrouesnel closed 7 years ago
Note: this probably fixes a whole host of containerd not responding related issues when there are VM state changes, as the problem is generic to any changes in VMs being started and stopped.
SendGenericOperation() is going to be removed, could you change vm.Kill() to call the v.ctx.poweroffVM() directly please?
quote from your log in the #423
goroutine 45 [chan receive]:
github.com/hyperhq/runv/hypervisor.(*Vm).GenericOperation(0xc4202eae00, 0x997861, 0xb, 0x9e0820, 0xc420350a20, 0x2, 0x2, 0xc42013f658, 0xc42013f501)
/home/will/src/go/src/github.com/hyperhq/runv/hypervisor/vm.go:739 +0xaa
github.com/hyperhq/runv/hypervisor.(*Vm).Kill(0xc4202eae00)
/home/will/src/go/src/github.com/hyperhq/runv/hypervisor/vm.go:324 +0xa2
github.com/hyperhq/runv/supervisor.createHyperPod(0xe0e200, 0xc4201ee0c0, 0xc420161520, 0x1, 0x80, 0x200, 0x8ae800, 0xc4201616a8)
/home/will/src/go/src/github.com/hyperhq/runv/supervisor/hyperpod.go:503 +0xa9d
Done. The changes to use SendVmEvent()
are still necessary to giving a consistent VM operating state though, since they prevent blocking in a case where the event handler is detached.
When a virtual machine error occurs leading to a VM shutdown, there is no explicit denial of access to the
vm.Hub
channel used to send commands, but once theVmContext.loop()
function exited, nothing existed handling and cancelling commands on that channel, nor notifying processes to stop writing to it.This meant that runv could reliably trigger Docker/supervisord lock-ups when a problem occurred with the VM startup while container commands were being waited on (most commonly:
docker run <some image> sleep 10
, but having an incorrect kernel configured in docker).This commit fixes the problem by removing the duplication of channel logic between
Vm
andVmContext
.Vm.SendGenericOperation
is made internal, and calls to a wrapper function onVmContext.SendVmEvent
to enqueue messages to the context. This separation enables a check for an active event handler to be performed before enqueuing a new message - and immediately returns an error if the message cannot be queued allowing the caller to detect a situation where the VM has died.Closes #423.