VMs with orphaned attachments won't boot if attachment is already on new VM

justinclayton commented 8 years ago

When container cluster managers are in play, containers that die suddenly due to a hardware failure are rescheduled on another Docker host VM before the failed VM is ready for use again. Unfortunately, in this recovery scenario the VM that was powered down never gets a chance to detach the VMDKs it was using as Docker volumes. The VM is left with an essentially orphaned attachment that is no longer needed and that actually prevents it from booting back up if that VMDK is already attached elsewhere.

My setup:

* 2 VMs, vm-1 and vm-2, on host esx-1
* Docker volume vol1 on VMFS datastore datastore1
* dvv 1.0.beta on VMs running Photon 1.0-13c08b6-GA on ESX 6.0.0 build 3620759

Here's the repro:

1: On vm-1: # docker run -d -v vol1:/data busybox sh -c "while :; do date >> /data/dates.txt; sleep 1; done"
2: Perform a hard power-off of vm-1. This simulates a VMware HA event.
3: ESX service still shows volume attached to vm-1:

# /usr/lib/vmware/vmdkops/bin/vmdkops_admin.py ls
Volume  Datastore        Created By VM  Created                   Attached To VM  Policy  Capacity  Used
------  ---------------  -------------  ------------------------  --------------  ------  --------  --------
vol1    datastore1       vm-1           Wed Jul  6 18:08:43 2016  vm-1            N/A     10.00GB   252.00MB

4: Simulate recovery on vm-2: # docker run -d -v vol1:/data busybox sh -c "while :; do date >> /data/dates.txt; sleep 1; done" (Any container management system such as Mesos Marathon or Docker Swarm would do this.)
5: ESX service now shows volume attached to vm-2.
6: Attempt to power on vm-1. It fails with this error stack:

* An error was received from the ESX host while powering on VM vm-1.
* Failed to start the virtual machine.
* Module Disk power on failed. 
* Cannot open the disk '/vmfs/volumes/57507243-ad6492fd-d63c-ecf4bbc7e390/dockvols/vol1.vmdk' or one of the snapshot disks it depends on.
* Failed to lock the file

The two workarounds to this currently are:

* Stop the container using vol1 on vm-2, which cleanly detaches and unlocks the VMDK.
* Manually remove the virtual hard disk attachment from vm-1.

After employing one of the above steps you will be able to power on vm-1 successfully.

msterin commented 8 years ago

@justinclayton - thanks for the detailed report! This is related to #92 (which is low priority since Docker fixed unmounting on container 'rm -f'). @pdhamdhere - the suggested fix is to listen to VM power events and auto-detach all disks in dockvols on power-off.
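The suggested fix might look roughly like the following Python sketch. It assumes a hypothetical in-memory model: `VM`, `PowerState`, `is_docker_volume`, and `on_power_state_change` are illustrative names, not code from vmdkops or the vSphere API. A real handler would subscribe to VM power events from ESX or vCenter and reconfigure the VM instead of editing a Python list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Assumption: Docker volumes are the VMDKs kept in the datastore's "dockvols"
# folder, as in the disk path from the power-on error above.
DOCKVOLS_DIR = "dockvols"


class PowerState(Enum):
    ON = "poweredOn"
    OFF = "poweredOff"


@dataclass
class VM:
    """Hypothetical in-memory stand-in for a VM and its attached disks."""
    name: str
    power: PowerState = PowerState.ON
    disks: List[str] = field(default_factory=list)  # attached VMDK paths


def is_docker_volume(vmdk_path: str) -> bool:
    # True for disks such as /vmfs/volumes/<datastore>/dockvols/vol1.vmdk
    return f"/{DOCKVOLS_DIR}/" in vmdk_path


def on_power_state_change(vm: VM, new_state: PowerState) -> List[str]:
    """Record the new power state; on power-off, detach every dockvols disk.

    Returns the detached VMDK paths (an empty list for any other transition).
    """
    vm.power = new_state
    if new_state is not PowerState.OFF:
        return []
    detached = [d for d in vm.disks if is_docker_volume(d)]
    # Keep non-volume disks (e.g. the boot disk) attached.
    vm.disks = [d for d in vm.disks if not is_docker_volume(d)]
    return detached
```

One limitation of a purely event-driven approach: if the service misses the power-off event, the orphaned attachment remains.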

govint commented 8 years ago

From the description, it looks like the bug happened in step 4 of the repro. If the volume was in use by vm-1 when it was forcibly shut down, the volume plugin should have prevented vm-2 from attaching the volume.

The correct behavior would have been to:

1: Determine that the volume is attached to another VM (vm-1).
2: Check whether that VM is powered on, and if it is not,
3: remove the volume from vm-1's configuration, and
4: attach the volume to the requesting VM (vm-2).

It's wrong to have allowed the attach to vm-2 when the volume was already attached to another VM and we had no idea whether that VM was still using it.
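The four steps above could be sketched in Python as follows, again with a hypothetical in-memory model (`VM`, `find_attached_vm`, `attach_volume`, and `VolumeInUseError` are illustrative names, not the plugin's actual code). In the real ESX service the power state would be queried from the host and the detach would be a VM reconfiguration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class PowerState(Enum):
    ON = "poweredOn"
    OFF = "poweredOff"


@dataclass
class VM:
    """Hypothetical in-memory stand-in for a VM and its attached disks."""
    name: str
    power: PowerState = PowerState.ON
    disks: List[str] = field(default_factory=list)


class VolumeInUseError(Exception):
    """Raised when the volume is attached to a VM that is still powered on."""


def find_attached_vm(vmdk: str, vms: Dict[str, VM]) -> Optional[VM]:
    # Step 1: find which VM (if any) has this disk in its configuration.
    for vm in vms.values():
        if vmdk in vm.disks:
            return vm
    return None


def attach_volume(vmdk: str, requester: VM, vms: Dict[str, VM]) -> None:
    owner = find_attached_vm(vmdk, vms)
    if owner is requester:
        return  # already attached to the requesting VM; nothing to do
    if owner is not None:
        # Step 2: refuse if the current owner is running and may be using the disk.
        if owner.power is PowerState.ON:
            raise VolumeInUseError(f"{vmdk} is attached to running VM {owner.name}")
        # Step 3: the owner is powered off, so drop the stale attachment.
        owner.disks.remove(vmdk)
    # Step 4: attach the volume to the requesting VM.
    requester.disks.append(vmdk)
```

The check and the detach are not atomic in this sketch; a real service would also need to serialize concurrent attach requests for the same volume.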

govint commented 8 years ago

I checked what vCenter (VC) provides for alarms and events, and apparently we can create a VM power-state alarm that runs a script "on the VC server" or runs a method (from a list that's in VC). It's perhaps better for the plugin, when a VM asks to attach a volume, to check whether the volume is attached to a live VM (which can be queried) and, if it is not, to detach the volume from the powered-off VM before proceeding.

govint commented 8 years ago

Closed via #573

vmware-archive/vsphere-storage-for-docker, issue #515