Closed justinclayton closed 8 years ago
@justinclayton - thanks for the detailed report!
Related to #92 (which is low priority since Docker fixed unmount in container 'rm -f').
@pdhamdhere - the suggested fix is to listen to VM power events and auto-detach all volumes in dockvols
on power-off.
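A minimal sketch of that power-event idea, using an in-memory model rather than any real vSphere API. The `Vm` class, the `on_power_event` handler, and the `"poweredOn"`/`"poweredOff"` state strings are all hypothetical names chosen for illustration; how the service would actually subscribe to power events is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    """Hypothetical in-memory stand-in for a Docker host VM."""
    name: str
    powered_on: bool = True
    attached: set = field(default_factory=set)  # names of attached volumes


def on_power_event(vm, new_state):
    """Hypothetical handler: record the new power state and, on power-off,
    detach every volume still attached to the VM.

    Returns the sorted list of volume names that were detached.
    """
    vm.powered_on = (new_state == "poweredOn")
    detached = []
    if not vm.powered_on:
        # Nothing inside a powered-off VM can be using the volumes,
        # so the attachments are stale and can be dropped.
        detached = sorted(vm.attached)
        vm.attached.clear()
    return detached
```

With this model, a forced power-off of a VM holding `vol1` leaves no attachment behind, so the VM can later boot even if `vol1` has since been attached elsewhere.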
From the description, it looks like the bug happened in step 4 of the repro. If the volume was in use by VM1 when it was shut down by force, then the volume plugin should have disallowed VM2 from attaching the volume.
The correct way would have been to:
1. Figure out that the volume was attached to another VM.
2. Check whether the other VM is powered on, and if it is not:
   - remove the volume from that VM's (VM1) configuration, and
   - attach the volume to the requesting VM (VM2).
It's wrong to have allowed the attach to VM2 when the volume was already attached to a VM and we had no idea whether that VM was using it or not.
I checked the provisions for alarms and events in VC, and apparently we can create a VM power-state alarm that runs a script "on the VC server" or runs a method (from a list that's in VC). It's perhaps better for the plugin to handle this at attach time: when a VM asks to attach a volume, check whether the volume is attached to a live VM (which can be queried), and if it is not, detach the volume from the down VM before proceeding.
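The attach-time check described in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: `attach`, `VolumeInUseError`, and the two dictionaries (`owners`, mapping a volume to the VM it is attached to, and `power_state`, mapping a VM name to whether it is powered on) are hypothetical and stand in for whatever the plugin would really query.

```python
class VolumeInUseError(Exception):
    """Raised when a volume is attached to another VM that is powered on."""


def attach(volume, requester, owners, power_state):
    """Attach `volume` to `requester`, reclaiming it from a powered-off VM.

    owners:      dict of volume name -> name of the VM it is attached to
    power_state: dict of VM name -> True if the VM is powered on
    Returns the name of the VM a stale attachment was removed from, or None.
    """
    current = owners.get(volume)
    reclaimed_from = None
    if current is not None and current != requester:
        if power_state.get(current, False):
            # The other VM is live and may be using the volume: refuse.
            raise VolumeInUseError(f"{volume} is in use by {current}")
        # The other VM is down: drop the stale attachment from its config.
        del owners[volume]
        reclaimed_from = current
    owners[volume] = requester
    return reclaimed_from
```

In the scenario from the report, a request from `vm-2` for `vol1` would succeed only if `vm-1` is powered off, and `vm-1` would no longer carry the attachment when it boots again.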
Closed via #573
When container cluster managers are in play, containers that die suddenly due to hardware failure will be rescheduled on another Docker host VM before the VM is ready for use again. Unfortunately, in this recovery scenario, the VM that was powered down never gets a chance to detach VMDKs that are in use as Docker volumes, leaving the VM with an essentially orphaned attachment that is no longer required, and actually prevents it from booting back up if that VMDK is already attached elsewhere.
My setup:
vm-1 and vm-2 on host esx-1
vol1 on VMFS datastore datastore1
Here's the repro:
# docker run -d -v vol1:/data busybox sh -c "while :; do date >> /data/dates.txt; sleep 1; done"
# docker run -d -v vol1:/data busybox sh -c "while :; do date >> /data/dates.txt; sleep 1; done"
(This would be the behavior of any container management system like Mesos Marathon or Docker Swarm.)
The two workarounds to this currently are:
After employing one of the above steps you will be able to power on vm-1 successfully.