Closed: AndrewSav closed this issue 6 years ago.
Also, despite putting "LogLevel": "debug"
into /etc/docker-volume-vsphere.conf,
the plugin starts with a log level of info:
Starting plugin driver=vsphere log_level=info config="/etc/docker-volume-vsphere.conf" group=root
Note that when I change "LogPath" in the same file it actually takes effect: the log path changes, but for some reason the log level does not.
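For reference, /etc/docker-volume-vsphere.conf is plain JSON; a minimal example with both keys would look roughly like the following (the "LogPath" value here is illustrative, not my actual path):

```json
{
  "LogLevel": "debug",
  "LogPath": "/var/log/docker-volume-vsphere.log"
}
```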
Here is the corresponding part of the server logs (well, not exactly corresponding, but I trust they are the same apart from the date):
11/01/17 22:36:55 505081 [MainThread] [INFO ] Connecting to the local Service Instance as 'dcui'
11/01/17 22:36:55 505081 [MainThread] [INFO ] Started new thread : 81931120 with target <function execRequestThread at 0x6c233994> and args (18, 903899, '{"cmd":"get","details":{"Name":"myvolume"},"version":"2"}')
11/01/17 22:36:55 505081 [Thread-667] [INFO ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access
11/01/17 22:36:55 505081 [Thread-667] [INFO ] Connecting to the local Service Instance as 'dcui'
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] executeRequest 'get' completed with ret={u'Error': 'Volume myvolume not found (file: /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk)'}
11/01/17 22:36:56 505081 [MainThread] [INFO ] Started new thread : 81931120 with target <function execRequestThread at 0x6c233994> and args (22, 903899, '{"cmd":"create","details":{"Name":"myvolume","Opts":{"fstype":"ext4","size":"10gb"}},"version":"2"}')
11/01/17 22:36:56 505081 [Thread-668] [INFO ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] *** createVMDK: /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk opts={u'size': u'10gb', u'fstype': u'ext4'} vm_name=<vm_name_replaced> vm_uuid=423261b0-c9c9-d407-0d4b-a54f5f747c4b tenant_uuid=11111111-1111-1111-1111-111111111111 datastore_url=/vmfs/volumes/542b6697-e3f022ab-b393-9cb6549b20b8
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] No access control, skipping volumes tracing in auth DB
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] executeRequest 'create' completed with ret=None
11/01/17 22:36:56 505081 [MainThread] [INFO ] Started new thread : 81931120 with target <function execRequestThread at 0x6c233994> and args (18, 903899, '{"cmd":"attach","details":{"Name":"myvolume"},"version":"2"}')
11/01/17 22:36:56 505081 [Thread-669] [INFO ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] *** disk_attach: VMDK /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk to VM '<vm_name_replaced>' , bios uuid = 423261b0-c9c9-d407-0d4b-a54f5f747c4b, VC uuid=503275c6-e840-0e8e-75f3-85c8e72defd8)
11/01/17 22:36:56 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] Found vm name='<vm_name_replaced>'
11/01/17 22:36:57 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] Attaching /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk as independent_persistent
11/01/17 22:36:57 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] Disk /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk successfully attached. controller pci_slot_number=1216, disk_slot=0
11/01/17 22:36:57 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] executeRequest 'attach' completed with ret={'ControllerPciSlotNumber': '1216', 'Unit': '0'}
11/01/17 22:36:57 505081 [MainThread] [INFO ] Started new thread : 81931120 with target <function execRequestThread at 0x6c233994> and args (22, 903899, '{"cmd":"detach","details":{"Name":"myvolume"},"version":"2"}')
11/01/17 22:36:57 505081 [Thread-670] [INFO ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] *** disk_detach: VMDK /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk to VM '<vm_name_replaced>' , bios uuid = 423261b0-c9c9-d407-0d4b-a54f5f747c4b, VC uuid=503275c6-e840-0e8e-75f3-85c8e72defd8)
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] Found vm name='<vm_name_replaced>'
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] Disk detached /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] executeRequest 'detach' completed with ret=None
11/01/17 22:36:58 505081 [MainThread] [INFO ] Started new thread : 81931120 with target <function execRequestThread at 0x6c233994> and args (18, 903899, '{"cmd":"remove","details":{"Name":"myvolume"},"version":"2"}')
11/01/17 22:36:58 505081 [Thread-671] [INFO ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] *** removeVMDK: /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] *** cleanVMDK: /vmfs/volumes/<datastore_name_replaced>/dockvols/_DEFAULT/myvolume.vmdk
11/01/17 22:36:58 505081 [<vm_name_replaced>-<datastore_name_replaced>._DEFAULT.myvolume] [INFO ] executeRequest 'remove' completed with ret=None
The .vmx file of the affected machine:
.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "9"
vmci0.present = "TRUE"
floppy0.present = "FALSE"
numvcpus = "2"
memSize = "16384"
sched.cpu.units = "mhz"
sched.cpu.latencySensitivity = "normal"
powerType.suspend = "soft"
tools.upgrade.policy = "upgradeAtPowerCycle"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
ide1:0.startConnected = "FALSE"
ide1:0.deviceType = "atapi-cdrom"
ide1:0.fileName = "CD/DVD drive 0"
ide1:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "<vm_name_replaced>.vmdk"
sched.scsi0:0.shares = "normal"
sched.scsi0:0.throughputCap = "off"
scsi0:0.present = "TRUE"
displayName = "<vm_name_replaced>"
annotation = "Docker ingress node built on 2017-10-31T04:40:23Z"
guestOS = "other"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
tools.syncTime = "TRUE"
tools.guest.desktop.autolock = "FALSE"
messageBus.tunnelEnabled = "FALSE"
uuid.bios = "42 32 61 b0 c9 c9 d4 07-0d 4b a5 4f 5f 74 7c 4b"
vc.uuid = "50 32 75 c6 e8 40 0e 8e-75 f3 85 c8 e7 2d ef d8"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
hpet0.present = "true"
nvram = "<vm_name_replaced>.nvram"
virtualhw.productcompatibility = "hosted"
migrate.hostlog = "<vm_name_replaced>-1387a4da.hlog"
sched.cpu.min = "0"
sched.cpu.shares = "normal"
sched.mem.min = "0"
sched.mem.minSize = "0"
sched.mem.shares = "normal"
ethernet0.virtualDev = "vmxnet3"
ethernet0.networkName = "Admin Network"
ethernet0.addressType = "vpx"
ethernet0.generatedAddress = "00:50:56:b2:5a:d8"
ethernet0.uptCompatibility = "TRUE"
ethernet0.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
ethernet1.networkName = "Admin Network"
ethernet1.addressType = "vpx"
ethernet1.generatedAddress = "00:50:56:b2:38:b1"
ethernet1.uptCompatibility = "TRUE"
ethernet1.present = "TRUE"
ethernet2.virtualDev = "vmxnet3"
ethernet2.networkName = "Admin Network"
ethernet2.addressType = "vpx"
ethernet2.generatedAddress = "00:50:56:b2:4c:57"
ethernet2.uptCompatibility = "TRUE"
ethernet2.present = "TRUE"
ethernet3.virtualDev = "vmxnet3"
ethernet3.networkName = "Admin Network"
ethernet3.addressType = "vpx"
ethernet3.generatedAddress = "00:50:56:b2:0f:18"
ethernet3.uptCompatibility = "TRUE"
ethernet3.present = "TRUE"
ethernet4.virtualDev = "vmxnet3"
ethernet4.networkName = "Admin Network"
ethernet4.addressType = "vpx"
ethernet4.generatedAddress = "00:50:56:b2:09:c5"
ethernet4.uptCompatibility = "TRUE"
ethernet4.present = "TRUE"
sched.swap.derivedName = "/vmfs/volumes/542b6697-e3f022ab-b393-9cb6549b20b8/<vm_name_replaced>/<vm_name_replaced>-2cb3d608.vswp"
replay.supported = "FALSE"
replay.filename = ""
scsi0:0.redo = ""
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "160"
ethernet1.pciSlotNumber = "192"
ethernet2.pciSlotNumber = "224"
ethernet3.pciSlotNumber = "256"
ethernet4.pciSlotNumber = "1184"
vmci0.pciSlotNumber = "32"
vmci0.id = "1601469515"
monitor.phys_bits_used = "40"
vmotion.checkpointFBSize = "4194304"
cleanShutdown = "TRUE"
softPowerOff = "FALSE"
extendedConfigFile = "<vm_name_replaced>.vmxf"
scsi1.pciSlotNumber = "1216"
scsi1.virtualDev = "pvscsi"
scsi1.present = "TRUE"
config.readOnly = "FALSE"
uuid.location = "56 4d f2 67 9d db 93 b4-51 5d 88 b0 d4 a5 45 fa"
scsi1.sasWWID = "50 05 05 60 c9 c9 d5 00"
Not sure if this is relevant: https://kb.vmware.com/s/article/2047927?language=en_US
Okay, forget what I said about this happening intermittently. I just rebuilt a VM from scratch and it's happening on the new VM as well. My suspicion is that this is because the machine has 5 network adapters, so the SCSI slot gets pushed to the secondary bus, and the plugin does not seem to be able to cope with that.
@AndrewSav can you say which OS and version you're using in the VM? Let me try a local repro and get back to you.
@govint, thank you for this, the OS is Rancher OS 1.1.0.
Ok, the ESX host service that attaches the volumes returns the PCI slot number as-is, and the guest uses that to identify the device to mount. In this case the device is on a secondary PCI bus, and this needs to be fixed in the ESX service, since only it can figure out which bridge the device is attached to and return the slot number relative to that. I'll post a fix by tomorrow.
@AndrewSav, on the VM that's showing the issue, can you please run `ls /sys/bus/pci/slots` and `ls /sys/bus/pci/devices/0000:00:16.1` and post the results here?
FWIW, the files under /sys/bus/pci/slots already represent devices on the secondary PCI bus, but it looks like the encoding done by vSphere isn't understood when a device is located under a non-zero function of the PCI bridge. In Ubuntu and Photon OS we have PCIe bridges and the PVSCSI adapter shows up on function 0. I just want to check how the PVSCSI controller shows up in RancherOS.
I've made the fix in the ESX service to handle PVSCSI controllers on secondary PCI buses, but that also requires a change in how the client plugin determines the device under the appropriate PCI device (a bridge in this case). I'll post a PR once the changes are complete.
@govint
[rancher@abigar ~]$ ls /sys/bus/pci/slots
160 162 164 166 192 194 196 198 224 226 228 230 256 258 260 262 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
161 163 165 167 193 195 197 199 225 227 229 231 257 259 261 263 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
[rancher@abigar ~]$ ls /sys/bus/pci/devices/0000:00:16.1
0000:00:16.1:pcie001 class device enable local_cpus numa_node rescan subsystem_device
0000:00:16.1:pcie004 config dma_mask_bits firmware_node modalias pci_bus reset subsystem_vendor
0000:0c:00.0 consistent_dma_mask_bits driver irq msi_bus power resource uevent
broken_parity_status d3cold_allowed driver_override local_cpulist msi_irqs remove subsystem vendor
[rancher@abigar ~]$
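As an aside, entries shaped like a full PCI address (dddd:bb:ss.f) in a bridge's sysfs directory are the devices on its secondary bus; in the listing above, 0000:0c:00.0 is presumably the PVSCSI controller sitting on bus 0x0c behind bridge 0000:00:16.1. A throwaway way to pick such entries out of a listing (illustrative Python, not part of the plugin):

```python
import re

# Full PCI addresses look like dddd:bb:ss.f (domain:bus:slot.function).
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def downstream_devices(entries):
    """Return the entries of a bridge's sysfs directory that are PCI
    devices on its secondary bus, skipping plain attribute files
    ('config', 'irq', ...) and the bridge's own ':pcieNNN' entries."""
    return [e for e in entries if PCI_ADDR.match(e)]

# A subset of the entries from 'ls /sys/bus/pci/devices/0000:00:16.1' above:
entries = ["0000:00:16.1:pcie001", "0000:00:16.1:pcie004", "0000:0c:00.0",
           "class", "config", "irq", "driver", "uevent"]
print(downstream_devices(entries))  # ['0000:0c:00.0']
```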
@AndrewSav, thanks for posting the requested data. I've made a PR with the fixes that allow figuring out the position of the device on the client. It should get reviewed and merged soon.
Closed via #1978
@govint thank you. When will it be available via Docker Hub?
@AndrewSav It will be part of the November release; we are shooting for a release date of 11/30 for vDVS 0.19.
I shall update this issue when the vDVS 0.19 release is available for you to try out.
@AndrewSav The vDVS 0.19 release is out; please try out the deliverable from https://github.com/vmware/docker-volume-vsphere/releases/tag/0.19
@shuklanirdesh82 @govint Guys, I just tested this on the VM that used to have the problem, and it's now working fine. Thank you for fixing this!
@AndrewSav Thanks for confirming the fix. This was indeed something that absolutely needed to be addressed in the way we detect the device on the Docker host.
Occasionally a VM would get screwed up. All other VMs are working fine and you can create a volume from them, but from this specific VM you no longer can:
I ran
docker volume create --driver=vsphere --name=myvolume -o size=10gb
I'm getting:
A reboot does not help. Removing and re-installing the plugin does not help. I have to re-provision the VM to get it working again.
Can someone help troubleshoot? From the excerpt above, I think that
1216
is wrong, since there is no PCI slot like this. Where does this 1216 come from?
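To illustrate why 1216 looks wrong: it does not match any slot the guest kernel actually registers. A quick sanity check against the slot list from /sys/bus/pci/slots shown in this thread (a sketch for illustration only, not the plugin's actual logic):

```python
# Slots the guest kernel exposes under /sys/bus/pci/slots, per the
# listing in this thread: 160-167, 192-199, 224-231, 256-263, 32-63.
guest_slots = set(
    list(range(160, 168)) + list(range(192, 200)) +
    list(range(224, 232)) + list(range(256, 264)) +
    list(range(32, 64))
)

def slot_known_to_guest(pci_slot_number: int) -> bool:
    """True if the slot number returned by the ESX service matches a
    slot the guest kernel registered under /sys/bus/pci/slots."""
    return pci_slot_number in guest_slots

print(slot_known_to_guest(1216))  # False: not a guest-visible slot
print(slot_known_to_guest(160))   # True: e.g. ethernet0's slot
```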