vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic
Other
640 stars 173 forks source link

Container failed to start, blocked mounting the containerfs #6379

Open hickeng opened 7 years ago

hickeng commented 7 years ago

Build: https://ci.vcna.io/vmware/vic/13748 Test: 1-10-Docker-PS

@mhagen-vmware is this a nested environment? #2340 observed similar issues on the applianceVM and nested was tentatively identified as a probable culprit.

angry_murdock vmware.log file shows that we never moved passed opening the disk chain (EOF added by me for clarity):

2017-09-16T23:59:11.292Z| svga| I125: SVGA hiding SVGA
2017-09-16T23:59:11.295Z| svga| I125: SVGA enabling SVGA
2017-09-16T23:59:11.295Z| vcpu-0| I125: Ethernet0 MAC Address: 00:0c:29:4f:35:0a
2017-09-16T23:59:11.319Z| svga| A100: ConfigDB: Setting svga.guestBackedPrimaryAware = "TRUE"
2017-09-16T23:59:11.476Z| vcpu-0| I125: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmdk'
2017-09-16T23:59:11.476Z| vcpu-0| I125: DDB: "longContentID" = "94acd9fdd658e6f9c8790f9a0f40b5e6" (was "70150403cac7ab65c16a49848374b5aa")
2017-09-16T23:59:11.516Z| vcpu-0| I125: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0x8374b5aa, new=0xf40b5e6 (94acd9fdd658e6f9c8790f9a0f40b5e6)
<EOF>

and tether.debug shows we're waiting for it to succeed in mount:

tether.debug fragment

``` + echo 1 + echo 1 + echo 'Loading drivers' Loading drivers + drivers=("vmxnet3" "nfnetlink" "iptable_filter" "xt_conntrack" "nf_conntrack" "nf_conntrack_ipv4" "nf_defrag_ipv4" "ipt_REJECT" "nf_reject_ipv4" "nfsv3") + for i in '${drivers[@]}' + /usr/sbin/modprobe vmxnet3 + for i in '${drivers[@]}' + /usr/sbin/modprobe nfnetlink + for i in '${drivers[@]}' + /usr/sbin/modprobe iptable_filter + for i in '${drivers[@]}' + /usr/sbin/modprobe xt_conntrack + for i in '${drivers[@]}' + /usr/sbin/modprobe nf_conntrack + for i in '${drivers[@]}' + /usr/sbin/modprobe nf_conntrack_ipv4 [ 1.358356] ata_piix 0000:00:07.1: BMDMA: BAR4 is zero, falling back to PIO + for i in '${drivers[@]}' + /usr/sbin/modprobe nf_defrag_ipv4 + for i in '${drivers[@]}' + /usr/sbin/modprobe ipt_REJECT + for i in '${drivers[@]}' + /usr/sbin/modprobe nf_reject_ipv4 + for i in '${drivers[@]}' + /usr/sbin/modprobe nfsv3 + MOUNTPOINT=/mnt/containerfs + mkdir -p /mnt/containerfs + echo 'Waiting for rootfs' Waiting for rootfs + '[' '!' -e /dev/disk/by-label/containerfs ']' + sleep 0.1 + '[' '!' -e /dev/disk/by-label/containerfs ']' + mount -t ext4 /dev/disk/by-label/containerfs /mnt/containerfs ```

portlayer correctly identifies that angry_murdock failed to start (simply searching for " ERROR " got me straight to the problem):

portlayer fragment

``` Sep 16 2017 23:59:07.103Z DEBUG Found 2 subscribers to id: 368293 - vsphere.VMEvent: angry_murdock-e1511d0e1c7a on ns539602.ip-144-217-72.net in ha-datacenter is powered on Sep 16 2017 23:59:07.108Z DEBUG Event manager calling back to exec Sep 16 2017 23:59:07.108Z DEBUG Event manager calling back to logging Sep 16 2017 23:59:07.108Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/logging.eventCallback:52] Sep 16 2017 23:59:07.108Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/logging.eventCallback:52] [8.93µs] Sep 16 2017 23:59:07.108Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.(*Container).OnEvent:666] event PoweredOn received for id: 368293 Sep 16 2017 23:59:07.108Z DEBUG [BEGIN] [github.com/vmware/vic/lib/portlayer/exec.eventedState:640] event PoweredOn received for id: 368293 Sep 16 2017 23:59:07.108Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.eventedState:640] [4.79µs] event PoweredOn received for id: 368293 Sep 16 2017 23:59:07.108Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*Container).OnEvent:666] [20.281µs] event PoweredOn received for id: 368293 Sep 17 2017 00:02:07.095Z ERROR Property collector error: Post https://192.168.31.39/sdk: context deadline exceeded Sep 17 2017 00:02:07.095Z ERROR Unable to wait for extra config property guestinfo.vice..sessions|e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.started: Post https://192.168.31.39/sdk: context deadline exceeded Sep 17 2017 00:02:07.095Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*containerBase).waitForSession:401] [3m0.003964974s] e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c Sep 17 2017 00:02:07.095Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.(*Container).start:367] [3m0.558395222s] e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c Sep 17 2017 00:02:07.095Z DEBUG Update: updated change version from handle: 2017-09-16T23:59:07.039101Z Sep 17 2017 00:02:07.095Z DEBUG [ END ] [github.com/vmware/vic/lib/portlayer/exec.Commit:35] [3m0.682731081s] e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c Sep 17 2017 00:02:07.095Z ERROR CommitHandler error on handle(45b60ee95a0e67b2629e014b9b0270b8) for e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c: &errors.errorString{s:"unable to wait for process launch status: Post https://192.168.31.39/sdk: context deadline exceeded"} ```

hostd.3 log fragment for this VM:

hostd log fragment

``` 2017-09-16T23:59:06.151Z info hostd[2DF40B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16dc user=droneci] Task Created : haTask-ha-folder-vm-vim.Folder.createVm-6682282 2017-09-16T23:59:06.151Z info hostd[2EF40B70] [Originator@6876 sub=Vmsvc opID=39fe16dc user=droneci] CreateVm: Path is not on object datastore ([datastore1] e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx) 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] SOCKET 16873 (75) 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] recv detected client closed connection 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: Remote connection failure 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] VigorTransportClientDrainRecv: draining read. 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] SOCKET 16873 (75) 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] recv detected client closed connection 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: closing connection. 2017-09-16T23:59:06.383Z info hostd[271E4B70] [Originator@6876 sub=Libs] VigorTransportClientManageConnection: connection closed. 2017-09-16T23:59:06.409Z info hostd[271E4B70] [Originator@6876 sub=Libs] CnxAuthdProtoSecureConnect: Unencrypted connection, skipping thumbprint exchange. 2017-09-16T23:59:06.411Z info hostd[271E4B70] [Originator@6876 sub=Libs] CnxConnectAuthd: Returning false because CnxAuthdProtoConnect failed 2017-09-16T23:59:06.411Z info hostd[271E4B70] [Originator@6876 sub=Libs] Cnx_Connect: Returning false because CnxConnectAuthd failed 2017-09-16T23:59:06.411Z info hostd[271E4B70] [Originator@6876 sub=vm:Cnx_Connect: Error message: There is no VMware process running for config file /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] 2017-09-16T23:59:06.411Z warning hostd[271E4B70] [Originator@6876 sub=vm:VigorTransportClientManageConnection: Failed to re-connect to VM /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] 2017-09-16T23:59:06.424Z info hostd[271E4B70] [Originator@6876 sub=Libs] VigorOnlineDisconnectCb: connection closed (is final). 2017-09-16T23:59:06.493Z info hostd[305C2B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 26247 2017-09-16T23:59:06.493Z warning hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Unable to get resource settings for a powered on VM 2017-09-16T23:59:06.493Z info hostd[305C2B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 26247 2017-09-16T23:59:06.501Z info hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Send config update invoked 2017-09-16T23:59:06.504Z info hostd[305C2B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 26247 2017-09-16T23:59:06.504Z warning hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Unable to get resource settings for a powered on VM 2017-09-16T23:59:06.504Z info hostd[305C2B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 26247 2017-09-16T23:59:06.513Z info hostd[2EF40B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe16dc user=droneci] Event 368286 : Creating angry_murdock-e1511d0e1c7a on host ns539602.ip-144-217-72.net in ha-datacenter 2017-09-16T23:59:06.516Z info hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] State Transition (VM_STATE_ON -> VM_STATE_OFF) 2017-09-16T23:59:06.516Z warning hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Failed to find activation record, event user unknown. 2017-09-16T23:59:06.516Z info hostd[27F40B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener 2017-09-16T23:59:06.516Z info hostd[27F40B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded 2017-09-16T23:59:06.516Z info hostd[27F40B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 26247 Old: 1 New: 0 2017-09-16T23:59:06.516Z info hostd[27F40B70] [Originator@6876 sub=Hbrsvc] Replicator: Poweroff for VM: (id=26247) 2017-09-16T23:59:06.516Z info hostd[305C2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 368287 : elated_curran-39b78f64368a on ns539602.ip-144-217-72.net in ha-datacenter is powered off 2017-09-16T23:59:06.534Z info hostd[2818CB70] [Originator@6876 sub=Default opID=39fe16dc user=droneci] CreateDirectory: Calling _file->CreateDirectory with _file = [N7Vmacore6System8FileImplE:0x29c9ff60] 2017-09-16T23:59:06.534Z info hostd[2818CB70] [Originator@6876 sub=Default opID=39fe16dc user=droneci] CreateDirectory: Calling _file->CreateDirectory for /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c 2017-09-16T23:59:06.599Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] SetVmxVersion: switching environment browser to vmx version 'vmx-11' 2017-09-16T23:59:06.606Z error hostd[305C2B70] [Originator@6876 sub=Hbrsvc] Failed to retrieve VM config (id=26248) 2017-09-16T23:59:06.606Z error hostd[305C2B70] [Originator@6876 sub=Hbrsvc] Replicator: VmReconfig failed to retrieve replication config for VM 26248, ignoring: vim.fault.ReplicationVmConfigFault 2017-09-16T23:59:06.607Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] Storage policy for disk '/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmdk' not specified. 2017-09-16T23:59:06.652Z info hostd[29EC1B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16ed user=droneci] Task Created : haTask-26247-vim.VirtualMachine.reconfigure-6682292 2017-09-16T23:59:06.653Z info hostd[2EF40B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx opID=39fe16ed user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_RECONFIGURING) 2017-09-16T23:59:06.826Z info hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx opID=39fe16ed user=droneci] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_OFF) 2017-09-16T23:59:06.837Z info hostd[305C2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe16ed user=droneci] Event 368288 : Reconfigured elated_curran-39b78f64368a on ns539602.ip-144-217-72.net in ha-datacenter 2017-09-16T23:59:06.837Z info hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx opID=39fe16ed user=droneci] Send config update invoked 2017-09-16T23:59:06.839Z info hostd[340C1B70] [Originator@6876 sub=Hostsvc.HostConfigSyncManagerImpl] /sbin/backup.sh succeeded 2017-09-16T23:59:06.852Z info hostd[305C2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16ed user=droneci] Task Completed : haTask-26247-vim.VirtualMachine.reconfigure-6682292 Status success 2017-09-16T23:59:06.852Z info hostd[2DF40B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Send config update invoked 2017-09-16T23:59:06.879Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe16dc user=droneci] DictionaryLoad: Cannot open file "/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmsd": No such file or directory. 2017-09-16T23:59:06.879Z error hostd[2818CB70] [Originator@6876 sub=Default opID=39fe16dc user=droneci] An error occurred while loading configuration "/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmsd",not all entries are being read. It is strongly encouraged that you manually inspect the file and fix any corruptions. 2017-09-16T23:59:06.994Z info hostd[2818CB70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe16dc user=droneci] Event 368289 : Assigned new BIOS UUID (03b23979-7c71-5263-bfcf-c0fdf14f350a) to angry_murdock-e1511d0e1c7a on ns539602.ip-144-217-72.net in ha-datacenter 2017-09-16T23:59:07.054Z warning hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] CannotRetrieveCorefiles: VM is in an invalid state 2017-09-16T23:59:07.083Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe16dc user=droneci] Error getting extended config name from config file. 2017-09-16T23:59:07.083Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe16dc user=droneci] Error getting extended config file path. 2017-09-16T23:59:07.083Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] Initial tools version: 7:guestToolsNotInstalled 2017-09-16T23:59:07.083Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] State Transition (VM_STATE_INITIALIZING -> VM_STATE_OFF) 2017-09-16T23:59:07.083Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16dc user=droneci] Initialized virtual machine. 2017-09-16T23:59:07.090Z info hostd[2818CB70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe16dc user=droneci] Event 368290 : Created virtual machine angry_murdock-e1511d0e1c7a on ns539602.ip-144-217-72.net in ha-datacenter 2017-09-16T23:59:07.091Z info hostd[2818CB70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16dc user=droneci] Task Completed : haTask-ha-folder-vm-vim.Folder.createVm-6682282 Status success 2017-09-16T23:59:07.245Z info hostd[35A81B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16ff user=droneci] Task Created : haTask-26248-vim.VirtualMachine.reconfigure-6682304 2017-09-16T23:59:07.246Z info hostd[340C1B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16ff user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_RECONFIGURING) 2017-09-16T23:59:07.322Z info hostd[340C1B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16ff user=droneci] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_OFF) 2017-09-16T23:59:07.329Z info hostd[340C1B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe16ff user=droneci] Event 368291 : Reconfigured angry_murdock-e1511d0e1c7a on ns539602.ip-144-217-72.net in ha-datacenter 2017-09-16T23:59:07.329Z info hostd[340C1B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe16ff user=droneci] Send config update invoked 2017-09-16T23:59:07.340Z info hostd[340C1B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe16ff user=droneci] Task Completed : haTask-26248-vim.VirtualMachine.reconfigure-6682304 Status success 2017-09-16T23:59:07.340Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx] Send config update invoked 2017-09-16T23:59:07.356Z info hostd[27F40B70] [Originator@6876 sub=Vimsvc.TaskManager opID=39fe1708 user=droneci] Task Created : haTask-26248-vim.VirtualMachine.powerOn-6682311 2017-09-16T23:59:07.357Z info hostd[2818CB70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=39fe1708 user=droneci] Event 368292 : angry_murdock-e1511d0e1c7a on host ns539602.ip-144-217-72.net in ha-datacenter is starting 2017-09-16T23:59:07.357Z info hostd[2818CB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe1708 user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_POWERING_ON) 2017-09-16T23:59:07.357Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe1708 user=droneci] SIOC: SIOC is notified not to start injector 2017-09-16T23:59:07.359Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe1708 user=droneci] Starting vmx as /bin/vmx 2017-09-16T23:59:07.359Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe1708 user=droneci] Starting vmx as /bin/vmx 2017-09-16T23:59:07.362Z info hostd[2818CB70] [Originator@6876 sub=vm:VigorExecVMXExCommon: VM /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe1708 user=droneci] in directory /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c 2017-09-16T23:59:07.362Z info hostd[2818CB70] [Originator@6876 sub=vm:VigorExecVMXExCommon: Exec()'ing /bin/vmx /vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx opID=39fe1708 user=droneci] 2017-09-16T23:59:07.390Z info hostd[2818CB70] [Originator@6876 sub=Libs opID=39fe1708 user=droneci] Vigor: VMKernel_ForkExec(/bin/vmx, detached=1): status=0 pid=4811150 2017-09-16T23:59:07.390Z info hostd[290C1B70] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx] Could not retrieve Bootstrap state: N3Vim5Fault12InvalidState9ExceptionE(vim.fault.InvalidState) 2017-09-16T23:59:07.890Z info hostd[35AC2B70] [Originator@6876 sub=Hostsvc] Decremented SIOC Injector Flag2 2017-09-16T23:59:07.892Z info hostd[35AC2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 368293 : angry_murdock-e1511d0e1c7a on ns539602.ip-144-217-72.net in ha-datacenter is powered on 2017-09-16T23:59:07.892Z info hostd[35AC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx] State Transition (VM_STATE_POWERING_ON -> VM_STATE_ON) 2017-09-16T23:59:07.892Z info hostd[27F40B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener 2017-09-16T23:59:07.892Z info hostd[27F40B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded 2017-09-16T23:59:07.892Z info hostd[27F40B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 26248 Old: 0 New: 1 2017-09-16T23:59:07.893Z info hostd[35AC2B70] [Originator@6876 sub=Vimsvc.TaskManager] Task Completed : haTask-26248-vim.VirtualMachine.powerOn-6682311 Status success 2017-09-16T23:59:07.902Z info hostd[35AC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx] Send config update invoked 2017-09-16T23:59:07.941Z info hostd[35AC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c/e1511d0e1c7a06412c013bba0fbc580cd0b3dd0f7509291739de648e786f170c.vmx] Skip a duplicate transition to: VM_STATE_ON 2017-09-16T23:59:08.050Z info hostd[2DF40B70] [Originator@6876 sub=Hostsvc.DatastoreSystem] RefreshVdiskDatastores: Done refreshing datastores. [LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0): 2017-09-16T23:59:13.976Z error hostd[305C2B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: , >, error: N7Vmacore16TimeoutExceptionE(Operation timed out) 2017-09-16T23:59:13.976Z error hostd[2EF40B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: , >, error: N7Vmacore16TimeoutExceptionE(Operation timed out) 2017-09-16T23:59:16.810Z info hostd[305C2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/VCH-13748-8952/VCH-13748-8952.vmx] Send config update invoked 2017-09-16T23:59:18.978Z error hostd[305C2B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: , >, error: N7Vmacore16TimeoutExceptionE(Operation timed out) 2017-09-16T23:59:20.270Z info hostd[305C2B70] [Originator@6876 sub=SysCommandPosix] ForkExec(/sbin/backup.sh) 4811162 2017-09-16T23:59:21.665Z info hostd[305C2B70] [Originator@6876 sub=Hostsvc.HostConfigSyncManagerImpl] /sbin/backup.sh succeeded 2017-09-16T23:59:35.380Z info hostd[2EF40B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/58a3a549-2925c61b-9e5f-0cc47a9c19a4/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462/39b78f64368af56846560ba15e15660fdb011a7337b9c51031450c81a98d9462.vmx] Turning off heartbeat checker 2017-09-16T23:59:38.991Z error hostd[27F40B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: , >, error: N7Vmacore16TimeoutExceptionE(Operation timed out) [LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0): [LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0): 2017-09-17T00:02:07.910Z info hostd[35AC2B70] [Originator@6876 sub=Solo.Vmomi opID=39fe171c user=droneci] Activation [N5Vmomi10ActivationE:0x34b0db30] : Invoke done [waitForUpdatesEx] on [vmodl.query.PropertyCollector:session[527a7fa2-99c6-dea8-7bb7-3c7c948cae7f]52c36cbf-0d04-f9ad-7c65-fd3728d38424] 2017-09-17T00:02:07.910Z verbose hostd[35AC2B70] [Originator@6876 sub=Solo.Vmomi opID=39fe171c user=droneci] Arg version: --> "1" 2017-09-17T00:02:07.910Z verbose hostd[35AC2B70] [Originator@6876 sub=Solo.Vmomi opID=39fe171c user=droneci] Arg options: --> (null) 2017-09-17T00:02:07.910Z info hostd[35AC2B70] [Originator@6876 sub=Solo.Vmomi opID=39fe171c user=droneci] Throw vmodl.fault.RequestCanceled 2017-09-17T00:02:07.910Z info hostd[35AC2B70] [Originator@6876 sub=Solo.Vmomi opID=39fe171c user=droneci] Result: --> (vmodl.fault.RequestCanceled) { --> faultCause = (vmodl.MethodFault) null, --> msg = "" --> } 2017-09-17T00:02:08.120Z warning hostd[305C2B70] [Originator@6876 sub=HTTP server] UnimplementedRequestHandler: HTTP method POST not supported for URI /rest/com/vmware/cis/session ```

This could be #2340, or a variant, but rescan of scsi host will not help as this disk was attached prior to boot.

Estimate is for figuring out how to proceed with the issue and possibly what diagnostics could be added to move us forward, and who to work with on the ESX side. I am exporting the host logs now but don't know if we'll still have them for the time in question, or how big they will be. Contact me directly for followup.

update - 31/01/2018 opened bug2049007

hickeng commented 6 years ago

6825 captures much the same data, but with two distinct forms:

  1. as described in this issue - the disk fails to present but the init process is launched and we see a disklib-chain message in vmware.log
  2. the initial description in #6825 - this time we stall on SVGA initialization and there's no evidence of the init process launching at all.
matthewavery commented 6 years ago

also see #7144 as it may be related. it is in the same code path.

hickeng commented 6 years ago

CI: https://ci.vcna.io/vmware/vic/15880 Group: Group6-VIC-Machine Suite: 6-04-Create-Basic Test: Create VCH - custom base disk VCH: VCH-15880-6118

tether:

Waiting for rootfs
+ '[' '!' -e /dev/disk/by-label/containerfs ']'
+ mount -t ext4 /dev/disk/by-label/containerfs /mnt/containerfs<EOF>

vmware.log:

2018-01-27T21:38:16.325Z| svga| I125: SVGA hiding SVGA
2018-01-27T21:38:16.329Z| svga| I125: SVGA enabling SVGA
2018-01-27T21:38:16.351Z| svga| A100: ConfigDB: Setting svga.guestBackedPrimaryAware = "TRUE"
2018-01-27T21:38:16.410Z| vcpu-0| I125: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/1784ead7772f8524848c675e9b7b791d82615cd1acfc1303accc4b745c43ca7a/1784ead7772f8524848c675e9b7b791d82615cd1acfc1303accc4b745c43ca7a.vmdk'
2018-01-27T21:38:16.410Z| vcpu-0| I125: DDB: "longContentID" = "d127ed350f373297185c08389dfb499b" (was "9e76a96fe1566ecea817fda09d3b4e1d")
2018-01-27T21:38:16.449Z| vcpu-0| I125: DISKLIB-CHAIN : DiskChainUpdateContentID: old=0x9d3b4e1d, new=0x9dfb499b (d127ed350f373297185c08389dfb499b)
<EOF>

managed to grab the relevant hostd portion:

2018-01-27T22:40:15.204Z info hostd[FFD26B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c61 user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_RECONFIGURING)
2018-01-27T22:40:15.269Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c61 user=droneci] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_OFF)
2018-01-27T22:40:15.274Z info hostd[3EFEEB70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=302.27-7c61 user=droneci] Event 138461 : Reconfigured cocky_mccarthy-187f8a56ac7e on ns543159.ip-144-217-252.net in ha-datacenter
2018-01-27T22:40:15.274Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c61 user=droneci] Send config update invoked
2018-01-27T22:40:15.286Z info hostd[3EFEEB70] [Originator@6876 sub=Vimsvc.TaskManager opID=302.27-7c61 user=droneci] Task Completed : haTask-6060-vim.VirtualMachine.reconfigure-2423052 Status success
2018-01-27T22:40:15.293Z info hostd[41480B70] [Originator@6876 sub=Vimsvc.TaskManager opID=302.27-7c69 user=droneci] Task Created : haTask-6060-vim.VirtualMachine.powerOn-2423058
2018-01-27T22:40:15.293Z info hostd[461C1B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=302.27-7c69 user=droneci] Event 138462 : cocky_mccarthy-187f8a56ac7e on host ns543159.ip-144-217-252.net in ha-datacenter is starting
2018-01-27T22:40:15.293Z info hostd[461C1B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c69 user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_POWERING_ON)
2018-01-27T22:40:15.293Z info hostd[461C1B70] [Originator@6876 sub=Libs opID=302.27-7c69 user=droneci] SIOC: SIOC is notified not to start injector
2018-01-27T22:40:15.297Z info hostd[461C1B70] [Originator@6876 sub=Libs opID=302.27-7c69 user=droneci] Starting vmx as /bin/vmx
2018-01-27T22:40:15.297Z info hostd[461C1B70] [Originator@6876 sub=Libs opID=302.27-7c69 user=droneci] Starting vmx as /bin/vmx
2018-01-27T22:40:15.299Z info hostd[461C1B70] [Originator@6876 sub=vm:VigorExecVMXExCommon: VM /vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c69 user=droneci] in directory /vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5
2018-01-27T22:40:15.299Z info hostd[461C1B70] [Originator@6876 sub=vm:VigorExecVMXExCommon: Exec()'ing /bin/vmx /vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.27-7c69 user=droneci]
2018-01-27T22:40:15.315Z info hostd[461C1B70] [Originator@6876 sub=Libs opID=302.27-7c69 user=droneci] Vigor: VMKernel_ForkExec(/bin/vmx, detached=1): status=0 pid=4274255
2018-01-27T22:40:15.315Z info hostd[40E42B70] [Originator@6876 sub=Vigor.Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Could not retrieve Bootstrap state: N3Vim5Fault12InvalidState9ExceptionE(vim.fault.InvalidState)
2018-01-27T22:40:15.742Z info hostd[3EFEEB70] [Originator@6876 sub=Hostsvc] Decremented SIOC Injector Flag2
2018-01-27T22:40:15.744Z info hostd[3EFEEB70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 138463 : cocky_mccarthy-187f8a56ac7e on  ns543159.ip-144-217-252.net in ha-datacenter is powered on
2018-01-27T22:40:15.744Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] State Transition (VM_STATE_POWERING_ON -> VM_STATE_ON)
2018-01-27T22:40:15.744Z info hostd[41480B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
2018-01-27T22:40:15.744Z info hostd[41480B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
2018-01-27T22:40:15.744Z info hostd[41480B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 6060 Old: 0 New: 1
2018-01-27T22:40:15.746Z info hostd[3EFEEB70] [Originator@6876 sub=Vimsvc.TaskManager] Task Completed : haTask-6060-vim.VirtualMachine.powerOn-2423058 Status success
2018-01-27T22:40:15.774Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:15.831Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Skip a duplicate transition to: VM_STATE_ON
2018-01-27T22:40:16.084Z info hostd[3EFEEB70] [Originator@6876 sub=Hostsvc.DatastoreSystem] RefreshVdiskDatastores: Done refreshing datastores.
2018-01-27T22:40:18.947Z error hostd[45CC1B70] [Originator@6876 sub=Solo.HttpSvc.HTTPService] Failed to read request; stream: <io_obj p:0x1f4634d4, h:-1, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>, error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2018-01-27T22:40:18.947Z error hostd[40EC4B70] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: <io_obj p:0x43d3868c, h:-1, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>, error: N7Vmacore16TimeoutExceptionE(Operation timed out)
2018-01-27T22:40:19.675Z info hostd[40EC4B70] [Originator@6876 sub=SysCommandPosix] ForkExec(/sbin/backup.sh)  4274264
2018-01-27T22:40:19.782Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:19.914Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Tools manifest version status changed from guestToolsNotInstalled to guestToolsUnmanaged, on install is TRUE
2018-01-27T22:40:19.939Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:19.997Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.051Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Tools manifest version status changed from guestToolsUnmanaged to guestToolsUnmanaged, on install is TRUE
2018-01-27T22:40:20.061Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.111Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Setting the tools properties cache.
2018-01-27T22:40:20.208Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.291Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.362Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.572Z info hostd[40E42B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.32-7c97 user=droneci] State Transition (VM_STATE_ON -> VM_STATE_GUEST_OPERATION)
2018-01-27T22:40:20.588Z info hostd[40EC4B70] [Originator@6876 sub=Hostsvc.HostConfigSyncManagerImpl] /sbin/backup.sh succeeded
2018-01-27T22:40:20.598Z info hostd[40EC4B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 138464 : Guest operation Start Program performed on Virtual machine cocky_mccarthy-187f8a56ac7e.
2018-01-27T22:40:20.599Z info hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] State Transition (VM_STATE_GUEST_OPERATION -> VM_STATE_ON)
2018-01-27T22:40:20.599Z info hostd[40E42B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
2018-01-27T22:40:20.599Z info hostd[40E42B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
2018-01-27T22:40:20.600Z info hostd[40E42B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 6060 Old: 1 New: 1
2018-01-27T22:40:20.623Z info hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] SOCKET 6942 (107)
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] recv detected client closed connection
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: Remote connection failure
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] VigorTransportClientDrainRecv: draining read.
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] SOCKET 6942 (107)
2018-01-27T22:40:20.818Z info hostd[3FB40B70] [Originator@6876 sub=Libs] recv detected client closed connection
2018-01-27T22:40:20.819Z info hostd[3FB40B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: closing connection.
2018-01-27T22:40:20.819Z info hostd[3FB40B70] [Originator@6876 sub=Libs] VigorTransportClientManageConnection: connection closed.
2018-01-27T22:40:20.845Z info hostd[3FB40B70] [Originator@6876 sub=Libs] CnxAuthdProtoSecureConnect: Unencrypted connection, skipping thumbprint exchange.
2018-01-27T22:40:20.858Z info hostd[3FB40B70] [Originator@6876 sub=Libs] CnxConnectAuthd: Returning false because CnxAuthdProtoConnect failed
2018-01-27T22:40:20.858Z info hostd[3FB40B70] [Originator@6876 sub=Libs] Cnx_Connect: Returning false because CnxConnectAuthd failed
2018-01-27T22:40:20.858Z info hostd[3FB40B70] [Originator@6876 sub=vm:Cnx_Connect: Error message: There is no VMware process running for config file /vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx]
2018-01-27T22:40:20.858Z warning hostd[3FB40B70] [Originator@6876 sub=vm:VigorTransportClientManageConnection: Failed to re-connect to VM /vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx]
2018-01-27T22:40:20.861Z info hostd[3FB40B70] [Originator@6876 sub=Libs] VigorOnlineDisconnectCb: connection closed (is final).
2018-01-27T22:40:20.861Z error hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Failed to query ToolsInstallManager N3Vim5Fault20GenericVmConfigFault9ExceptionE(vim.fault.GenericVmConfigFault)
2018-01-27T22:40:20.861Z info hostd[40EC4B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 6060
2018-01-27T22:40:20.861Z warning hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Unable to get resource settings for a powered on VM
2018-01-27T22:40:20.861Z info hostd[40EC4B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 6060
2018-01-27T22:40:20.868Z info hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Send config update invoked
2018-01-27T22:40:20.872Z info hostd[40EC4B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 6060
2018-01-27T22:40:20.872Z warning hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Unable to get resource settings for a powered on VM
2018-01-27T22:40:20.872Z info hostd[40EC4B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 6060
2018-01-27T22:40:20.879Z info hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] State Transition (VM_STATE_ON -> VM_STATE_OFF)
2018-01-27T22:40:20.879Z warning hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx] Failed to find activation record, event user unknown.
2018-01-27T22:40:20.879Z info hostd[FFD26B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
2018-01-27T22:40:20.879Z info hostd[FFD26B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
2018-01-27T22:40:20.879Z info hostd[FFD26B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 6060 Old: 1 New: 0
2018-01-27T22:40:20.879Z info hostd[FFD26B70] [Originator@6876 sub=Hbrsvc] Replicator: Poweroff for VM: (id=6060)
2018-01-27T22:40:20.880Z info hostd[40EC4B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 138465 : cocky_mccarthy-187f8a56ac7e on  ns543159.ip-144-217-252.net in ha-datacenter is powered off
2018-01-27T22:40:20.925Z info hostd[45C40B70] [Originator@6876 sub=Vimsvc.TaskManager opID=302.32-7ca7 user=droneci] Task Created : haTask-6060-vim.VirtualMachine.reconfigure-2423098
2018-01-27T22:40:20.926Z info hostd[40EC4B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.32-7ca7 user=droneci] State Transition (VM_STATE_OFF -> VM_STATE_RECONFIGURING)
2018-01-27T22:40:21.008Z info hostd[3EFEEB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/59976bc0-5b86fee7-acff-0cc47a9f99e0/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f/187f8a56ac7e5e2dabae3f9d77a05d3be77e792cce5f541a92a6a5b1f9ff3b2f.vmx opID=302.32-7ca7 user=droneci] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_OFF)

Next steps:

  1. get hostd team to see if there's anything in that log snippet they can use to infer why the disk would present.
  2. add a watchdog fragment to isos/bootstrap/bootstrap that wraps the mount command and will dump dmesg logs to tether.debug if the mount doesn't complete in a timely manner. This will at least tell us if the kernel saw issues with the device. If we do this then the wrapper should also encompass waiting for the device to present and dump udevd logs as that's a related but different issue where dmesg is useful for both.
mhagen-vmware commented 6 years ago

https://ci.vcna.io/vmware/vic/15972

Waiting for rootfs
+ '[' '!' -e /dev/disk/by-label/containerfs ']'
+ sleep 0.1
[    1.465255] ata_piix 0000:00:07.1: BMDMA: BAR4 is zero, falling back to PIO
+ '[' '!' -e /dev/disk/by-label/containerfs ']'
+ mount -t ext4 /dev/disk/by-label/containerfs /mnt/containerfs
Jan 31 2018 01:39:46.204Z DEBUG [BEGIN] op=302.174 [vic/lib/portlayer/exec.(*containerBase).waitForSession:404] 4b6760db5b952545d6acf9116223ed29405f090419ac8f507c40e007bd836059
Jan 31 2018 01:42:46.205Z ERROR op=302.174: vic/pkg/trace.(*Operation).Err: WaitForSession error: context deadline exceeded
vic/lib/portlayer/exec.(*Container).start:411 WaitForSession

Test-Cases.Group1-Docker-Commands.1-17-Docker-Network-Connect-VCH-15972-1032-container-logs.zip

hickeng commented 6 years ago

opened bug2049007

hickeng commented 6 years ago

Variant seen while investigating bug2038914. This was a set of containers where the disk failed to present and we block waiting on the containerfs label. Collecting vSphere logs and will link those when done.

container logs

hickeng commented 6 years ago

Seen in vic_nightly_logs_2018-04-09 vsphere6.0: https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-04-09-20-32-56.zip?authuser=1

VM that failed to start: vigilant_euclid-b0c9ca680a84 on host 10.161.171.221

rogeliosanchez commented 6 years ago

Seen on Nightly 6.0 - 2018-04-16

VIC version: VIC version: v1.4.0-dev-0-b6c0988

5-25-OPS-User-Grant

vic-machine create grants ops-user-perms Run And Return Rc And Output docker %{VCH-PARAMS} run -d --net public ${busybox} screen shot 2018-04-18 at 1 42 09 pm

vic-machine vic-machine.log

VCH logs vch-0-9123.zip

Nightly Summary 60.zip

Link to logs https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-04-16-20-32-56.zip?authuser=1

More logs Suites.Group5-Functional-Tests.5-25-OPS-User-Grant-VCH-0-9123-container-logs.zip

rogeliosanchez commented 6 years ago

Seen on Nightly 6.0 - 2018-04-16

VIC version: VIC version: v1.4.0-dev-0-b6c0988

5-25-OPS-User-Grant

vic-machine create grants ops-user-perms Run And Return Rc And Output docker %{VCH-PARAMS} run -d --net public ${busybox} screen shot 2018-04-18 at 1 42 09 pm

vic-machine vic-machine.log

VCH logs vch-0-9123.zip

Nightly Summary 60.zip

Link to logs https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-04-16-20-32-56.zip?authuser=1

rogeliosanchez commented 6 years ago

Seen on Nightly 6.5 - 2018-04-20

VIC version: v1.5.0-dev-18481-023de08

5-22-NFS-Volume Docker Read and Remove File Error response from daemon: Server error from portlayer: unable to wait for process launch status: Post https://10.161.249.198/sdk: context deadline exceeded screen shot 2018-04-20 at 5 56 59 pm

Vic machine vic-machine.log

VCH logs vch-0-7437.zip

Nightly Summary 65.zip

Link to logs https://storage.cloud.google.com/vic-ci-logs/vic_nightly_logs_2018-04-20-01-04-51.zip?authuser=1

More logs Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-7437-container-logs-before-reboot.zip Suites.Group5-Functional-Tests.5-22-NFS-Volume-VCH-0-7437-container-logs.zip

hickeng commented 6 years ago

If this occurs again then we must bump the priority of:

  1. conditional/aspect style triggering of log collect
  2. customized diagnostic collection on test failure

This is the only way to support dumping of full VM memory on test failure without overwhelming the infrastructure. This is for problems that are intermittent but present across multiple tests.

Duplicating this comment in #6825