vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic
Other
639 stars 173 forks source link

Exceeding number of running VCPUs limit makes ESXi/VCH unresponsive for long time #6469

Open caglar10ur opened 6 years ago

caglar10ur commented 6 years ago

Env. is a nested ESXi 6.5U1 running under Fusion 8. Creating 8 containers after a clean VCH installs causes ESXi to throw "number of running VCPUs limit exceeded" exception

[vagrant@devbox:/opt/go/src/github.com/vmware/vic(5621)]  for i in $(seq 8); do docker run -d --rm --name $i ubuntu sleep $i & done
[1] 29864
[2] 29865
[3] 29866
[4] 29867
[5] 29872
[6] 29873
[7] 29880
[8] 29881
319bcb0b625b263787c867539bae921f3c8c317c55095b7862dfd5c275457c62
93a035ce7334373240f3dbe5cc32ca134c517d7981bdd836f1d9c53c17f43958
18cb12e7465b4e12a883f99cc5dae569a19ce92e66eaff3dcc0e50e9ee0cb509
2c83f2af2eb4bf6972f7bad186f4792655863452765722456d206171b8f40028
e009c9941430130bcb36f43c8f1ea9506b854a154676566700ac0a0ddd874645
e4544548aa79da5bc2ab4199b625a001c1681ac3f75ec14da5a40aed9f2c492d
5f2e88ac175d75634ca26bc586990519e165d53cb4720c7d5ff5afa78632d73c
f37872a8dac22e410cea6217e4797c29538411b007fcc410f72ccbea3c3a8f05
docker: Error response from daemon: Server error from portlayer: Module 'MonitorLoop' power on failed..

[3]   Done                    docker run -d --rm --name $i ubuntu sleep $i

[vagrant@devbox:/opt/go/src/github.com/vmware/vic(5621)] docker ps
error during connect: Get http://192.168.109.225:2375/v1.26/containers/json: dial tcp 192.168.109.225:2375: getsockopt: no route to host

This renders ESXi unresponsive

[root@localhost:/vmfs/volumes/59309dc6-7ad22499-a67f-000c292b1348] cat 5f2e88ac175d75634ca26bc586990519e165d53cb4720c7d5ff5afa78632d73c/vmware.log
...
2017-09-27T18:22:27.838Z| vmx| I125: Msg_Post: Error
2017-09-27T18:22:27.838Z| vmx| I125: [vob.world.vmm.limit.init.max.vms.reached] Maximum virtual machines limit reached: 8 (32768 worlds).
2017-09-27T18:22:27.838Z| vmx| I125: [msg.vmmonVMK.creatVMFailed] Could not power on virtual machine: Number of running VCPUs limit exceeded.
2017-09-27T18:22:27.838Z| vmx| I125: [msg.monitorLoop.createVMFailed.vmk] Failed to power on VM.
2017-09-27T18:22:27.838Z| vmx| I125: ----------------------------------------
2017-09-27T18:22:27.840Z| vmx| I125: Module 'MonitorLoop' power on failed.
2017-09-27T18:22:27.840Z| vmx| I125: VMX_PowerOn: ModuleTable_PowerOn = 0
2017-09-27T18:22:27.840Z| vmx| I125: Vix: [71265 mainDispatch.c:4234]: VMAutomation_ReportPowerOpFinished: statevar=1, newAppState=1873, success=1 additionalError=0
2017-09-27T18:22:27.840Z| vmx| I125:
2017-09-27T18:22:27.840Z| vmx| I125+ Power on failure messages: Module 'MonitorLoop' power on failed.
2017-09-27T18:22:27.840Z| vmx| I125+ Maximum virtual machines limit reached: 8 (32768 worlds).
2017-09-27T18:22:27.840Z| vmx| I125+ Could not power on virtual machine: Number of running VCPUs limit exceeded.
2017-09-27T18:22:27.840Z| vmx| I125+ Failed to power on VM.
2017-09-27T18:22:27.840Z| vmx| I125+ Failed to start the virtual machine.
2017-09-27T18:22:27.840Z| vmx| I125:
2017-09-27T18:22:27.841Z| vmx| I125: Vix: [71265 mainDispatch.c:4234]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2017-09-27T18:22:27.841Z| vmx| I125: Transitioned vmx/execState/val to poweredOff
2017-09-27T18:22:27.841Z| vmx| I125: Vix: [71265 mainDispatch.c:4234]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=0 additionalError=0
2017-09-27T18:22:27.841Z| vmx| I125: Vix: [71265 mainDispatch.c:4273]: Error VIX_E_FAIL in VMAutomation_ReportPowerOpFinished(): Unknown error
2017-09-27T18:22:27.841Z| vmx| I125: Vix: [71265 mainDispatch.c:4234]: VMAutomation_ReportPowerOpFinished: statevar=0, newAppState=1870, success=1 additionalError=0
2017-09-27T18:22:27.841Z| vmx| I125: Transitioned vmx/execState/val to poweredOff
2017-09-27T18:22:27.866Z| vmx| I125: Vix: [71265 mainDispatch.c:834]: VMAutomation_LateShutdown()
2017-09-27T18:22:27.866Z| vmx| I125: Vix: [71265 mainDispatch.c:783]: VMAutomationCloseListenerSocket. Closing listener socket.
2017-09-27T18:22:27.867Z| vmx| I125: Flushing VMX VMDB connections
2017-09-27T18:22:27.867Z| vmx| I125: VigorTransport_ServerCloseClient: Closing transport 7868236250 (err = 0)
2017-09-27T18:22:27.867Z| vmx| I125: VigorTransport_ServerDestroy: server destroyed.
2017-09-27T18:22:27.868Z| vmx| I125: VMX exit (0).
2017-09-27T18:22:27.868Z| vmx| I125: AIOMGR-S : stat o=1 r=3 w=0 i=0 br=49152 bw=0
2017-09-27T18:22:27.868Z| vmx| I125: OBJLIB-LIB: ObjLib cleanup done.
2017-09-27T18:22:27.869Z| vmx| W115: VMX has left the building: 0.

And VCH comes back to life after 30 min

root@ [ ~ ]# grep -i -E " WARN | ERROR " /var/log/vic/port-layer.log
Sep 27 2017 18:21:07.437Z WARN  skipping converting container VM VirtualMachine:28: could not parse id
Sep 27 2017 18:22:27.285Z DEBUG vSphere Event Error message on 6-5f2e88ac175d on localhost.localdomain in ha-datacenter: Failed to power on VM. for eventID(678) ignored by the event collector
Sep 27 2017 18:22:27.285Z DEBUG vSphere Event Cannot power on 6-5f2e88ac175d on localhost.localdomain in ha-datacenter. A general system error occurred:  for eventID(679) ignored by the event collector
Sep 27 2017 18:22:27.291Z ERROR op=295.60 (delta:592.056736ms): unexpected fault on task retry: &types.GenericVmConfigFault{VmConfigFault:types.VmConfigFault{VimFault:types.VimFault{MethodFault:types.MethodFault{FaultCause:(*types.LocalizedMethodFault)(nil), FaultMessage:[]types.LocalizableMessage{types.LocalizableMessage{DynamicData:types.DynamicData{}, Key:"msg.moduletable.powerOnFailed", Arg:[]types.KeyAnyValue{types.KeyAnyValue{DynamicData:types.DynamicData{}, Key:"1", Value:"MonitorLoop"}}, Message:"Module 'MonitorLoop' power on failed. "}, types.LocalizableMessage{DynamicData:types.DynamicData{}, Key:"vob.world.vmm.limit.init.max.vms.reached", Arg:[]types.KeyAnyValue{types.KeyAnyValue{DynamicData:types.DynamicData{}, Key:"1", Value:"8"}, types.KeyAnyValue{DynamicData:types.DynamicData{}, Key:"2", Value:"32768"}}, Message:"Maximum virtual machines limit reached: 8 (32768 worlds)."}, types.LocalizableMessage{DynamicData:types.DynamicData{}, Key:"msg.vmmonVMK.creatVMFailed", Arg:[]types.KeyAnyValue{types.KeyAnyValue{DynamicData:types.DynamicData{}, Key:"1", Value:"msg.vmk.status.VMK_VCPU_LIMIT_EXCEEDED"}}, Message:"Could not power on virtual machine: Number of running VCPUs limit exceeded."}, types.LocalizableMessage{DynamicData:types.DynamicData{}, Key:"msg.monitorLoop.createVMFailed.vmk", Arg:[]types.KeyAnyValue(nil), Message:"Failed to power on VM."}, types.LocalizableMessage{DynamicData:types.DynamicData{}, Key:"msg.vmx.poweron.failed", Arg:[]types.KeyAnyValue(nil), Message:"Failed to start the virtual machine."}}}}}, Reason:"Module 'MonitorLoop' power on failed. "}
Sep 27 2017 18:22:27.445Z ERROR CommitHandler error on handle(15660573fbf0047421e3199e128064f5) for 5f2e88ac175d75634ca26bc586990519e165d53cb4720c7d5ff5afa78632d73c: Module 'MonitorLoop' power on failed.
Sep 27 2017 18:38:16.194Z ERROR CommitHandler error on handle(26ea7c59d37d2378562582ad24ea9d51) for 93a035ce7334373240f3dbe5cc32ca134c517d7981bdd836f1d9c53c17f43958: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:38:16.243Z ERROR CommitHandler error on handle(d33559666574c06e2b3e6597d1507100) for 18cb12e7465b4e12a883f99cc5dae569a19ce92e66eaff3dcc0e50e9ee0cb509: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:38:16.251Z ERROR CommitHandler error on handle(a865cdee94be7b3ad80f124387f507a2) for e009c9941430130bcb36f43c8f1ea9506b854a154676566700ac0a0ddd874645: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:38:16.259Z ERROR CommitHandler error on handle(0249f84e87426c88e7d15f6d6321d867) for 319bcb0b625b263787c867539bae921f3c8c317c55095b7862dfd5c275457c62: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:38:20.624Z ERROR CommitHandler error on handle(0e15bb540965062906280388d9ec0fcb) for e4544548aa79da5bc2ab4199b625a001c1681ac3f75ec14da5a40aed9f2c492d: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:38:20.654Z ERROR CommitHandler error on handle(ce5e41b75df6f10d1174a117e2b65076) for 2c83f2af2eb4bf6972f7bad186f4792655863452765722456d206171b8f40028: unable to wait for process launch status: Post https://192.168.109.220/sdk: context deadline exceeded
Sep 27 2017 18:50:55.347Z WARN  session keepalive error: Post https://192.168.109.220/sdk: EOF
Sep 27 2017 18:52:58.839Z WARN  container and handle ChangeVersions do not match for 2c83f2af2eb4bf6972f7bad186f4792655863452765722456d206171b8f40028: 2017-09-27T18:52:59.264644Z != 2017-09-27T18:22:18.329194Z
Sep 27 2017 18:52:59.577Z WARN  container and handle ChangeVersions do not match for 18cb12e7465b4e12a883f99cc5dae569a19ce92e66eaff3dcc0e50e9ee0cb509: 2017-09-27T18:53:00.171518Z != 2017-09-27T18:22:18.193425Z
Sep 27 2017 18:53:00.748Z ERROR Update: unable to update container e009c9941430130bcb36f43c8f1ea9506b854a154676566700ac0a0ddd874645
Sep 27 2017 18:53:00.748Z ERROR refreshing container e009c9941430130bcb36f43c8f1ea9506b854a154676566700ac0a0ddd874645 failed: ServerFaultCode: The object 'vim.VirtualMachine:29' has already been deleted or has not been completely created
Sep 27 2017 18:53:00.748Z ERROR Container e009c9941430130bcb36f43c8f1ea9506b854a154676566700ac0a0ddd874645 not found - unable to UnbindContainer
Sep 27 2017 18:53:01.345Z ERROR Update: unable to update container 319bcb0b625b263787c867539bae921f3c8c317c55095b7862dfd5c275457c62
Sep 27 2017 18:53:01.345Z ERROR refreshing container 319bcb0b625b263787c867539bae921f3c8c317c55095b7862dfd5c275457c62 failed: ServerFaultCode: The object 'vim.VirtualMachine:30' has already been deleted or has not been completely created
Sep 27 2017 18:53:01.345Z ERROR Container 319bcb0b625b263787c867539bae921f3c8c317c55095b7862dfd5c275457c62 not found - unable to UnbindContainer
Sep 27 2017 18:53:06.282Z WARN  container and handle ChangeVersions do not match for 93a035ce7334373240f3dbe5cc32ca134c517d7981bdd836f1d9c53c17f43958: 2017-09-27T18:53:05.62821Z != 2017-09-27T18:22:18.435676Z
Sep 27 2017 18:53:07.652Z WARN  container and handle ChangeVersions do not match for e4544548aa79da5bc2ab4199b625a001c1681ac3f75ec14da5a40aed9f2c492d: 2017-09-27T18:53:07.195475Z != 2017-09-27T18:22:25.174278Z
root@ [ ~ ]#
caglar10ur commented 6 years ago

screen shot 2017-09-27 at 11 31 47 am

caglar10ur commented 6 years ago

container-logs.zip