vmware / vic

vSphere Integrated Containers Engine is a container runtime for vSphere.
http://vmware.github.io/vic

Virtual Container Host goes into zombie status post ESX upgrade with new VCSA #4318

Open lgayatri opened 7 years ago

lgayatri commented 7 years ago


Details:

  1. Installed VCSA and ESX with 6.5P01 builds.
  2. Created 32 VCHs, each with one container VM. All VCHs are connected to a DVPG on a VDS.
  3. Upgraded the ESX host to the latest build, 6.5U1.
  4. Installed a new VCSA.
  5. Added the ESX host to the new 6.5U1 VCSA.
  6. Created a new VDS and reconfigured the existing VCHs to use its DVPG.
  7. Powered on the VCHs. All of them logged "Reaped Zombie process PID ..." and hung without completing boot.

Logs: All VCHs are hung. The vc-support bundle from VC and the vm-support bundle from the ESX host could not be attached because of the 10 MB attachment size limit, so an image was uploaded instead: vch-hung

Acceptance Criteria: After reconfiguration, the VCHs should resume booting as expected.

VIC version: 0.8

mdubya66 commented 7 years ago

@mhagen-vmware can we get the logs for this?

lgayatri commented 7 years ago

hostd logs attached: hostd.0.gz, hostd.1.gz, hostd.2.gz, hostd.3.gz, hostd.4.gz, hostd.5.gz, hostd.6.gz, hostd.7.gz, hostd.8.gz

hickeng commented 7 years ago

There are various issues here:

  1. Because a new VCSA was installed, the certificate thumbprint has changed from "54:C8:BB:40:F2:8B:05:4A:63:25:E0:9D:C8:D7:E1:0A:24:26:81:0E" to "7B:4A:AF:F7:19:B7:71:21:E8:4F:67:49:AA:30:F9:46:00:CC:58:F8".
  2. The morefs (managed object references) of the inventory objects have changed (see https://github.com/vmware/vic/issues/3490).

The appliance debug log shows continual port-layer restarts, which is expected. We cannot confirm this from the port-layer log because the VCH log bundles were not collected, but with an incorrect thumbprint the port layer will never initialize successfully.
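The thumbprint problem in point 1 can be illustrated with a minimal Go sketch. It assumes, as the 20-byte colon-separated values above suggest, that the thumbprint is the SHA-1 digest of the server's DER-encoded certificate. The function names (`thumbprint`, `fetchThumbprint`, `verifyThumbprint`) are hypothetical and are not taken from the VIC code base. The sketch only shows why a VCH that pinned the old VCSA's thumbprint cannot accept the certificate presented by the new VCSA.

```go
package main

import (
	"crypto/sha1"
	"crypto/tls"
	"fmt"
	"strings"
)

// thumbprint returns the SHA-1 digest of a DER-encoded certificate in the
// colon-separated, upper-case hex form used in this thread (assumption).
func thumbprint(der []byte) string {
	sum := sha1.Sum(der)
	parts := make([]string, len(sum))
	for i, b := range sum {
		parts[i] = fmt.Sprintf("%02X", b)
	}
	return strings.Join(parts, ":")
}

// fetchThumbprint (hypothetical helper) connects to addr ("host:443") and
// returns the thumbprint of the leaf certificate the server presents.
// Chain verification is skipped because the pinned thumbprint, not a CA,
// is what establishes trust in this model.
func fetchThumbprint(addr string) (string, error) {
	conn, err := tls.Dial("tcp", addr, &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		return "", err
	}
	defer conn.Close()
	certs := conn.ConnectionState().PeerCertificates
	if len(certs) == 0 {
		return "", fmt.Errorf("no certificate presented by %s", addr)
	}
	return thumbprint(certs[0].Raw), nil
}

// verifyThumbprint returns an error when the presented thumbprint differs
// from the one recorded in the VCH configuration (case-insensitive).
func verifyThumbprint(presented, pinned string) error {
	if !strings.EqualFold(strings.TrimSpace(presented), strings.TrimSpace(pinned)) {
		return fmt.Errorf("thumbprint mismatch: presented %s, pinned %s", presented, pinned)
	}
	return nil
}

func main() {
	// Thumbprint recorded when the VCHs were created against the old VCSA.
	pinned := "54:C8:BB:40:F2:8B:05:4A:63:25:E0:9D:C8:D7:E1:0A:24:26:81:0E"
	// Thumbprint presented by the newly installed VCSA.
	presented := "7B:4A:AF:F7:19:B7:71:21:E8:4F:67:49:AA:30:F9:46:00:CC:58:F8"
	if err := verifyThumbprint(presented, pinned); err != nil {
		fmt.Println(err)
	}
}
```

Against a live endpoint, `fetchThumbprint("vcsa.example.com:443")` (a placeholder address) would supply the `presented` value. This covers only the first issue; the changed morefs are a separate problem.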