vmware / vic-product

vSphere Integrated Containers enables VMware customers to deliver a production-ready container solution to their developers and DevOps teams.
https://vmware.github.io/vic-product/

VIC Management randomly failed to start. It won't recover. #2578

Open vitaprimo opened 3 years ago

vitaprimo commented 3 years ago

Summary

After authentication, vSphere Integrated Containers Management fails to load with error {"message":"Service not found: https://127.0.0.1:8282/","statusCode":404,"documentKind":"com:vmware:xenon:common:ServiceErrorResponse","errorCode":-2147483648}.

Restarting individual services fails. systemctl reset-failed appears to work, and systemctl is-system-running then reports running, but the appliance is still not actually working.
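A note on why is-system-running can be misleading here: systemctl reset-failed only clears the recorded "failed" state of units and does not restart anything, so the overall state can return to running while a service is still broken. Below is a minimal sketch for querying units one by one instead. report_units is a hypothetical helper name, not part of VIC or systemd, and a unit reported as active may still be unhealthy inside.

```shell
# report_units UNIT...
# Hypothetical helper: prints "<unit>: <state>" for each unit given.
# Returns non-zero if any unit is in a state other than "active".
report_units() {
    rc=0
    for unit in "$@"; do
        # `systemctl is-active` prints the state (active, inactive, failed, ...)
        # and exits non-zero when the unit is not active, hence the `|| true`.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        [ -n "$state" ] || state=unknown
        printf '%s: %s\n' "$unit" "$state"
        [ "$state" = "active" ] || rc=1
    done
    return "$rc"
}
```

Usage on the appliance might look like: report_units harbor.service fileserver.service vic-machine-server.service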

Details

Running systemctl without arguments showed systemd-modules-load.service to be failing, and journalctl then showed:

-- Unit systemd-modules-load.service has begun starting up.
Aug 24 19:15:33 blablah.tld kernel: rdrand_rng: Neither RDSEED nor RDRAND is available.
Aug 24 19:15:33 blablah.tld systemd-modules-load[1003399]: Failed to insert 'rdrand_rng': No such device
Aug 24 19:15:33 blablah.tld systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=1/FAILURE
Aug 24 19:15:33 blablah.tld systemd[1]: Failed to start Load Kernel Modules.
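The kernel message suggests the guest CPU does not expose the RDRAND and RDSEED instructions. On Linux this can be checked against the flags line of /proc/cpuinfo. The sketch below does that; has_cpu_flag and report_rng_flags are hypothetical helper names, and the optional file argument is an assumption added so the check can be pointed at a saved copy of the file. Whether the missing flags are what actually broke the portal is not established by this check.

```shell
# has_cpu_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word on a "flags" line.
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$file" | grep -qw -- "$flag"
}

# report_rng_flags [CPUINFO_FILE]
# Prints whether the two hardware RNG flags named in the log are visible.
report_rng_flags() {
    for f in rdrand rdseed; do
        if has_cpu_flag "$f" "${1:-/proc/cpuinfo}"; then
            printf '%s: present\n' "$f"
        else
            printf '%s: missing\n' "$f"
        fi
    done
}
```

Running report_rng_flags inside the appliance before and after changing the VM's CPU settings would show whether the guest sees the instructions.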

harbor.service, admiral.service, fileserver.service and vic-machine-server.service are all active, though only fileserver.service has no red in the output of systemctl status <service>.service. The plugin in vSphere appears to be working except that it won't deploy VCHs; deploying from a random computer still works, and the appliance's config help on port 9443 still shows up. So re-initializing the appliance is probably at least an option. I'm not a programmer, so that's my best bet, I guess.

The appliance is running on vSphere 7, where deploying it has been a hassle: setting any of the options in the OVA prevents the appliance from starting. I worked around this by manually adding the vApp Options in the VM's Configure tab. Before vSphere 7 there were no issues deploying the OVA.

I attached a few screenshots.

See Also

Screen_Shot_2020-08-24_at_13_52_05 Screen Shot 2020-08-24 at 13 42 22 Screen_Shot_2020-08-24_at_13_41_28

vitaprimo commented 3 years ago

Again, not a programmer here, but RDSEED and RDRAND sound CPU-related (I remember the latter from VPNs), so I shut down the appliance and changed CPU/MMU Virtualization from Automatic to Hardware CPU and MMU. I booted it up, rushed to the portal and got the same JSON, this time with something about not being ready yet. Eventually it redirected me to SSO, and after authenticating it worked again.
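The "not yet ready" response suggests the portal simply needs some time after boot. Rather than refreshing by hand, a command can be retried until it succeeds. This is a rough sketch only: wait_for is a hypothetical helper, and the URL in the usage line is an assumption based on the port shown in the original error message.

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS runs out.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, wait_for 30 10 curl -ksf -o /dev/null https://<appliance-address>:8282/ would try every 10 seconds for about five minutes; curl's -f makes HTTP errors such as the 404 above count as failures.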

Hopefully it doesn't freak out later and I short-circuit again. In the console I briefly saw the message about RDSEED and RDRAND not being available again, but this time it was gone from the teal screen.