To support health check, there are a few different options available.

Using docker way

Docker runs one monitor per container. The monitor periodically runs docker exec to execute the user-supplied health check command, keeps the most recent few results, and changes the container health status based on those results.

VIC already supports docker exec, so it is easy to use the same mechanism as Docker to support health check.

#5367 mentioned this solution.
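The monitor described above can be sketched roughly as follows. This is a minimal illustration, not VIC or Docker code: the names Monitor, Probe, Record, and Run are hypothetical, the probe stands in for a docker exec call, and the values 3 (failures before unhealthy) and 5 (results kept) are illustrative parameters.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Probe runs one health check and returns nil when the check passes.
// In VIC this would wrap a docker exec call (assumption); here it is a plain function.
type Probe func(ctx context.Context) error

// errProbeFailed is a sample failure used by the example in main.
var errProbeFailed = errors.New("probe failed")

// Monitor tracks the health of a single container (hypothetical type).
type Monitor struct {
	Retries    int      // consecutive failures before the status becomes "unhealthy"
	MaxLog     int      // number of recent results to keep in memory
	Status     string   // "starting", "healthy" or "unhealthy"
	Log        []string // most recent results, oldest first
	failStreak int
}

func NewMonitor(retries, maxLog int) *Monitor {
	return &Monitor{Retries: retries, MaxLog: maxLog, Status: "starting"}
}

// Record stores one probe result, trims the log, and updates the status.
func (m *Monitor) Record(err error) {
	entry := "ok"
	if err != nil {
		entry = err.Error()
	}
	m.Log = append(m.Log, entry)
	if len(m.Log) > m.MaxLog {
		m.Log = m.Log[len(m.Log)-m.MaxLog:]
	}
	if err == nil {
		m.failStreak = 0
		m.Status = "healthy"
		return
	}
	m.failStreak++
	if m.failStreak >= m.Retries {
		m.Status = "unhealthy"
	}
}

// Run executes the probe once per interval, each run bounded by timeout,
// until ctx is cancelled.
func (m *Monitor) Run(ctx context.Context, probe Probe, interval, timeout time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pctx, cancel := context.WithTimeout(ctx, timeout)
			m.Record(probe(pctx))
			cancel()
		}
	}
}

func main() {
	m := NewMonitor(3, 5)
	for i := 0; i < 4; i++ {
		m.Record(errProbeFailed)
	}
	fmt.Println(m.Status, len(m.Log))
}
```

In this sketch a single success resets the failure streak, so the status only turns unhealthy after Retries consecutive failures.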

Pros:

Easy to get the result of the user command back, and to keep a limited number of recent results in memory.
Easy to get results consistent with Docker.
Integrates naturally with docker swarm, letting swarm reschedule a container that is in unhealthy status.

Cons:

docker exec in VIC is not free. Every time it runs, we need to establish a connection to the container if one does not already exist. The default health check configuration runs the check every 30s with a 30s timeout. We already see CDE issues with the 30s attach timeout in slow vSphere environments. If we add health check this way, we add a periodic VM reconfiguration task to vSphere for each container VM.
No integration with the vSphere HA feature. vSphere has no knowledge of application health status and will not reboot a container even if the container is unhealthy.

Health check inside of container

docker exec has a long execution path. To shorten it, we could run the health check inside the container and get the result back from the port layer (PL) when docker ps or docker inspect is called. The suggestion here is to run the health check in toolbox directly and then query the result through toolbox, instead of through the serial port connection used for docker attach. The port layer will need some simple logic to generate a result while the container is not running.

Pros:

Very lightweight health check process.
Easy to integrate with vSphere application monitor. Toolbox could integrate with vSphere application monitor based on this health check result.

Cons:

It differs slightly from Docker, because Docker keeps the health check result in the docker daemon, so the lifetime of the health check can differ from the container lifetime. To achieve the same, we need additional support in the port layer while the container is not running.
Need to add more container configuration to pass the health check information into the container.
The port layer will need to handle the status of containers that are not running or not responding, which might end up as a similar health check process.
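The port-layer logic for containers that are not running or not responding could look something like the sketch below. Everything in it is an assumption for illustration: GuestReport and ResolveHealth are hypothetical names, and the mapping of each case to a status string is a placeholder rather than Docker's or VIC's actual behavior.

```go
package main

import (
	"fmt"
	"time"
)

// GuestReport is a hypothetical record of the last check result
// obtained from the in-guest checker (for example through toolbox).
type GuestReport struct {
	Healthy bool
	Age     time.Duration // time elapsed since the check finished
}

// ResolveHealth decides what docker ps / docker inspect would show.
// The rules are assumptions chosen to keep the example small.
func ResolveHealth(vmRunning bool, report *GuestReport, maxAge time.Duration) string {
	switch {
	case !vmRunning:
		// Nothing is executing checks, so do not present a stale value.
		return "none"
	case report == nil:
		// The VM is up but no check has completed yet.
		return "starting"
	case report.Age > maxAge:
		// The guest stopped reporting: treat it as not responding.
		return "unhealthy"
	case report.Healthy:
		return "healthy"
	default:
		return "unhealthy"
	}
}

func main() {
	fresh := &GuestReport{Healthy: true, Age: 5 * time.Second}
	stale := &GuestReport{Healthy: true, Age: 5 * time.Minute}
	fmt.Println(ResolveHealth(true, fresh, time.Minute))
	fmt.Println(ResolveHealth(true, stale, time.Minute))
	fmt.Println(ResolveHealth(false, fresh, time.Minute))
}
```

The staleness rule is what makes this resemble a second health check process: the port layer has to judge the guest's silence on its own.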

Health check in portlayer through process manager

Unlike the in-container health check, we could run the health check in the port layer, using the govmomi API ProcessManager.StartProgram to run the user-configured script. This still runs the command inside the container, but it is controlled from the port layer, so the health check can have a different lifetime from the container.
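A rough sketch of the port-layer side of this option is shown below. To keep it self-contained it does not call govmomi; GuestRunner is a hypothetical interface that a real implementation might back with the guest process manager (start the program, then poll for the exit code), and fakeRunner only stands in for a container VM.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// GuestRunner is a hypothetical abstraction over guest operations.
type GuestRunner interface {
	Start(ctx context.Context, path, args string) (pid int64, err error)
	// ExitCode returns done=false while the process is still running.
	ExitCode(ctx context.Context, pid int64) (code int32, done bool, err error)
}

var ErrTimeout = errors.New("health check timed out")

// RunCheck starts the command in the guest and polls until it exits
// or ctx expires.
func RunCheck(ctx context.Context, r GuestRunner, path, args string, poll time.Duration) (int32, error) {
	pid, err := r.Start(ctx, path, args)
	if err != nil {
		return 0, err
	}
	ticker := time.NewTicker(poll)
	defer ticker.Stop()
	for {
		code, done, err := r.ExitCode(ctx, pid)
		if err != nil {
			return 0, err
		}
		if done {
			return code, nil
		}
		select {
		case <-ctx.Done():
			return 0, ErrTimeout
		case <-ticker.C:
		}
	}
}

// checkWithTimeout bounds one health check by the configured timeout.
// The command path is a made-up example.
func checkWithTimeout(r GuestRunner, timeout time.Duration) (int32, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return RunCheck(ctx, r, "/bin/healthcheck", "", 5*time.Millisecond)
}

// fakeRunner reports the process as finished after a fixed number of polls.
type fakeRunner struct {
	pollsLeft int
	code      int32
}

func (f *fakeRunner) Start(ctx context.Context, path, args string) (int64, error) {
	return 42, nil
}

func (f *fakeRunner) ExitCode(ctx context.Context, pid int64) (int32, bool, error) {
	if f.pollsLeft > 0 {
		f.pollsLeft--
		return 0, false, nil
	}
	return f.code, true, nil
}

func main() {
	code, err := checkWithTimeout(&fakeRunner{pollsLeft: 2, code: 1}, time.Second)
	fmt.Println(code, err)
}
```

Because the loop lives in the port layer, the result history can outlive the container VM, which is the point of this option.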

Pros:

Still a lightweight health check process, without additional VM reconfiguration.
No duplicated logic between the port layer and the container, unlike the second option.

Cons:

No integration with the vSphere application monitor feature, same as the first option.

Integrate with vSphere HA

#406 mentioned this requirement. Right now, toolbox implements a heartbeat for the VM-level monitor.

In #406, @dougm mentioned that vSphere application monitoring has been EOL'd since vSphere 6.0, but the function is still available in vSphere 6.5, and the latest vSphere documentation still describes the application monitoring feature: https://docs.vmware.com/en/VMware-vSphere/6.5/vsphere-esxi-vcenter-server-65-availability-guide.pdf

After calling the application monitoring SDK, what we can achieve is similar to VM monitoring. When the container status turns unhealthy, we can first restart the service from tether. If that still does not help (it fails several times), we stop pinging the vSphere application monitoring SDK, and the VM is then restarted by the host.
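The escalation described above (restart the service locally first, then stop the heartbeat so that the host restarts the VM) can be sketched as a small state machine. The names Escalator, Action, and MaxRestarts are hypothetical, and the code does not call any vSphere SDK; it only decides which step would come next.

```go
package main

import "fmt"

// Action is what the in-guest agent would do after one health check result.
type Action int

const (
	Heartbeat      Action = iota // keep signalling that the application is alive
	RestartService               // try to recover locally and keep heartbeating
	StopHeartbeat                // give up so that HA restarts the VM
)

var actionNames = []string{"heartbeat", "restart-service", "stop-heartbeat"}

func (a Action) String() string { return actionNames[a] }

// Escalator counts local restart attempts since the last healthy result.
type Escalator struct {
	MaxRestarts int // local restarts to attempt before giving up (assumed setting)
	restarts    int
}

// Next maps one health check result to the next action.
func (e *Escalator) Next(healthy bool) Action {
	if healthy {
		e.restarts = 0
		return Heartbeat
	}
	if e.restarts < e.MaxRestarts {
		e.restarts++
		return RestartService
	}
	return StopHeartbeat
}

func main() {
	e := &Escalator{MaxRestarts: 2}
	for _, healthy := range []bool{true, false, false, false} {
		fmt.Println(e.Next(healthy))
	}
}
```

A healthy result resets the counter, so only an uninterrupted run of failures leads to a VM restart.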

There is one thing we need to figure out before combining vSphere application HA with docker health check. Docker health check is meant to expose unhealthy container status to an orchestrator, e.g. swarm, so that swarm can reschedule the unhealthy container to another docker host. But if we integrate with vSphere HA, swarm will most likely not see any unhealthy container until vSphere HA has failed to recover the service.

The decision is to integrate with vSphere HA and to let the user enable or disable it in vic-machine, so the integration will be at the VCH level, not the container level. The discussion is in the planning issue https://github.com/vmware/vic-planning/issues/2

Another stretch goal is to integrate the container application monitor result with vSphere Alarm, so that an admin can get an alert if a container turns unhealthy.

docker health check #6223