microsoft / vscode-dev-containers

NOTE: Most of the contents of this repository have been migrated to the new devcontainers GitHub org (https://github.com/devcontainers). See https://github.com/devcontainers/template-starter and https://github.com/devcontainers/feature-starter for information on creating your own!
https://aka.ms/vscode-remote
MIT License

Implement dev container health check in line with OCI spec. #786

Open PavelSosin-320 opened 3 years ago

PavelSosin-320 commented 3 years ago

Please implement a health check for started dev containers, defined through the health-check options of docker run and the Docker Compose file, with convenient default values, and provide a template for a drop-in health-check script that verifies dev container health. Collect docker logs when, and only when, the health check fails. This will make dev container startup more deterministic and maintenance easier. It is standard practice in the design of all "software machines".
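As a rough illustration of the request above, here is a minimal sketch that assembles the health-check flags for `docker run`. The flag names (`--health-cmd`, `--health-interval`, `--health-timeout`, `--health-retries`, `--health-start-period`) are Docker's own; the default values, the function names, and the image name `my-dev-image` are assumptions chosen for the example, not anything this repository defines.

```python
from typing import List


def health_check_args(
    cmd: str,
    interval: str = "10s",      # assumed default: time between checks
    timeout: str = "5s",        # assumed default: time allowed per check
    retries: int = 5,           # assumed default: failures before "unhealthy"
    start_period: str = "30s",  # assumed default: grace period during startup
) -> List[str]:
    """Build the docker run flags that define a container health check."""
    return [
        "--health-cmd", cmd,
        "--health-interval", interval,
        "--health-timeout", timeout,
        "--health-retries", str(retries),
        "--health-start-period", start_period,
    ]


def docker_run_command(image: str, health_cmd: str) -> List[str]:
    """Return a full detached docker run command line with a health check."""
    return ["docker", "run", "-d", *health_check_args(health_cmd), image]


if __name__ == "__main__":
    # "my-dev-image" is a hypothetical image name.
    print(" ".join(docker_run_command("my-dev-image", "pgrep sshd")))
```

The same flag list could presumably be passed through the `runArgs` property of devcontainer.json, though whether that fits the definitions in this repository is an open question.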

2percentsilk commented 3 years ago

Hi @PavelSosin-320,

Are you talking about the HEALTHCHECK directive?

We could add this to verify the contents of the image, but nothing more, since additional configuration happens through devcontainer.json.

Can you provide a bit more detail on how you see this being used by consumers of the definitions? We don't want to do this on startup due to the overhead and potential conflicts with what the user is doing once the container has started.

PavelSosin-320 commented 3 years ago

When an OCI image contains "heavy" services, container startup time can be very long and unpredictable. For example, an SSH server inside a container needs a few seconds to start because it depends on the host OS network stack being ready. Observing container status changes is complex and not very reliable. So OCI introduces HealthCheck, which checks container readiness with simple application-level checks: running a short bash command that returns exit code 0 on success, or running curl against localhost to check that a port is open. The OCI runtime does this automatically according to the Dockerfile or run command parameters, repeating attempts until a positive response is received or the maximum time expires. On a positive response the container gets the status "healthy"; otherwise, "unhealthy". The docker run command has no timeout of its own. If something goes wrong during container startup, e.g. the entry script hangs and the container never reaches the "running" state, docker attach and docker exec may fail for unclear reasons, only because the container is still starting. There are many approaches to implementing the health check well, and they have been discussed many times. Min: test -d $Project-mount-point. Max: pgrep sshd, since I expect sshd to be running.
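To make the "wait until healthy or give up" behaviour concrete, here is a minimal sketch of a client-side wait loop. It assumes the container was started with a health check, so that `docker inspect --format '{{.State.Health.Status}}'` reports `starting`, `healthy`, or `unhealthy`. The function names, the timeout, and the polling interval are illustrative assumptions, not part of any existing tool.

```python
import subprocess
import time
from typing import Callable


def docker_health_status(container: str) -> str:
    """Ask Docker for the health status of a container that defines a health check."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(
    get_status: Callable[[], str],
    timeout: float = 60.0,   # assumed overall limit, in seconds
    interval: float = 2.0,   # assumed pause between polls, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is "healthy"; stop early on "unhealthy" or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "healthy":
            return True
        if status == "unhealthy" or time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller could pass `lambda: docker_health_status("my-dev-container")` (a hypothetical container name) and run `docker logs` only when the function returns `False`, which matches the "collect logs only on failure" idea from the original request.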

PavelSosin-320 commented 3 years ago

A second thought: I could recommend that users run docker top periodically, or whenever the container becomes unresponsive. Running ps inside the container is impossible. But keeping stalled containers on a shared engine with polluted bind volumes is a luxury.