tethysplatform / tethys

The Tethys Platform main Django website project repository.
http://tethysplatform.org/
BSD 2-Clause "Simplified" License

Fixed and pulled healthcheck into docker/liveness-probe.sh #1038

Closed: gronka closed this 7 months ago

gronka commented 7 months ago

Continuing this discussion: https://github.com/tethysplatform/tethys/discussions/1018

3 goals here:

  1. Kubernetes ignores the HEALTHCHECK instruction in a Dockerfile, so I moved the logic into a script called liveness-probe.sh, which is copied into the container. Any container manager (docker, docker-compose, kubernetes, etc.) can then call it directly.

  2. I don't often get into the weeds of bash or docker HEALTHCHECK, and I think this implementation is a bit easier to read and maintain.

  3. Unless I'm making a mistake, the current implementation of HEALTHCHECK, as executed by docker-compose, fails with a syntax error:

            "Health": {
                "Status": "unhealthy",
                "FailingStreak": 5,
                "Log": [
                    {
                        "Start": "2024-04-17T16:17:10.194952618-05:00",
                        "End": "2024-04-17T16:17:10.236757633-05:00",
                        "ExitCode": 2,
                        "Output": "/bin/sh: 1: Syntax error: \"(\" unexpected\n"
                    },
                    {
    --
            "Healthcheck": {
                "Test": [
                    "CMD-SHELL",
                    "function check_process_is_running(){ if [ \"$(ps $1 | wc -l)\" -ne 2 ]; then echo The $2 process \\($1\\) is  not running. 1>&2; return 1; fi };   check_process_is_running $(cat $(grep 'pidfile=.*' /etc/supervisor/supervisord.conf | awk -F'=' '{print $2}' | awk '{print $1}')) supervisor;   check_process_is_running $(cat $(grep 'pid .*;' /etc/nginx/nginx.conf | awk '{print $2}' | awk -F';' '{print $1}')) nginx;   check_process_is_running $(ls -l /run/tethys_asgi0.sock.lock | awk -F'-> ' '{print $2}') asgi;"
                ],
                "StartPeriod": 240000000000
            },
            "Image": "tethysplatform/tethys-core",

coveralls commented 7 months ago

Coverage Status

Coverage: 100.0% (remained the same) when pulling c8323e723b8d300f7706984be1d6da5103e0b07c on gronka:liveness into 728e78fb337ef68ea89328797d2968cef46ec735 on tethysplatform:main.

gronka commented 7 months ago

@swainn I made the output more descriptive and added checks for cases that could cause silent failures. Let me know if it suits you.
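As an illustration of the kind of silent failure meant here (a sketch, not the code in this PR): if a pidfile is missing or empty, the command substitution that reads it yields an empty string, and the later process check can give a misleading result instead of a clear error. The helper below, with the hypothetical name `read_pid_file`, reports each of those cases on stderr.

```shell
#!/bin/sh
# Hypothetical helper: print the PID stored in a pidfile, or fail loudly.
# $1 = path to the pidfile, $2 = label used in error messages
read_pid_file() {
    pidfile=$1
    name=$2
    if [ -z "$pidfile" ]; then
        echo "$name: pidfile path is empty" >&2
        return 1
    fi
    if [ ! -r "$pidfile" ]; then
        echo "$name: pidfile $pidfile is missing or unreadable" >&2
        return 1
    fi
    # Strip whitespace so a trailing newline does not affect the check below.
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*)
            echo "$name: pidfile $pidfile does not contain a numeric PID" >&2
            return 1
            ;;
    esac
    echo "$pid"
}
```

The design choice is to fail with a specific message at the first step that goes wrong, so the health log says which file or value was the problem.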