willfarrell / docker-autoheal

Monitor and restart unhealthy docker containers.
MIT License
1.35k stars 226 forks source link
docker

Docker Autoheal

Monitor and restart unhealthy docker containers. This functionality was proposed to be included with the addition of HEALTHCHECK, however didn't make the cut. This container is a stand-in till there is native support for --exit-on-unhealthy https://github.com/docker/docker/pull/22719.

Supported tags and Dockerfile links

How to use

1. Docker CLI

UNIX socket passthrough

docker run -d \
    --name autoheal \
    --restart=always \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    willfarrell/autoheal

TCP socket

docker run -d \
    --name autoheal \
    --restart=always \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -e DOCKER_SOCK=tcp://$HOST:$PORT \
    -v /path/to/certs/:/certs/:ro \
    willfarrell/autoheal

TCP with mTLS (HTTPS)

docker run -d \
    --name autoheal \
    --restart=always \
    --tlscacert=/certs/ca.pem \
    --tlscert=/certs/client-cert.pem \
    --tlskey=/certs/client-key.pem \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -e DOCKER_HOST=tcp://$HOST:2376 \
    -e DOCKER_SOCK=tcps://$HOST:2376 \
    -e DOCKER_TLS_VERIFY=1 \
    -v /path/to/certs/:/certs/:ro \
    willfarrell/autoheal

The certificates and keys need these names and resides under /certs inside the container:

See https://docs.docker.com/engine/security/https/ for how to configure TCP with mTLS

Change Timezone

If you need the timezone to match the local machine, you can map the /etc/localtime into the container.

docker run ... -v /etc/localtime:/etc/localtime:ro

2. Use in your container image

Choose one of the three alternatives:

a) Apply the label autoheal=true to your container to have it watched;
b) Set ENV AUTOHEAL_CONTAINER_LABEL=all to watch all running containers;
c) Set ENV AUTOHEAL_CONTAINER_LABEL to existing container label that has the value true;

Note: You must apply HEALTHCHECK to your docker images first.
See https://docs.docker.com/engine/reference/builder/#healthcheck for details.

Docker Compose (example)

services:
  app:
    extends:
      file: ${PWD}/services.yml
      service: app
    labels:
      autoheal-app: true

  autoheal:
    deploy:
      replicas: 1
    environment:
      AUTOHEAL_CONTAINER_LABEL: autoheal-app
    image: willfarrell/autoheal:latest
    network_mode: none
    restart: always
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock

Optional Container Labels

autoheal.stop.timeout=20 Per containers override for stop timeout seconds during restart

Environment Defaults

Variable Description
AUTOHEAL_CONTAINER_LABEL=autoheal set to existing label name that has the value true
AUTOHEAL_INTERVAL=5 check every 5 seconds
AUTOHEAL_START_PERIOD=0 wait 0 seconds before first health check
AUTOHEAL_DEFAULT_STOP_TIMEOUT=10 Docker waits max 10 seconds (the Docker default) for a container to stop before killing during restarts (container overridable via label, see below)
AUTOHEAL_ONLY_MONITOR_RUNNING=false All containers monitored by default. Set this to true to only monitor running containers. This will result in Paused contaners being ignored.
DOCKER_SOCK=/var/run/docker.sock Unix socket for curl requests to Docker API
CURL_TIMEOUT=30 --max-time seconds for curl requests to Docker API
WEBHOOK_URL="" post message to the webhook if a container was restarted (or restart failed)

Testing (building locally)

docker buildx build -t autoheal .

docker run -d \
    -e AUTOHEAL_CONTAINER_LABEL=all \
    -v /var/run/docker.sock:/var/run/docker.sock \
    autoheal