willfarrell / docker-autoheal

Monitor and restart unhealthy docker containers.
MIT License

Dependent containers restart #49

Open jaroslawjanas opened 3 years ago

jaroslawjanas commented 3 years ago

Is it possible to restart containers that depend on a container that failed its health check? Say I have containers A and B that have C as a dependency; in simple terms, A and B need a healthy C to function properly. Is it possible to restart A and B if C's health check fails?

If not, please consider this as a potential enhancement.

hasnat commented 3 years ago

If your container has dependencies and requires service A to be functional, it's probably best to broaden its health check to include the other containers, e.g.:

    php-apache-application: healthcheck => check-can-access 127.0.0.1:80 && check-can-access mysql-db:3306
    mysql-db: healthcheck => check-can-access 127.0.0.1:3306

where check-can-access is something like wait-for-it or a simple telnet/netcat check.
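
A minimal sketch of that idea as a Compose health check, assuming BusyBox nc is available in the image (the service names, image and ports are placeholders, not taken from this thread):

  # Hypothetical dependent service: its health check also probes the dependency
  # ("mysql-db"), so autoheal restarts it when the dependency stops accepting connections.
  php-apache-application:
    image: my-php-apache:latest   # placeholder image
    labels:
      - autoheal=true
    healthcheck:
      # nc -z only verifies that the TCP ports accept connections
      test: ["CMD-SHELL", "nc -z 127.0.0.1 80 && nc -z mysql-db 3306 || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3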

roigreenberg commented 2 years ago

@jaroslawjanas have you managed to do it?

jaroslawjanas commented 2 years ago

@jaroslawjanas have you managed to do it?

No, instead I added a health check of my own in the docker-compose file.

  [REDACTED]:
    image: [REDACTED]
    container_name: [REDACTED]
    restart: unless-stopped
    labels:
      - autoheal=true
    healthcheck:
      test: ["CMD-SHELL", "curl --silent --output nul --show-error --fail https://[REDACTED].com && exit 0 || exit 1"]
      interval: 60s
      timeout: 30s
      retries: 5
      start_period: 100s

This fixed the problem I was struggling with.

roigreenberg commented 2 years ago

I thought about it too; I need to find the right health check for my case (I want to restart the Selenium container if the container(s) that use it fail to start).
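
A rough sketch of that direction, combining the suggestion above with autoheal (the image tag, the consumer service name, and the availability of curl/nc inside the image are assumptions, and the /wd/hub/status endpoint may differ between Selenium versions):

  # Hypothetical: mark Selenium for autoheal and let its health check also probe
  # the consumer container, so Selenium is restarted when that container is down.
  selenium:
    image: selenium/standalone-chrome:latest   # placeholder tag
    labels:
      - autoheal=true
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:4444/wd/hub/status && nc -z my-test-runner 8080 || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3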

baroka commented 1 year ago

I overrode the entrypoint script with this one to add support for two new "autoheal" label values: "master" and "slave".

When one of the containers is unhealthy, ALL of them are restarted. First the container with the "master" label (there should be only one) is restarted, and then all the others with the "slave" label.

I use it for an openvpn container (master) and the transmission and soulseek containers encrypted through it (slaves).

The "true" label keeps the same functionality.

entry.sh.txt
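
For illustration, a hedged sketch of how the labels might be applied, assuming they take the form autoheal=master / autoheal=slave analogously to autoheal=true (images and service names are placeholders; the exact label handling is defined by the attached entry.sh):

  openvpn:
    image: my-openvpn:latest        # placeholder image
    labels:
      - autoheal=master             # restarted first when any labelled container is unhealthy
  transmission:
    image: my-transmission:latest   # placeholder image
    network_mode: "service:openvpn" # routed through the VPN container
    labels:
      - autoheal=slave              # restarted after the master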

baroka commented 2 weeks ago

Despite getting no thanks or likes on my post, it seems people are using this modified script, so here is a working version of it. It isn't merged with the latest autoheal version, but it works with Alpine 3.18 and Docker 20.10.23.

Your Docker Compose file needs to look like this:

    entrypoint: /entry.sh # Adds feature: restart all containers (master first) on unhealthy one (master or slave)
    command: "autoheal"

entry.sh.txt

baroka commented 2 weeks ago

I finally merged the script with the latest version.

My Docker Compose file looks like this:

  autoheal:
    container_name: autoheal
    image: willfarrell/autoheal:latest
    restart: unless-stopped
    networks:
      - socket_proxy
    security_opt:
      - no-new-privileges:true
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - $DOCKERDIR/autoheal/entry.sh:/entry.sh:ro
    entrypoint: /entry.sh # Adds feature: restart all containers (master first) on unhealthy one (master or slave)
    command: "autoheal"
    environment:
      - AUTOHEAL_INTERVAL=15
      - AUTOHEAL_RETRIES=40
      - AUTOHEAL_START_PERIOD=300
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=15
      - WEBHOOK_URL=https://api.telegram.org/bot$TELEGRAM_NOTIFIER_BOT_TOKEN/sendMessage
      - WEBHOOK_JSON_KEY=chat_id":"$TELEGRAM_NOTIFIER_CHAT_ID","text
      - DOCKER_SOCK=tcp://socket_proxy:2375

new_entry.sh.txt

jaroslawjanas commented 2 weeks ago

@baroka Thanks for your hard work.

Jorman commented 1 week ago

Without wishing to detract from the good work on this new entrypoint, which adds features such as Telegram notifications, I would like to point out that recent versions of Docker Compose (I believe from 2.20 onwards) seem to offer a similar feature (depends_on with restart: true). I am still testing it, so it is too early to give an opinion.

It seems possible to restart the whole stack, or at least it appears so. My setup for the stack is:

services:
    my-master-service:
        image: ...
        container_name: my_1st_container_name
        ...
        healthcheck:
            test: "ping -c 1 www.google.com || exit 1"
            interval: 60s
            timeout: 5s
            retries: 3
        restart: unless-stopped

    my-1st-slave-service:
        image: ...
        container_name: my_2nd_container_name
        ...
        network_mode: "service:my-master-service"
        depends_on:
          my-master-service:
            condition: service_started
            restart: true
        healthcheck:
            test: "curl --fail http://localhost:my_2nd_container_service_port || exit 1"
            interval: 30s
            timeout: 10s
            retries: 5
        restart: unless-stopped

    my-2nd-slave-service:
        image: ...
        container_name: my_3rd_container_name
        ...
        network_mode: "service:my-master-service"
        depends_on:
          my-master-service:
            condition: service_started
            restart: true
        healthcheck:
            test: "curl --fail http://localhost:my_3rd_container_service_port || exit 1"
            interval: 30s
            timeout: 10s
            retries: 5
        restart: unless-stopped
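
One variant that might be worth testing (an assumption on my part based on the Compose file reference, not something verified here) is to gate the dependents on the master's health check rather than only on its start:

        depends_on:
          my-master-service:
            condition: service_healthy   # wait until the master's health check passes
            restart: true                # restart this service when the master is restarted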

Like I said, I need to do more testing, so for now it is better to use the modified entrypoint!