toughIQ / docker-mariadb-cluster

Dockerized MariaDB Galera Cluster
GNU General Public License v2.0

Adding healthcheck in docker compose file caused the container to fail #4

Open yaakov-berkovitch opened 7 years ago

yaakov-berkovitch commented 7 years ago

Hi,

I'm running Docker 1.13 experimental and use a compose file to deploy MariaDB in a cluster using your image. The docker compose file looks like this:

version: '3'
services:
    mariadb:
      deploy:
        replicas: 1
      image: ${DOCKER_REPOSITORY}openpaas/toughiq/mariadb-cluster:2.0
      networks:
        - default
      volumes:
        - /config/mariadb/maria.cnf:/etc/mysql/conf.d/maria.cnf
      healthcheck:
        test: ["CMD", "ls"]
        interval: 30s
        timeout: 1m30s
        retries: 5
      environment:
        - DB_SERVICE_NAME=mariadb
        - MYSQL_ROOT_PASSWORD=xxxx
        - MYSQL_DATABASE=yyyyy
        - MYSQL_USER=user1
        - MYSQL_PASSWORD=user1

If I add the healthcheck test, the mariadb container fails after 10 seconds, and the last line of the log shows:

/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/init_cluster_conf.sh

If I remove the healthcheck test, the mariadb container starts successfully. BTW, the healthcheck is a dummy check, and no matter what I put in the command, it causes the container to fail.

Do you have any idea what causes the crash? It works fine for me with other containers (not mariadb).

Thanks

toughIQ commented 7 years ago

Hi @yaakov-berkovitch, I have an idea, but it has nothing to do with the healthcheck. I was playing around and found that docker-compose does not deploy a version 3 compose file in swarm mode. At least, I saw this message after docker-compose up:

WARNING: Some services (mariadb) use the 'deploy' key, which will be ignored. Compose does not support deploy configuration - use `docker stack deploy` to deploy to a swarm.

This mariadb-cluster image was designed specifically to work with swarm mode. During the init_cluster step it tries to get the IPs of the other nodes:

getent hosts tasks.$DB_SERVICE_NAME

This lookup does not return anything, and when it runs here

CLUSTER_MEMBERS=`getent hosts tasks.$DB_SERVICE_NAME|awk '{print $1}'|tr '\n' ','`

it fails with exit code 2
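To make the failure mode visible, here is a minimal sketch of the same lookup with explicit error handling. `resolve_members` is a hypothetical helper and not part of the image; the sketch only assumes that `getent` exits with a non-zero status (2 for an unknown key) when `tasks.<service>` does not resolve. Unlike the original one-liner, it joins the addresses with `paste`, so there is no trailing comma.

```shell
#!/bin/sh
# Sketch only: resolve the task IPs of a swarm service and join them with commas.
# resolve_members is a hypothetical helper, not part of the mariadb-cluster image.
resolve_members() {
    service="$1"
    # Propagate getent's exit status (2 when the name cannot be resolved)
    ips=$(getent hosts "tasks.${service}") || return $?
    # First column is the IP address; join all lines into one comma-separated string
    printf '%s\n' "$ips" | awk '{print $1}' | paste -sd, -
}

if members=$(resolve_members "${DB_SERVICE_NAME:-mariadb}"); then
    echo "cluster members: ${members}"
else
    echo "tasks.${DB_SERVICE_NAME:-mariadb} did not resolve (not running as a swarm service?)" >&2
fi
```

Handling the empty result this way would let an init script log a clear message instead of aborting with a bare exit code.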

Why does it work sometimes and sometimes not? Simple. Since you are working with docker-compose, a container name is created at each startup. The name has the form currentDirectory_ServiceName_ReplicaNumber, and for the same service within a compose file it does not change.

On the first start the container fails and exits, but it still exists. The next docker-compose up does not create a brand new container; it starts the previous one, since the name is identical under compose's naming scheme. This time the original docker-entrypoint.sh script from MariaDB finds an already installed database, because a data directory is already present in the container. In that case the init scripts in /docker-entrypoint-initdb.d/* are not executed, and so the startup cannot fail with the exit 2 error.
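The "initialize only once" behaviour described above can be sketched roughly like this. It is a simplified illustration, not the actual docker-entrypoint.sh: `should_run_init` is a hypothetical helper, and the `/var/lib/mysql` default and the check for a `mysql` subdirectory are assumptions about how an existing database is detected.

```shell
#!/bin/sh
# Simplified sketch of the "run init scripts only on a fresh data directory" pattern.
# should_run_init is a hypothetical helper; the real entrypoint may check differently.
should_run_init() {
    datadir="$1"
    # A present system database directory means an earlier start already initialized it
    [ ! -d "${datadir}/mysql" ]
}

DATADIR="${DATADIR:-/var/lib/mysql}"
if should_run_init "$DATADIR"; then
    echo "fresh data directory: scripts in /docker-entrypoint-initdb.d would run"
else
    echo "existing database found: init scripts are skipped"
fi
```

Under this assumption, a container that failed during its first start is restarted without ever reaching init_cluster_conf.sh again, which would explain why a second docker-compose up appears to work.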

yaakov-berkovitch commented 7 years ago

Thanks for your answer. I was probably not clear, but I'm using the compose file together with docker deploy --compose-file, so the deployment is made in swarm mode. The exit code I got during deployment is also 2, so the issue is perhaps related to getent as well. But I cannot figure out why the healthcheck instruction interferes with the launch of the container.

4n70w4 commented 3 years ago

Same problem. I would like to have a health check which verifies that the server is running, fully replicated, and ready to accept new requests.
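For what it's worth, a check along these lines might be a starting point. It is a sketch and not something shipped with this image: `is_synced` and `galera_healthcheck` are hypothetical names, and it assumes the `mysql` client is available in the container and that root can log in with `MYSQL_ROOT_PASSWORD`. It reads the Galera status variables `wsrep_ready` and `wsrep_local_state_comment` and succeeds only when the node reports `ON` and `Synced`.

```shell
#!/bin/sh
# Sketch of a Galera readiness check; function names are hypothetical.

# Reads tab-separated "name<TAB>value" lines on stdin.
# Succeeds only if wsrep_ready is ON and the local state is Synced.
is_synced() {
    awk -F'\t' '
        $1 == "wsrep_ready"               { ready = $2 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END { exit !(ready == "ON" && state == "Synced") }
    '
}

# Queries the local server; assumes root access via MYSQL_ROOT_PASSWORD.
# -N suppresses column names, -B gives tab-separated batch output.
galera_healthcheck() {
    mysql -u root -p"${MYSQL_ROOT_PASSWORD}" -N -B \
        -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_ready','wsrep_local_state_comment')" \
        | is_synced
}

# Run the check only when called with the "check" argument, so the file can also be sourced.
if [ "${1:-}" = "check" ]; then
    galera_healthcheck
fi
```

Saved inside the image under some path of your choice, the script could be referenced from the healthcheck `test` entry of the compose file with the `check` argument. If the server is unreachable, `mysql` prints nothing to the pipe and the check fails, so the container is reported as unhealthy rather than ready.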

4n70w4 commented 3 years ago

Workaround:

Use https://github.com/colinmollenhour/mariadb-galera-swarm, which includes a working healthcheck.