hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Native docker HEALTHCHECK support #3182

Closed smagafurov closed 1 year ago

smagafurov commented 7 years ago

You really should implement native Docker HEALTHCHECK support:

$ docker run --name=test -d \
    --health-cmd='stat /etc/passwd || exit 1' \
    --health-interval=2s \
    busybox sleep 1d
$ sleep 2; docker inspect --format='{{.State.Health.Status}}' test
healthy

https://docs.docker.com/engine/reference/builder/#healthcheck

https://github.com/moby/moby/blob/bfed05be0b9bfc04bf922d79dd8dc420e1e579e2/docs/reference/run.md#healthcheck

mterron commented 7 years ago

What about

--health-cmd="consul info | awk '/health_score/{if ($3 >=1) exit 1; else exit 0}'"
smagafurov commented 7 years ago

Hi, mterron! I want Consul to check the Docker healthcheck, not vice versa. I want to configure the healthcheck in Docker and have Consul check its status. Combined with gliderlabs/registrator, this would be a killer feature (if and when Registrator supports it too).

mterron commented 7 years ago

I'm not sure I understand the use case. Can you elaborate?

smagafurov commented 7 years ago

Use case:

Yes, we can repeat the same check script in a Consul "Docker + Interval" check. But this is redundant and violates the "don't repeat yourself" principle.

It would be great if we could add a Consul check like this:

{
  "check": {
    "name": "Docker Healthcheck Status",
    "docker_container_identity": "(container-id|container-name|container-ip)",
    "interval": "10s"
  }
}

and Consul would just check the native Docker healthcheck status of the specified container.

Note about gliderlabs/registrator

In the current version we can ask gliderlabs/registrator to register a Docker container as a service in Consul using environment variables, like this (excerpt from a Dockerfile):

EXPOSE 8080

# instructions for gliderlabs/registrator
ENV SERVICE_CHECK_SCRIPT="nc $SERVICE_IP 8080 | grep OK || exit 2"
ENV SERVICE_CHECK_INTERVAL="10s"
ENV SERVICE_CHECK_TIMEOUT="10s"

# and now we have to repeat the same for docker healthcheck
HEALTHCHECK --interval=10s --timeout=10s \
  CMD nc `hostname` 8080 | grep OK || exit 1

So once Consul supports checking the native Docker healthcheck status, gliderlabs/registrator could support this feature too, something like this (excerpt from a Dockerfile):

EXPOSE 8080

# instructions for gliderlabs/registrator (do not need it at all if it's default)
ENV SERVICE_HEALTHCHECK=true

# now there is no repetition; the healthcheck is configured only here
HEALTHCHECK --interval=10s --timeout=10s \
  CMD nc `hostname` 8080 | grep OK || exit 1
mterron commented 7 years ago

I see what you mean. I'd implement that as a TTL healthcheck, probably by Registrator. Registrator is already listening to events on Docker, so it could maintain the health status in Consul without any changes to the codebase from a Consul perspective.

You can always use something like this as a healthcheck in consul:

docker inspect --format='{{.State.Health.Status}}' [your-container-name-or-id] | grep -x healthy
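
When matching on the status string, note that a plain `grep healthy` would also match `unhealthy`, so the match should cover the whole line. Here is a minimal sketch of a stricter check. The helper names `is_exactly_healthy` and `check_container` are hypothetical, and the exit code 2 relies on Consul treating any script exit code other than 0 or 1 as critical.

```shell
#!/bin/sh

# is_exactly_healthy STATUS
# Succeeds only when STATUS is exactly "healthy" (grep -x matches whole lines),
# so "unhealthy" and "starting" are rejected.
is_exactly_healthy() {
    printf '%s\n' "$1" | grep -qx 'healthy'
}

# check_container NAME_OR_ID (hypothetical wrapper, not called automatically)
# Returns 0 when Docker reports the container as healthy, 2 otherwise.
# Needs access to the Docker daemon, e.g. a mounted /var/run/docker.sock.
check_container() {
    status=$(docker inspect --format='{{.State.Health.Status}}' "$1") || return 2
    is_exactly_healthy "$status" || return 2
}
```

A Consul script check could then source this file and call `check_container my-container`, where `my-container` is a placeholder for your container name or ID.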

Probably the Hashicorp guys will give you an official answer.

smagafurov commented 7 years ago

IMHO, if you have Docker + Interval, you should have a native Docker healthcheck check too. Maybe I'm wrong. But as a user of Consul + Docker, I was disappointed that I can't get this easily, out of the box.

mterron commented 7 years ago

Your suggestion does exactly what I suggested in the previous message. Just define the Consul health check as: docker inspect --format='{{.State.Health.Status}}' [your-container-name-or-id] | grep -x healthy

The other (race-free) option I suggested is a TTL check, but that requires support from Registrator (see Registrator issue #578).

Consul does not listen to Docker events as Registrator does, since it is not a Docker-based solution.
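
For illustration, the TTL approach could be sketched roughly as follows. This is not Registrator's implementation. It assumes a TTL check is already registered for each container under the hypothetical ID `docker-health-<container-name>`, that `CONSUL_URL` (a made-up variable) points at the local agent, and that Docker reports the event status as `health_status: <state>`. The agent endpoints `/v1/agent/check/pass`, `/warn`, and `/fail` are the standard TTL update endpoints.

```shell
#!/bin/sh

# Hypothetical variable: base URL of the local Consul agent.
CONSUL_URL="${CONSUL_URL:-http://127.0.0.1:8500}"

# ttl_action STATUS
# Prints the TTL endpoint verb for a Docker health status:
# healthy -> pass, starting -> warn, anything else -> fail.
ttl_action() {
    case "$1" in
        healthy)  echo pass ;;
        starting) echo warn ;;
        *)        echo fail ;;
    esac
}

# report_health NAME STATUS
# Updates the (assumed, pre-registered) TTL check "docker-health-NAME".
report_health() {
    curl -fsS -X PUT \
        "${CONSUL_URL}/v1/agent/check/$(ttl_action "$2")/docker-health-$1" \
        >/dev/null
}

# watch_health
# Follows Docker health_status events and forwards each change to Consul.
# Each line is assumed to look like: "<name> health_status: <state>".
watch_health() {
    docker events --filter 'event=health_status' \
        --format '{{.Actor.Attributes.name}} {{.Status}}' |
    while read -r name _ state; do
        report_health "$name" "$state"
    done
}
```

Calling `watch_health` starts the loop. Because a TTL check turns critical when it is not refreshed within its TTL, an event-driven updater like this would also need a periodic refresh (or a generous TTL) for containers whose state does not change.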

smagafurov commented 7 years ago

ok, thanks

promorphus commented 7 years ago

Was wondering if you ever got this figured out. I'm attempting to do the same thing, but writing the Consul check that way seems a little borked, since Consul runs in another Docker container and can't run 'docker inspect' on anything.

smagafurov commented 7 years ago

@promorphus, see "Docker + Interval" here: https://www.consul.io/docs/agent/checks.html

Consul can already use the Docker Exec API, so it could check container status too.

promorphus commented 7 years ago

@smagafurov Can you provide me with or point me toward an example of consul using the docker exec api? I'm specifically attempting to use it with registrator as an automatically registered check when the container comes up.

I'd LOVE to use docker + interval, but the issue remains that it still requires things to be run from inside the consul container and pointed at the target container. I don't really want to install things (i.e., a mysql binary or a redis-cli client) inside the consul container so that it can conduct those checks.

If I can somehow ask Consul to just check that the container's State.Health.Status is healthy, that'd be fantastic, but I'm having trouble figuring out how to a) provide the container ID to the Consul / Registrator container, and b) write the check itself.

smagafurov commented 7 years ago

@promorphus

Consul Dockerfile excerpt:

FROM consul:0.8.5

RUN apk add --no-cache docker bash sudo

COPY service-status.sh /usr/local/bin/
RUN chown consul:root /usr/local/bin/service-status.sh
RUN chmod 570 /usr/local/bin/service-status.sh
RUN echo "consul ALL=(root) NOPASSWD: /usr/local/bin/service-status.sh" >> /etc/sudoers

COPY service-healthcheck.sh /usr/local/bin/
RUN chown consul:root /usr/local/bin/service-healthcheck.sh
RUN chmod 570 /usr/local/bin/service-healthcheck.sh

service-status.sh

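#!/bin/bash
# Prints the Docker health status of the running container that matches
# the given container ID, name, or IP address.

if [ $# -ne 1 ]; then
    exit 1
fi

CONTAINER_IDENTITY="$1"

# List "id name ip ..." for every running container (IPs separated by spaces)
# and keep the ID of the line that contains the identity as a whole word.
CONTAINER_ID="$(docker inspect -f '{{.Id}} {{.Name}} {{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' $(docker ps -q) | grep -wF -- "${CONTAINER_IDENTITY}" | cut -d' ' -f1)"

# $? would only reflect cut, the last command in the pipeline,
# so test for an empty result instead.
if [ -z "${CONTAINER_ID}" ]; then
    >&2 echo "Failed to find a running container matching '${CONTAINER_IDENTITY}'"
    exit 1
fi

docker inspect --format='{{.State.Health.Status}}' ${CONTAINER_ID}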

service-healthcheck.sh

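#!/bin/bash
# Consul script check: maps the Docker health status to a Consul exit code
# (0 = passing, 1 = warning, anything else = critical).

if [ $# -ne 1 ]; then
    >&2 echo "Invalid parameters: '$*'"
    echo "USAGE: service-healthcheck.sh <container-id|container-name|container-ip>"
    exit 2
fi

CONTAINER_IDENTITY="$1"

STATUS="$(sudo service-status.sh "${CONTAINER_IDENTITY}")"

if [ $? -ne 0 ]; then
    >&2 echo "Failed to get service status from docker: container_identity=${CONTAINER_IDENTITY}"
    exit 2
fi

echo "${STATUS}"

# Still starting: report a warning rather than a failure.
if [ "${STATUS}" = "starting" ]; then
    exit 1
fi

if [ "${STATUS}" = "healthy" ]; then
    exit 0
fi

# "unhealthy" or anything unexpected is critical.
exit 2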

Your service Dockerfile excerpt:

# Instructions for registrator
ENV SERVICE_CHECK_SCRIPT="service-healthcheck.sh \$SERVICE_IP"

# docker health check (where docker-healthcheck.sh is your check script)
COPY docker-healthcheck.sh /opt/
HEALTHCHECK --interval=5s --timeout=5s --retries=5 CMD ["/bin/bash", "/opt/docker-healthcheck.sh"]

docker-compose.yml excerpt:

  consul:
    ...
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ...
  registrator:
    ...
    volumes:
      - /var/run/docker.sock:/tmp/docker.sock
    depends_on:
      - consul
    ...
CpuID commented 6 years ago

Does this technically belong in https://github.com/gliderlabs/registrator as it stands? Otherwise this would involve adding full Docker state visibility to Consul, which I suspect justifies a GH issue of its own? :)

larssb commented 4 years ago

What is the current status of this issue? It's been open a long time, and I'm trying to figure out how to use Docker container HEALTHCHECKs.

david-yu commented 1 year ago

Hi everyone. This issue has been open a while, so I wanted to provide a response. We don't currently plan to implement additional health check options in the Consul agent. Our overall strategy is to leverage the health check capabilities of other orchestrators such as K8s, Nomad, and ECS for container-based workloads. It may be possible to build support for this in consul-esm, which syncs health check info to Consul, so we would recommend opening an issue there if applicable.