hashicorp / docker-consul

Official Docker images for Consul.
Mozilla Public License 2.0

Consul Agent received Caught second signal terminated. Exiting #185

Open shweshi opened 2 years ago

shweshi commented 2 years ago

Overview of the Issue

The Consul agent is receiving a second SIGTERM signal.

We are running the Consul agent using the Consul Docker image; our Docker image is built from the consul image. When the container is stopped, a graceful exit happens, but the logs show that the Consul agent receives a second SIGTERM. We are not sure where it is coming from.

From the Docker image we execute a shell script, start.sh, which runs the consul agent command in the background. In the Consul agent configuration we have set

leave_on_terminate = true
skip_leave_on_interrupt = false

for the graceful shutdown.

Since the consul agent command runs as a child process, the SIGTERM is not forwarded to the Consul agent process, so we trap the SIGTERM in start.sh and forward it to the consul process.

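graceful_shutdown() {
    echo "SIGTERM received"
    trap '' SIGTERM SIGINT # ignore further SIGTERM/SIGINT in this script while shutting down

    # Takes column 1 of the ps output as the PID. That holds for BusyBox ps;
    # with procps "ps -ef" column 1 is the UID and the PID is column 2.
    consul_pid=$(ps -ef | grep "consul agent" | grep -v grep | awk '{print $1}')

    kill -SIGTERM ${consul_pid} # Sends SIGTERM to the consul agent process(es) found above
    wait # block until the background consul agent has exited
    echo DONE
}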

trap graceful_shutdown SIGTERM SIGINT
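
For comparison, here is a minimal sketch of the same forwarding idea that records the child's PID with `$!` instead of parsing `ps` output. This is not the original start.sh: `run_and_forward` is a hypothetical helper name, and the sketch only illustrates the trap-and-forward pattern described above. It does not address the second signal reported in this issue.

```shell
#!/bin/sh
# Sketch only: run_and_forward is a hypothetical helper, not part of Consul
# or of the original start.sh.

run_and_forward() {
    got_signal=0

    # Start the command in the background and remember its PID.
    "$@" &
    child_pid=$!

    # On the first SIGTERM/SIGINT: note it, ignore repeats, and send one
    # SIGTERM to the recorded PID only.
    trap 'got_signal=1; trap "" TERM INT; kill -TERM "$child_pid" 2>/dev/null' TERM INT

    # wait returns early when a trapped signal arrives.
    wait "$child_pid"
    status=$?
    if [ "$got_signal" -eq 1 ]; then
        # The first wait was interrupted by the trap; wait again so the
        # child can finish and its real exit status is collected.
        wait "$child_pid"
        status=$?
    fi

    trap - TERM INT
    return "$status"
}
```

In a start.sh-style script this could be called as `run_and_forward consul agent -config-file=/config/serverconfig.hcl`. Using `$!` avoids depending on the column layout of `ps`, which differs between BusyBox and procps, and it ensures only one process is signalled.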

We are assuming that the consul process will get the SIGTERM and perform a graceful shutdown. In the logs, however, it reports a second SIGTERM, and we are not sure where it is coming from.

{"@level":"info","@message":"Caught second signal, Exiting","@module":"agent","@timestamp":"2022-05-05T18:08:44.327411Z","signal":15}

We also tried running the consul Docker image with Docker Compose and 3 nodes and faced the same issue. Snippet of docker-compose.yml:

  consul-server-1:
    image: "consul:1.11.1"
    command: ["consul","agent","-config-file=/config/serverconfig.hcl","-retry-join=consul-server-1", "-bootstrap-expect=3", "-encrypt=xyz"]
    volumes:
      - "./consul-config:/config"
    container_name: consul-server-1
    networks:
      vpcbr:
        ipv4_address: 192.168.55.0
    ports:
      - "8500:8500"
      - "8600:8600/tcp"
      - "8600:8600/udp"

Server config:

leave_on_terminate = true
skip_leave_on_interrupt = false

If we omit leave_on_terminate and skip_leave_on_interrupt, then the second SIGTERM is not received, but the graceful exit does not happen:

{"@level":"warn","@message":"serf: Shutdown without a Leave","@module":"agent.server.serf.lan","@timestamp":"2022-05-10T05:33:23.098134Z"}
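
To compare the two outcomes across runs, a small check over a saved log can help. The sketch below is an assumption-laden helper, not a Consul tool: `check_shutdown_log` and its labels are hypothetical, and it matches only the two message strings quoted in this issue.

```shell
#!/bin/sh
# Sketch only: check_shutdown_log is a hypothetical helper. It looks for the
# two log messages quoted in this issue and nothing else.

check_shutdown_log() {
    log_file=$1
    if grep -q 'Caught second signal' "$log_file"; then
        # The agent logged a second signal during shutdown.
        echo "second-signal"
    elif grep -q 'serf: Shutdown without a Leave' "$log_file"; then
        # The agent stopped without leaving the cluster gracefully.
        echo "no-leave"
    else
        # Neither of the two messages was found.
        echo "neither"
    fi
}
```

For example, after stopping a container the log could be saved with `docker logs consul-server-1 > agent.log 2>&1` and then checked with `check_shutdown_log agent.log`.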



Additional Info

Consul Version: 1.11.1
Host OS: macOS
Docker version 20.10.10, build b485636

shweshi commented 2 years ago

Found the issue. This issue can be fixed by https://github.com/hashicorp/consul/pull/13067