hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

consul 1.2.0 leaves the cluster on terminate, even if leave_on_terminate is false #4356

Closed (telmich closed this issue 6 years ago)

telmich commented 6 years ago

Overview of the Issue

Expected behaviour: the node does not leave the cluster on terminate.
Observed behaviour: the node leaves the cluster on terminate.

Reproduction Steps

Config:

root@server2:~# cat /etc/consul.d/config.json
{
  "datacenter": "hack4glarus",
  "bind_addr": "2a0a:e5c0:2:5:42b0:34ff:fe6f:f9b2",
  "disable_remote_exec": true,
  "disable_update_check": true,
  "encrypt": "AQcfqHXpjcGT5HNxohnDTQ==",
  "data_dir": "/var/lib/consul",
  "log_level": "INFO",
  "rejoin_after_leave": true,
  "leave_on_terminate": false,
  "retry_join": [
    "consul1.hack4glarus.ungleich.cloud",
    "consul2.hack4glarus.ungleich.cloud",
    "consul3.hack4glarus.ungleich.cloud"
  ],
  "server": false
}

Show status:

root@server3:~# consul members
Node        Address                                   Status  Type    Build  Protocol  DC           Segment
consul1     [2a0a:e5c0:2:2:400:c8ff:fe68:beda]:8301   alive   server  1.2.0  2         hack4glarus
consul2     [2a0a:e5c0:2:2:400:c8ff:fe68:bedb]:8301   alive   server  1.2.0  2         hack4glarus
consul3     [2a0a:e5c0:2:2:400:c8ff:fe68:bedc]:8301   alive   server  1.2.0  2         hack4glarus
monitoring  [2a0a:e5c0:2:2:400:c8ff:fe68:bed8]:8301   alive   client  1.2.0  2         hack4glarus
server1     [2a0a:e5c0:2:5:42b0:34ff:fe6f:f6f0]:8301  alive   client  1.2.0  2         hack4glarus
server2     [2a0a:e5c0:2:5:42b0:34ff:fe6f:f9b2]:8301  alive   client  1.2.0  2         hack4glarus
server3     [2a0a:e5c0:2:5:42b0:34ff:fe6f:e6f4]:8301  alive   client  1.2.0  2         hack4glarus

Press Ctrl-C in the agent's terminal.

root@server3:~# consul members
Node        Address                                   Status  Type    Build  Protocol  DC           Segment
consul1     [2a0a:e5c0:2:2:400:c8ff:fe68:beda]:8301   alive   server  1.2.0  2         hack4glarus
consul2     [2a0a:e5c0:2:2:400:c8ff:fe68:bedb]:8301   alive   server  1.2.0  2         hack4glarus
consul3     [2a0a:e5c0:2:2:400:c8ff:fe68:bedc]:8301   alive   server  1.2.0  2         hack4glarus
monitoring  [2a0a:e5c0:2:2:400:c8ff:fe68:bed8]:8301   alive   client  1.2.0  2         hack4glarus
server1     [2a0a:e5c0:2:5:42b0:34ff:fe6f:f6f0]:8301  alive   client  1.2.0  2         hack4glarus
server2     [2a0a:e5c0:2:5:42b0:34ff:fe6f:f9b2]:8301  left    client  1.2.0  2         hack4glarus
server3     [2a0a:e5c0:2:5:42b0:34ff:fe6f:e6f4]:8301  alive   client  1.2.0  2         hack4glarus
root@server3:~#
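One thing worth keeping in mind when reproducing: Ctrl-C delivers SIGINT, while `leave_on_terminate` governs SIGTERM, so the two actions exercise different code paths in the agent. The distinction can be made visible on a throwaway process without touching Consul at all (the `sleep` process below is purely illustrative):

```shell
# Ctrl-C in a terminal sends SIGINT (2); `kill` with no signal name
# sends SIGTERM (15). Demonstrated on a stand-in process:
sleep 300 &
pid=$!
kill -TERM "$pid"         # what `kill <pid>` or an instance shutdown delivers
wait "$pid"
echo "exit status: $?"    # prints "exit status: 143" (128 + 15, i.e. SIGTERM)
```

To exercise the `leave_on_terminate` path specifically, one would send the agent SIGTERM (`kill <pid>`) rather than pressing Ctrl-C.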

Consul info for both Client and Server


root@server3:~# consul info
agent:
    check_monitors = 0
    check_ttls = 0
    checks = 2
    services = 2
build:
    prerelease = 
    revision = 28141971
    version = 1.2.0
consul:
    known_servers = 3
    server = false
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 47
    max_procs = 2
    os = linux
    version = go1.10.1
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 85
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 1
    member_time = 117
    members = 7
    query_queue = 0
    query_time = 1

Operating system and Environment details

Linux, Devuan

Log Fragments

2018/07/08 06:08:51 [INFO] consul: adding server consul1 (Addr: tcp/[2a0a:e5c0:2:2:400:c8ff:fe68:beda]:8300) (DC: hack4glarus) 
2018/07/08 06:08:51 [INFO] serf: EventMemberJoin: consul3 2a0a:e5c0:2:2:400:c8ff:fe68:bedc
2018/07/08 06:08:51 [INFO] serf: EventMemberJoin: monitoring 2a0a:e5c0:2:2:400:c8ff:fe68:bed8
2018/07/08 06:08:51 [INFO] consul: adding server consul2 (Addr: tcp/[2a0a:e5c0:2:2:400:c8ff:fe68:bedb]:8300) (DC: hack4glarus) 
2018/07/08 06:08:51 [INFO] consul: adding server consul3 (Addr: tcp/[2a0a:e5c0:2:2:400:c8ff:fe68:bedc]:8300) (DC: hack4glarus) 
2018/07/08 06:08:51 [INFO] serf: Re-joined to previously known node: consul3: [2a0a:e5c0:2:2:400:c8ff:fe68:bedc]:8301
2018/07/08 06:08:51 [INFO] agent: (LAN) joined: 3 Err: <nil>
2018/07/08 06:08:51 [INFO] agent: Join LAN completed. Synced with 3 initial agents
2018/07/08 06:08:53 [INFO] agent: Synced check "service:consul-exporter"
2018/07/08 06:08:53 [INFO] agent: Synced check "service:node-exporter"
2018/07/08 06:08:55 [INFO] agent: Synced check "service:node-exporter"
2018/07/08 06:08:59 [WARN] agent: Check "service:consul-exporter" HTTP request failed: Get http://localhost:9107/metrics: dial tcp [::1]:9107: connect: connection refused

^C  2018/07/08 06:09:06 [INFO] agent: Caught signal: interrupt
2018/07/08 06:09:06 [INFO] agent: Gracefully shutting down agent...
2018/07/08 06:09:06 [INFO] consul: client starting leave
2018/07/08 06:09:06 [INFO] serf: EventMemberLeave: server2 2a0a:e5c0:2:5:42b0:34ff:fe6f:f9b2
2018/07/08 06:09:09 [WARN] agent: Check "service:consul-exporter" HTTP request failed: Get http://localhost:9107/metrics: dial tcp [::1]:9107: connect: connection refused
2018/07/08 06:09:09 [INFO] agent: Graceful exit completed
2018/07/08 06:09:09 [INFO] agent: Requesting shutdown
2018/07/08 06:09:09 [INFO] consul: shutting down client
2018/07/08 06:09:09 [INFO] manager: shutting down
2018/07/08 06:09:09 [INFO] agent: consul client down
2018/07/08 06:09:09 [INFO] agent: shutdown complete
2018/07/08 06:09:09 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (tcp)
2018/07/08 06:09:09 [INFO] agent: Stopping DNS server 127.0.0.1:8600 (udp)
2018/07/08 06:09:09 [INFO] agent: Stopping HTTP server 127.0.0.1:8500 (tcp)
2018/07/08 06:09:09 [INFO] agent: Waiting for endpoints to shut down
2018/07/08 06:09:09 [INFO] agent: Endpoints down
2018/07/08 06:09:09 [INFO] agent: Exit code: 0

MagnumOpus21 commented 6 years ago

`leave_on_terminate` lets a Consul agent leave the cluster gracefully when it receives a SIGTERM signal. This is useful when an instance (say, on EC2) is shut down: the Consul agent receives SIGTERM, and the flag decides whether it exits the cluster gracefully. Ctrl-C sends SIGINT, a different signal that the agent also recognizes as a request to shut down gracefully. The TERM signal in the documentation is SIGTERM, which cannot be produced by a keyboard shortcut; running `kill` on the terminal sends SIGTERM. TL;DR: when the flag is true, killing the agent causes a graceful leave; when false, it exits abruptly. The Consul Agent Lifecycle documentation describes in depth how exit behaviour affects cluster membership.
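As an illustration only (not Consul's actual code), the rule described above can be sketched as a tiny decision function. Note that SIGINT is governed by a separate option, `skip_leave_on_interrupt`, not by `leave_on_terminate`, which is why the reporter's Ctrl-C still produced a leave:

```shell
# Sketch of the shutdown rule: prints "leave" or "no-leave" for a given
# signal name and the two relevant options (true/false strings).
leaves_cluster() {
  sig="$1"; leave_on_terminate="$2"; skip_leave_on_interrupt="$3"
  case "$sig" in
    INT)  # Ctrl-C: leaves unless skip_leave_on_interrupt is set
      [ "$skip_leave_on_interrupt" = "true" ] && echo "no-leave" || echo "leave" ;;
    TERM) # kill <pid> / instance shutdown: leaves only if leave_on_terminate
      [ "$leave_on_terminate" = "true" ] && echo "leave" || echo "no-leave" ;;
    *)    echo "no-leave" ;;
  esac
}

# The reporter's case: SIGINT with leave_on_terminate=false still leaves.
leaves_cluster INT  false false   # prints "leave"
leaves_cluster TERM false false   # prints "no-leave"
```

This is only a reading of the documented behaviour, not the agent's implementation; the exact defaults of both options vary by Consul version and by server/client mode, so check the agent configuration reference for your version.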