The instance is deployed on AWS EC2/Elastic Beanstalk using a self-packaged RPM of Consul with the following config:
"retry_join": [
"consul.core.domain"
]
consul.core.domain is a round-robin DNS alias for all three Consul servers. Every record resolves to a working Consul server.
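To double-check the claim that every record behind the alias points at a working server, something like the following can be run from the affected instance. This is only a sketch: the function names are made up for illustration, and it assumes that a successful TCP connect to the Serf LAN port (8301, as shown in the agent banner) is a good enough sign of a live server.

```python
import socket

def resolve_all(host, port=8301, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name.

    `resolver` is injectable (hypothetical parameter) so the function can be
    exercised without real DNS.
    """
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

def reachable(ip, port=8301, timeout=2.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example use on the instance:
#   for ip in resolve_all("consul.core.domain"):
#       print(ip, "reachable" if reachable(ip) else "unreachable")
```

With a round-robin alias, repeated lookups should return the same set of three addresses, just in a different order, which is why the result is sorted and de-duplicated.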
Consul info for both the defunct client (1.1.0) and the working client (1.0.2)
Consul 1.1.0
Output on a non-working node:
[root@ip-172-22-54-171 ~]# /usr/local/bin/consul info
agent:
check_monitors = 0
check_ttls = 0
checks = 3
services = 3
build:
prerelease =
revision = 5174058f
version = 1.1.0
consul:
known_servers = 0
server = false
runtime:
arch = amd64
cpu_count = 1
goroutines = 38
max_procs = 1
os = linux
version = go1.10.1
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
[root@ip-172-22-54-171 ~]# /usr/local/bin/consul members
Node Address Status Type Build Protocol DC Segment
i-0e7dd7c7fc2acdf57.x.y.z.com 172.22.54.171:8301 alive client 1.1.0 2 dc1 <default>
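The two values in the output above that show the agent is isolated are known_servers = 0 and members = 1. A rough sketch of how that check could be automated by parsing the text of `consul info` follows; the helper names are hypothetical, and the parser only assumes the "section:" / "key = value" layout visible above.

```python
def parse_consul_info(text):
    """Parse `consul info` text into {section: {key: value}} (all values as strings)."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A section header such as "agent:" or "serf_lan:".
            current = sections.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            # A "key = value" entry; the value may be empty (e.g. "prerelease =").
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections

def has_servers(info):
    """Return True if the agent reports at least one known Consul server."""
    return int(info.get("consul", {}).get("known_servers", "0")) > 0
```

Lines that are neither a section header nor a key/value pair, such as a pasted shell prompt, are ignored.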
Output of log:
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.1.0'
Node ID: 'ec98a7a9-50ac-8c82-43ba-7887ceed4d81'
Node name: 'i-0e7dd7c7fc2acdf57.x.y.z.com'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.22.54.171 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2018/06/04 10:37:10 [WARN] agent: Node name "i-0e7dd7c7fc2acdf57.x.y.z.com" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 10:37:10 [INFO] serf: EventMemberJoin: i-0e7dd7c7fc2acdf57.x.y.z.com 172.22.54.171
2018/06/04 10:37:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2018/06/04 10:37:10 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2018/06/04 10:37:10 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2018/06/04 10:37:10 [INFO] agent: started state syncer
2018/06/04 10:37:10 [WARN] manager: No servers available
2018/06/04 10:37:10 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:37:10 [INFO] agent: Caught signal: hangup
2018/06/04 10:37:10 [INFO] agent: Reloading configuration...
2018/06/04 10:37:10 [WARN] agent: Service name "node_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 10:37:10 [WARN] agent: Service name "node_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 10:37:12 [INFO] agent: Caught signal: hangup
2018/06/04 10:37:12 [INFO] agent: Reloading configuration...
...
2018/06/04 10:37:28 [WARN] manager: No servers available
2018/06/04 10:37:28 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:37:58 [WARN] manager: No servers available
2018/06/04 10:37:58 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:38:22 [WARN] manager: No servers available
2018/06/04 10:38:22 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:38:47 [WARN] manager: No servers available
2018/06/04 10:38:47 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:39:16 [WARN] manager: No servers available
2018/06/04 10:39:16 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:39:41 [WARN] manager: No servers available
2018/06/04 10:39:41 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:39:59 [WARN] manager: No servers available
2018/06/04 10:39:59 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 10:40:16 [WARN] manager: No servers available
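In the excerpt shown, the failing run never logs a "(LAN) joining:" line, while it does log configuration reloads right after startup; part of the log is elided, so this is not conclusive. A small sketch for summarizing an agent log along those lines, so that runs can be compared quickly, might look like this. The function name and the summary keys are made up for illustration, and the matched message prefixes are taken from the logs in this report.

```python
import re

# Matches lines such as "2018/06/04 10:37:10 [INFO] agent: started state syncer".
LOG_LINE = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[\w+\] (.*)$")

def summarize_agent_log(text):
    """Count a few events of interest in a Consul agent log."""
    summary = {"join_attempted": False, "reloads": 0, "sync_failures": 0, "servers_added": 0}
    for raw in text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # banner lines, "..." elisions, wrapped fragments
        message = match.group(1)
        if message.startswith("agent: (LAN) joining:"):
            summary["join_attempted"] = True
        elif message.startswith("agent: Reloading configuration"):
            summary["reloads"] += 1
        elif message.startswith("agent: failed to sync remote state"):
            summary["sync_failures"] += 1
        elif message.startswith("consul: adding server"):
            summary["servers_added"] += 1
    return summary
```

A run with join_attempted False and servers_added 0 matches the symptom reported here.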
After stopping and starting the agent it works fine:
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.1.0'
Node ID: 'ec98a7a9-50ac-8c82-43ba-7887ceed4d81'
Node name: 'i-0e7dd7c7fc2acdf57.x.y.z.com'
Datacenter: 'core' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.22.54.171 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2018/06/04 11:34:09 [WARN] agent: Node name "i-0e7dd7c7fc2acdf57.x.y.z.com" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:34:09 [INFO] serf: EventMemberJoin: i-0e7dd7c7fc2acdf57.x.y.z.com 172.22.54.171
2018/06/04 11:34:09 [WARN] agent: Service name "apache_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:34:09 [WARN] agent: Service name "mysqld_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:34:09 [WARN] agent: Service name "node_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:34:09 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2018/06/04 11:34:09 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2018/06/04 11:34:09 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2018/06/04 11:34:09 [INFO] agent: started state syncer
2018/06/04 11:34:09 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer triton
2018/06/04 11:34:09 [INFO] agent: Joining LAN cluster...
2018/06/04 11:34:09 [INFO] agent: (LAN) joining: [consul.core.domain]
2018/06/04 11:34:09 [WARN] manager: No servers available
2018/06/04 11:34:09 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 11:34:09 [INFO] serf: EventMemberJoin: xxxxx.x.y.z.com 172.22.121.180
2018/06/04 11:34:09 [INFO] serf: EventMemberJoin: yyyyy.x.y.z.com 172.22.52.19
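The startup banners of the failing run and of the run after the restart are not identical: the failing run reports Datacenter 'dc1' and Gossip: false, the restarted one reports Datacenter 'core' and Gossip: true. What that means for the join problem is not established here. A minimal sketch for spotting such differences between two pasted logs, using hypothetical helper names, could be:

```python
import re

def parse_banner(text):
    """Collect the 'Key: value' banner lines that precede the first timestamped log line."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if re.match(r"^\d{4}/\d{2}/\d{2} ", line):
            break  # the streamed log has started; the banner is over
        if line.startswith("==>") or ":" not in line:
            continue
        # Split on the first colon only, so "Encrypt: Gossip: true, ..." keeps its value intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def diff_banners(a, b):
    """Return {key: (value_in_a, value_in_b)} for every banner field that differs."""
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}
```

Feeding the two 1.1.0 logs from this report into diff_banners should list only the Datacenter and Encrypt fields.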
The Consul agent runs fine on version 1.0.2, so the problem does not seem to be on the server side. Log output from a 1.0.2 agent:
==> Starting Consul agent...
==> Consul agent running!
Version: 'v1.0.2'
Node ID: '8c4e849a-44ad-bc92-b5bf-09befdda4522'
Node name: 'i-0d41ada3a164d6327.x.y.z.com'
Datacenter: 'core' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
Cluster Addr: 172.22.54.103 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false
==> Log data will now stream in as it occurs:
2018/06/04 11:40:22 [INFO] serf: EventMemberJoin: i-0d41ada3a164d6327.x.y.z.com 172.22.54.103
2018/06/04 11:40:22 [WARN] Service name "apache_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:40:22 [WARN] agent: check "service:apache_exporter" has the 'script' field, which has been deprecated and replaced with the 'args' field. See https://www.consul.io/docs/agent/checks.html
2018/06/04 11:40:22 [WARN] Service name "mysqld_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:40:22 [WARN] agent: check "service:mysqld_exporter" has the 'script' field, which has been deprecated and replaced with the 'args' field. See https://www.consul.io/docs/agent/checks.html
2018/06/04 11:40:22 [WARN] Service name "node_exporter" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2018/06/04 11:40:22 [WARN] agent: check "service:node_exporter" has the 'script' field, which has been deprecated and replaced with the 'args' field. See https://www.consul.io/docs/agent/checks.html
2018/06/04 11:40:22 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
2018/06/04 11:40:22 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
2018/06/04 11:40:22 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
2018/06/04 11:40:22 [INFO] agent: started state syncer
2018/06/04 11:40:22 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer
2018/06/04 11:40:22 [INFO] agent: Joining LAN cluster...
2018/06/04 11:40:22 [INFO] agent: (LAN) joining: [consul.core.domain]
2018/06/04 11:40:22 [WARN] manager: No servers available
2018/06/04 11:40:22 [ERR] agent: failed to sync remote state: No known Consul servers
2018/06/04 11:40:22 [INFO] serf: EventMemberJoin: aaaaa.x.y.z.com 172.22.80.205
2018/06/04 11:40:22 [INFO] serf: EventMemberJoin: bbbbb.x.y.z.com 172.22.4.143
...
2018/06/04 11:40:22 [INFO] serf: EventMemberJoin: consulserver3.consul--server.infra.core.domain 172.22.4.199
2018/06/04 11:40:22 [INFO] consul: adding server consulserver3.consul--server.infra.core.domain (Addr: tcp/172.22.4.199:8300) (DC: core)
It does not seem to be a name resolution problem at initial startup, since version 1.1.0 joins fine after breaking and then fixing resolv.conf (with the Consul data dir emptied).
Update: the issue does not seem to be related to version 1.1.0 but rather to Consul being started early, so version 1.0.2 has the same problem. Will close this issue for now.
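One way to test the early-startup theory would be to hold the agent back until the join address resolves and only then start it. The sketch below assumes that DNS readiness is the relevant condition, which this report does not confirm; the function name and its parameters are hypothetical and not part of Consul.

```python
import socket
import time

def wait_for_dns(host, timeout=60.0, interval=2.0,
                 resolver=socket.gethostbyname, sleep=time.sleep, clock=time.monotonic):
    """Block until `host` resolves and return the address; raise TimeoutError otherwise.

    `resolver`, `sleep` and `clock` are injectable so the loop can be tested
    without real DNS or real waiting.
    """
    deadline = clock() + timeout
    while True:
        try:
            return resolver(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            if clock() >= deadline:
                raise TimeoutError(f"{host} did not resolve within {timeout} seconds")
            sleep(interval)

# Example use in a pre-start hook, before launching the agent:
#   wait_for_dns("consul.core.domain")
```

If the agent joins reliably once it is started behind such a gate, that would point at startup ordering rather than at the Consul version.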
Overview of the Issue
retry_join seems to be broken since version 1.1.0