Closed mohito83 closed 3 years ago
The only time DC1 members were added to the WAN gossip pool was when I manually executed the following command: curl http://127.0.0.1:18500/v1/agent/join/172.31.7.189?wan=1
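For anyone who wants to script that manual join, here is a minimal sketch in Python. It assumes the agent's HTTP API listens on 127.0.0.1:18500, as in the curl command; the helper names `wan_join_url` and `wan_join` are made up for this example. It sends a GET like the curl call does. Newer Consul releases expect PUT for this endpoint, so the method is left as a parameter.

```python
import urllib.request


def wan_join_url(peer, agent="127.0.0.1:18500"):
    """Build the agent join URL for a WAN peer (hypothetical helper)."""
    return f"http://{agent}/v1/agent/join/{peer}?wan=1"


def wan_join(peer, agent="127.0.0.1:18500", method="GET", timeout=5.0):
    """Ask the local agent to join `peer` over the WAN pool.

    Returns the HTTP status code; urllib raises on connection errors
    and on non-2xx responses.
    """
    req = urllib.request.Request(wan_join_url(peer, agent), method=method)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the URL; call wan_join("172.31.7.189") on a server to join.
    print(wan_join_url("172.31.7.189"))
```

Calling `wan_join` for each remote server address in a loop would mirror running the curl command once per peer.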
Hi @mohito83!
Generally, messages in the logs such as Failed to join 172.31.7.191: dial tcp 172.31.7.191:18301: getsockopt: connection refused
would indicate potential network issues that could block communication. Since you have custom ports for DNS, HTTP, HTTPS, Serf LAN, and Serf WAN, can you confirm that those ports are open in the firewalls in both DCs?
Hi @ChipV223
I checked the iptables rules and other firewalls; nothing is blocking ports 18301 or 18302 across the datacenters. There could be some momentary packet drops, but that shouldn't stop the Consul servers in DC1 from joining the WAN gossip pool.
Hi @mohito83,
My apologies for the delayed response. Based on the information available above, my understanding is that you have servers at:
The two join failures I can see are:
# DC1
* Failed to join 172.31.7.138: dial tcp 172.31.7.138:18301: getsockopt: connection refused
# DC2
* Failed to join 172.31.7.191: dial tcp 172.31.7.191:18301: getsockopt: connection refused
Is that the set of error messages / failed behaviors you were asking about?
Are you sure that the Consul server agent at 172.31.7.138 is running and reachable from DC1? And same for 172.31.7.191 from DC2?
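One quick way to answer the reachability question is a plain TCP connect test from each side. The sketch below is only an illustration in Python; `tcp_reachable` is a hypothetical helper, and the 3-second timeout is an arbitrary choice. It covers TCP only, while Serf gossip also uses UDP on the same ports, so a passing result does not rule out UDP filtering.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    "connection refused" (nothing listening, or a firewall REJECT rule),
    timeouts, and unreachable hosts all surface as OSError and yield False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `tcp_reachable("172.31.7.138", 18301)` run on another DC1 server, and `tcp_reachable("172.31.7.191", 18301)` run on another DC2 server, would test the two addresses from the failed joins.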
If you are still experiencing this issue, especially with a more recent version of Consul, let us know. Until then, I'm going to mark this as closed because it's been inactive for so long and we can't take further action without more information.
consul version: 0.8.4
Overview of the Issue
We have 3 datacenters, but members from only 2 of them form the WAN gossip pool during the bootstrap stage.
Reproduction Steps
To reproduce this issue, start Consul as a server in each datacenter with the following commands.
DC1:
consul agent -server -bootstrap-expect 3 --data-dir /opt/data/drconsul --config-dir /opt/web-app/etc/drconsul -client 0.0.0.0 -bind 172.31.7.136 -retry-join 172.31.7.138 172.31.7.137 172.31.7.136 -retry-join-wan 172.31.7.112 172.31.7.191 172.31.7.190 172.31.7.189
DC2:
consul agent -server -bootstrap-expect 3 --data-dir /opt/data/drconsul --config-dir /opt/web-app/etc/drconsul -client 0.0.0.0 -bind 172.31.7.189 -retry-join 172.31.7.191 172.31.7.190 172.31.7.189 -retry-join-wan 172.31.7.137 172.31.7.138 172.31.7.136 172.31.7.112
DC3:
consul agent -server -bootstrap-expect 1 --data-dir /opt/data/drconsul --config-dir /opt/web-app/etc/drconsul -client 0.0.0.0 -bind 172.31.7.112 -retry-join 172.31.7.112 -retry-join-wan 172.31.7.189 172.31.7.190 172.31.7.191 172.31.7.136 172.31.7.137 172.31.7.138
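The Consul documentation describes `-retry-join` and `-retry-join-wan` as flags that can be specified multiple times. The sketch below assumes that one-address-per-flag form and generates the arguments from address lists. Whether 0.8.4 parses the space-separated form used in the commands above the same way is not confirmed here. `retry_join_args` is a hypothetical helper name.

```python
def retry_join_args(lan_peers, wan_peers):
    """Return agent arguments with one flag occurrence per peer address."""
    args = []
    for peer in lan_peers:
        args += ["-retry-join", peer]
    for peer in wan_peers:
        args += ["-retry-join-wan", peer]
    return args


if __name__ == "__main__":
    # Addresses taken from the DC3 command in this report.
    dc3_lan = ["172.31.7.112"]
    dc3_wan = [
        "172.31.7.189", "172.31.7.190", "172.31.7.191",
        "172.31.7.136", "172.31.7.137", "172.31.7.138",
    ]
    print(" ".join(retry_join_args(dc3_lan, dc3_wan)))
```

The printed fragment can replace the `-retry-join ... -retry-join-wan ...` tail of the DC3 command; the same helper works for DC1 and DC2 with their own address lists.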
Sample configs.json file from one of the member nodes
Consul info for both Client and Server
Client info
```
output from client 'consul info' command here
```
Server info
```
agent:
    check_monitors = 5
    check_ttls = 0
    checks = 10
    services = 6
build:
    prerelease =
    revision = f436077
    version = 0.8.4
consul:
    bootstrap = false
    known_datacenters = 3
    leader = false
    leader_addr = 172.31.7.138:18300
    server = true
raft:
    applied_index = 68208
    commit_index = 68208
    fsm_pending = 0
    last_contact = 37.953293ms
    last_log_index = 68208
    last_log_term = 57
    last_snapshot_index = 65537
    last_snapshot_term = 57
    latest_configuration = [{Suffrage:Voter ID:172.31.7.138:18300 Address:172.31.7.138:18300} {Suffrage:Voter ID:172.31.7.137:18300 Address:172.31.7.137:18300} {Suffrage:Voter ID:172.31.7.136:18300 Address:172.31.7.136:18300}]
    latest_configuration_index = 1
    num_peers = 2
    protocol_version = 2
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Follower
    term = 57
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 96
    max_procs = 2
    os = linux
    version = go1.8.3
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 7
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 23
    members = 3
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 23
    members = 7
    query_queue = 0
    query_time = 1
```
Operating system and Environment details
Linux 3.10.62-ltsi
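To compare WAN pool membership across servers, the `consul info` output from the Server info section can be parsed and the `serf_wan` member count read from each node. This is a rough sketch in Python, assuming the "section:" and "key = value" layout that `consul info` prints; `parse_consul_info` is a hypothetical helper, and the sample string is a shortened copy of the server output in this report.

```python
def parse_consul_info(text):
    """Parse 'consul info' text into {section: {key: value}} with string values."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Section header such as "serf_wan:".
            current = sections.setdefault(line[:-1], {})
        elif "=" in line and current is not None:
            # Split on the first "=" only; values may contain "=" or ":".
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections


if __name__ == "__main__":
    sample = """\
consul:
    known_datacenters = 3
serf_wan:
    members = 7
"""
    info = parse_consul_info(sample)
    print(info["consul"]["known_datacenters"], info["serf_wan"]["members"])
```

Running the same check on one server per datacenter shows which ones see the full WAN pool. The three commands in this report list seven server addresses in total, which is the count to compare against.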
Log Fragments
DC2:
DC1: