hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

panic: runtime error: invalid memory address or nil pointer dereference #16475

Open ZzIris opened 1 year ago

ZzIris commented 1 year ago

I ran Consul servers on 3 nodes for a long time. I registered the same service on all 3 Consul servers and imported a KV JSON file into Consul. Then I shut down the leader node and 1 follower, and restarted only that follower. It panicked on startup. The log output:

==> Starting Consul agent...
Version: '1.15.0'
Build Date: '2023-02-24 01:39:35 +0000 UTC'
Node ID: '3e21921f-911f-d8f1-f4fc-60f684adf1bd'
Node name: 'R01-P02RD-FGW-008-NM'
Datacenter: 'dc1' (Segment: '')
Server: true (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: 8443, gRPC: 8502, gRPC-TLS: 1, DNS: 8600)
Cluster Addr: 192.168.100.42 (LAN: 8301, WAN: 8302)
Gossip Encryption: true
Auto-Encrypt-TLS: true
HTTPS TLS: Verify Incoming: false, Verify Outgoing: true, Min Version: TLSv1_2
gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
Internal RPC TLS: Verify Incoming: true, Verify Outgoing: true (Verify Hostname: true), Min Version: TLSv1_2

==> Log data will now stream in as it occurs:

2023-03-01T09:43:06.941+0800 [WARN] agent: skipping file /home/store/zz/consul/config/consul-agent-ca-key.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.941+0800 [WARN] agent: skipping file /home/store/zz/consul/config/consul-agent-ca.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.941+0800 [WARN] agent: skipping file /home/store/zz/consul/config/dc1-server-consul-2-key.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.941+0800 [WARN] agent: skipping file /home/store/zz/consul/config/dc1-server-consul-2.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.943+0800 [DEBUG] agent.grpc.balancer: switching server: target=consul://dc1.3e21921f-911f-d8f1-f4fc-60f684adf1bd/server.dc1 from= to= 2023-03-01T09:43:06.964+0800 [WARN] agent.auto_config: skipping file /home/store/zz/consul/config/consul-agent-ca-key.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.964+0800 [WARN] agent.auto_config: skipping file /home/store/zz/consul/config/consul-agent-ca.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.964+0800 [WARN] agent.auto_config: skipping file /home/store/zz/consul/config/dc1-server-consul-2-key.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:06.964+0800 [WARN] agent.auto_config: skipping file /home/store/zz/consul/config/dc1-server-consul-2.pem, extension must be .hcl or .json, or config format must be set 2023-03-01T09:43:07.074+0800 [INFO] agent.server.raft: starting restore from snapshot: id=259-32768-1677565937880 last-index=32768 last-term=259 size-in-bytes=19072 2023-03-01T09:43:07.078+0800 [INFO] agent.server.raft: snapshot restore progress: id=259-32768-1677565937880 last-index=32768 last-term=259 size-in-bytes=19072 read-bytes=19072 percent-complete="100.00%" 2023-03-01T09:43:07.078+0800 [INFO] 
agent.server.raft: restored from snapshot: id=259-32768-1677565937880 last-index=32768 last-term=259 size-in-bytes=19072 2023-03-01T09:43:07.089+0800 [INFO] agent.server.raft: initial configuration: index=33622 servers="[{Suffrage:Voter ID:866bbd62-7fe3-94e6-86ea-2e62a6feca60 Address:192.168.100.41:8300} {Suffrage:Voter ID:3e21921f-911f-d8f1-f4fc-60f684ad 2023-03-01T09:43:07.089+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.42:8300 [Follower]" leader-address= leader-id= 2023-03-01T09:43:07.090+0800 [INFO] agent.server.serf.wan: serf: EventMemberJoin: R01-P02RD-FGW-008-NM.dc1 192.168.100.42 2023-03-01T09:43:07.090+0800 [INFO] agent.server.serf.wan: serf: Attempting re-join to previously known node: R01-P02RD-FGW-007-NM.dc1: 192.168.100.41:8302 2023-03-01T09:43:07.091+0800 [DEBUG] agent.server.memberlist.wan: memberlist: Initiating push/pull sync with: R01-P02RD-FGW-007-NM.dc1 192.168.100.41:8302 2023-03-01T09:43:07.091+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: R01-P02RD-FGW-008-NM 192.168.100.42 2023-03-01T09:43:07.091+0800 [INFO] agent.router: Initializing LAN area manager 2023-03-01T09:43:07.091+0800 [INFO] agent.server.serf.lan: serf: Attempting re-join to previously known node: R01-P02RD-FGW-007-NM: 192.168.100.41:8301 2023-03-01T09:43:07.092+0800 [DEBUG] agent.grpc.balancer: switching server: target=consul://dc1.3e21921f-911f-d8f1-f4fc-60f684adf1bd/server.dc1 from= to=dc1-192.168.100.42:8300 2023-03-01T09:43:07.092+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: R01-P02RD-FGW-007-NM 192.168.100.41:8301 2023-03-01T09:43:07.092+0800 [INFO] agent.server: Adding LAN server: server="R01-P02RD-FGW-008-NM (Addr: tcp/192.168.100.42:8300) (DC: dc1)" 2023-03-01T09:43:07.092+0800 [INFO] agent.server.autopilot: reconciliation now disabled 2023-03-01T09:43:07.092+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=R01-P02RD-FGW-008-NM.dc1 area=wan 
2023-03-01T09:43:07.092+0800 [INFO] agent.server.serf.wan: serf: EventMemberJoin: R01-P02RD-FGW-007-NM.dc1 192.168.100.41 2023-03-01T09:43:07.092+0800 [INFO] agent.server.serf.wan: serf: Re-joined to previously known node: R01-P02RD-FGW-007-NM.dc1: 192.168.100.41:8302 2023-03-01T09:43:07.092+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=R01-P02RD-FGW-007-NM.dc1 area=wan 2023-03-01T09:43:07.093+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: R01-P02RD-FGW-007-NM 192.168.100.41 2023-03-01T09:43:07.093+0800 [INFO] agent.server.serf.lan: serf: Re-joined to previously known node: R01-P02RD-FGW-007-NM: 192.168.100.41:8301 2023-03-01T09:43:07.093+0800 [INFO] agent.server: Adding LAN server: server="R01-P02RD-FGW-007-NM (Addr: tcp/192.168.100.41:8300) (DC: dc1)" 2023-03-01T09:43:07.097+0800 [DEBUG] agent.server.autopilot: autopilot is now running 2023-03-01T09:43:07.097+0800 [DEBUG] agent.server.autopilot: state update routine is now running 2023-03-01T09:43:07.097+0800 [INFO] agent.server.cert-manager: initialized server certificate management 2023-03-01T09:43:07.097+0800 [DEBUG] agent.hcp_manager: HCP manager starting 2023-03-01T09:43:07.098+0800 [DEBUG] agent: restored service definition from file: service=service22-id file=/home/store/zz/consul/data/services/44852e95ff1aac4938fd796862eca16dd8b20cbb785aeaad35a30b6eb44ba704 2023-03-01T09:43:07.098+0800 [DEBUG] agent: restored service definition from file: service=test-192.168.100.38-8866-JjHzWXRY file=/home/store/zz/consul/data/services/b07b34401a9cb162d8864ac72b68414b0adfc47a901c64e35406585a00afa6c0 2023-03-01T09:43:07.099+0800 [DEBUG] agent: restored health check from file: check=service22-check file=/home/store/zz/consul/data/checks/93276716fa2905531cdcd208d592a47a709051ba9eb6deaf170665d79917becd 2023-03-01T09:43:07.099+0800 [DEBUG] agent: restored health check from file: check=check-test-192.168.100.38-8866-JjHzWXRY 
file=/home/store/zz/consul/data/checks/b39b19109a7f11076abbdba5d2fcb5a78409ec8d3b4070a94a5b68c83cdb3648 2023-03-01T09:43:07.099+0800 [DEBUG] agent.dns: recursor enabled 2023-03-01T09:43:07.099+0800 [DEBUG] agent.dns: recursor enabled 2023-03-01T09:43:07.099+0800 [DEBUG] agent.dns: recursor enabled 2023-03-01T09:43:07.100+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp 2023-03-01T09:43:07.100+0800 [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp 2023-03-01T09:43:07.101+0800 [INFO] agent: Starting server: address=[::]:8500 network=tcp protocol=http 2023-03-01T09:43:07.101+0800 [INFO] agent: Starting server: address=[::]:8443 network=tcp protocol=https 2023-03-01T09:43:07.101+0800 [INFO] agent: Started gRPC listeners: port_name=grpc address=[::]:8502 network=tcp 2023-03-01T09:43:07.101+0800 [INFO] agent: Started gRPC listeners: port_name=grpc_tls address=[::]:1 network=tcp 2023-03-01T09:43:07.101+0800 [INFO] agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce hcp k8s linode mdns os packet scaleway softlayer tencentcloud trito 2023-03-01T09:43:07.101+0800 [INFO] agent: Joining cluster...: cluster=LAN 2023-03-01T09:43:07.101+0800 [INFO] agent: (LAN) joining: lan_addresses=["192.168.100.40", "192.168.100.41"] 2023-03-01T09:43:07.101+0800 [INFO] agent: started state syncer 2023-03-01T09:43:07.101+0800 [INFO] agent: Consul agent running! 
2023-03-01T09:43:07.101+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Failed to join 192.168.100.40:8301: dial tcp 192.168.100.40:8301: connect: connection refused 2023-03-01T09:43:07.102+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: 192.168.100.41:8301 2023-03-01T09:43:07.103+0800 [INFO] agent: (LAN) joined: number_of_nodes=1 2023-03-01T09:43:07.103+0800 [DEBUG] agent: systemd notify failed: error="No socket" 2023-03-01T09:43:07.103+0800 [INFO] agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1 2023-03-01T09:43:07.316+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.516+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.716+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.916+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:08.098+0800 [DEBUG] agent.server.cert-manager: CA config watch fired - updating auto TLS server name: name=server.dc1.peering.ac1c275b-df91-f852-07ef-9bf788760fcd.consul 2023-03-01T09:43:08.098+0800 [DEBUG] agent.server.cert-manager: server management token watch fired - resetting leaf cert watch 2023-03-01T09:43:12.100+0800 [WARN] agent: Check missed TTL, is now critical: check=check-test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:12.345+0800 [DEBUG] agent.server.raft: accepted connection: local-address=192.168.100.42:8300 remote-address=192.168.100.41:56297 2023-03-01T09:43:12.345+0800 [DEBUG] agent.server.raft: lost leadership because received a requestVote with a newer term 2023-03-01T09:43:12.418+0800 [WARN] agent.server.raft: failed to get previous log: previous-index=33649 last-index=33647 error="log not found" 2023-03-01T09:43:12.418+0800 [DEBUG] agent.hcp_manager: HCP triggering status update 2023-03-01T09:43:12.515+0800 [DEBUG] agent.server.serf.lan: serf: 
messageUserEventType: consul:new-leader 2023-03-01T09:43:12.515+0800 [INFO] agent.server: New leader elected: payload=R01-P02RD-FGW-007-NM 2023-03-01T09:43:12.547+0800 [DEBUG] agent.server.xds_capacity_controller: updating drain rate limit: rate_limit=1 2023-03-01T09:43:12.559+0800 [DEBUG] agent.server.raft: accepted connection: local-address=192.168.100.42:8300 remote-address=192.168.100.41:16221 2023-03-01T09:43:12.715+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:12.915+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:13.115+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:13.588+0800 [DEBUG] agent.server.cert-manager: server management token watch fired - resetting leaf cert watch 2023-03-01T09:43:14.806+0800 [DEBUG] agent.server.cert-manager: got cache update event: correlationID=leaf error= 2023-03-01T09:43:14.806+0800 [DEBUG] agent.server.cert-manager: leaf certificate watch fired - updating auto TLS certificate: uri=spiffe://ac1c275b-df91-f852-07ef-9bf788760fcd.consul/agent/server/dc/dc1 2023-03-01T09:43:16.723+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=192.168.100.41:37138 2023-03-01T09:43:17.278+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=check-test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:17.278+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=serfHealth 2023-03-01T09:43:17.278+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=service22-check 2023-03-01T09:43:17.279+0800 [WARN] agent: Node info update blocked by ACLs: node=3e21921f-911f-d8f1-f4fc-60f684adf1bd accessorID="anonymous token" 2023-03-01T09:43:17.325+0800 [INFO] agent: Synced service: service=service22-id 2023-03-01T09:43:17.341+0800 [INFO] agent: Synced service: service=test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:17.341+0800 
[DEBUG] agent: Check in sync: check=service22-check
2023-03-01T09:43:17.341+0800 [DEBUG] agent: Check in sync: check=check-test-192.168.100.38-8866-JjHzWXRY
2023-03-01T09:43:18.525+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=check-test-192.168.100.38-8866-JjHzWXRY
2023-03-01T09:43:18.526+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=serfHealth
2023-03-01T09:43:18.526+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=service22-check
2023-03-01T09:43:18.526+0800 [WARN] agent: Node info update blocked by ACLs: node=3e21921f-911f-d8f1-f4fc-60f684adf1bd accessorID="anonymous token"
2023-03-01T09:43:18.549+0800 [INFO] agent: Synced service: service=service22-id
2023-03-01T09:43:18.565+0800 [INFO] agent: Synced service: service=test-192.168.100.38-8866-JjHzWXRY
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Check in sync: check=service22-check
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Check in sync: check=check-test-192.168.100.38-8866-JjHzWXRY
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Node info in sync
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Service in sync: service=service22-id
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Service in sync: service=test-192.168.100.38-8866-JjHzWXRY
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Check in sync: check=service22-check
2023-03-01T09:43:18.565+0800 [DEBUG] agent: Check in sync: check=check-test-192.168.100.38-8866-JjHzWXRY
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x1a3279d]

goroutine 166 [running]:
github.com/hashicorp/consul/agent/checks.(*CheckAlias).checkServiceExistsOnRemoteServer(0xc000c8b040, 0xc000c8b050)
github.com/hashicorp/consul/agent/checks/alias.go:161 +0x19d
github.com/hashicorp/consul/agent/checks.(*CheckAlias).runQuery.func1(0xc00147a140?)
github.com/hashicorp/consul/agent/checks/alias.go:234 +0x25
github.com/hashicorp/consul/agent/checks.(*CheckAlias).processChecks(0xc000c8b040, {0x0, 0x0, 0x376bce4?}, 0xc000caff50)
github.com/hashicorp/consul/agent/checks/alias.go:281 +0x4a2
github.com/hashicorp/consul/agent/checks.(*CheckAlias).runQuery(0xc000c8b040, 0xc000926480?)
github.com/hashicorp/consul/agent/checks/alias.go:233 +0x2c5
github.com/hashicorp/consul/agent/checks.(*CheckAlias).run(0x42feb70?, 0xc00148e000?)
github.com/hashicorp/consul/agent/checks/alias.go:86 +0x65
created by github.com/hashicorp/consul/agent/checks.(*CheckAlias).Start
github.com/hashicorp/consul/agent/checks/alias.go:61 +0x145
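For readers unfamiliar with this class of panic: the trace points at checkServiceExistsOnRemoteServer in the alias check, and addr=0x8 is consistent with reading a field at a small offset from a nil pointer. The following is only a minimal sketch of that failure pattern and of a defensive guard. It is not the Consul source; the types NodeServices and IndexedNodeServices and the function serviceExists are hypothetical stand-ins, and the assumption that a catalog response can carry a nil node entry is mine, not something the logs confirm.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeServices is a hypothetical stand-in for the per-node service list.
type NodeServices struct {
	Services map[string]string // service ID -> service name
}

// IndexedNodeServices is a hypothetical stand-in for a catalog response.
// Assumption: NodeServices may be nil when the remote side has no data
// for the requested node (for example, shortly after a restart).
type IndexedNodeServices struct {
	NodeServices *NodeServices
}

var errNoNodeData = errors.New("response contains no node services")

// serviceExists reports whether serviceID is present in the response.
// The nil check is the point of the sketch: without it, accessing
// out.NodeServices.Services on a nil NodeServices would panic with
// "invalid memory address or nil pointer dereference".
func serviceExists(out *IndexedNodeServices, serviceID string) (bool, error) {
	if out == nil || out.NodeServices == nil {
		return false, errNoNodeData
	}
	_, ok := out.NodeServices.Services[serviceID]
	return ok, nil
}

func main() {
	// Response with no node entry: handled as an error instead of a panic.
	ok, err := serviceExists(&IndexedNodeServices{}, "service22-id")
	fmt.Println(ok, err)

	// Response that does contain the service.
	full := &IndexedNodeServices{NodeServices: &NodeServices{
		Services: map[string]string{"service22-id": "service22"},
	}}
	ok, err = serviceExists(full, "service22-id")
	fmt.Println(ok, err)
}
```

Returning an error rather than panicking lets the caller mark the check as critical and retry on the next poll, which is usually the safer behavior for a long-running agent.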

ZzIris commented 1 year ago

The Consul version was 1.14.4. I changed to 1.15.0 and it still happens.

ZzIris commented 1 year ago

Log output from the only live node:

2023-03-01T09:42:55.928+0800 [WARN] agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="No cluster leader" index=0 2023-03-01T09:42:55.928+0800 [DEBUG] agent.server.cert-manager: got cache update event: correlationID=leaf error="No cluster leader" 2023-03-01T09:42:55.928+0800 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader" 2023-03-01T09:42:56.423+0800 [WARN] agent.server.raft: Election timeout reached, restarting election 2023-03-01T09:42:56.423+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.41:8300 [Candidate]" term=8193 2023-03-01T09:42:56.423+0800 [DEBUG] agent.server.raft: voting for self: term=8193 id=866bbd62-7fe3-94e6-86ea-2e62a6feca60 2023-03-01T09:42:56.424+0800 [DEBUG] agent.server.raft: asking for vote: term=8193 from=3e21921f-911f-d8f1-f4fc-60f684adf1bd address=192.168.100.42:8300 2023-03-01T09:42:56.424+0800 [DEBUG] agent.server.raft: calculated votes needed: needed=2 term=8193 2023-03-01T09:42:56.424+0800 [DEBUG] agent.server.raft: vote granted: from=866bbd62-7fe3-94e6-86ea-2e62a6feca60 term=8193 tally=1 2023-03-01T09:42:56.424+0800 [WARN] agent.server.raft: unable to get address for server, using fallback address: id=3e21921f-911f-d8f1-f4fc-60f684adf1bd fallback=192.168.100.42:8300 error="Could not find address for server id 3e21921f-911f-d8f1-f4fc-60f684adf1bd" 2023-03-01T09:42:56.424+0800 [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" term=8193 2023-03-01T09:43:01.278+0800 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader" 2023-03-01T09:43:01.504+0800 [ERROR] agent: Coordinate update error: error="No cluster leader" 2023-03-01T09:43:05.303+0800 [WARN] agent.cache: handling error in Cache.Notify: 
cache-type=connect-ca-leaf error="No cluster leader" index=0 2023-03-01T09:43:05.303+0800 [DEBUG] agent.server.cert-manager: got cache update event: correlationID=leaf error="No cluster leader" 2023-03-01T09:43:05.303+0800 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: No cluster leader" 2023-03-01T09:43:05.552+0800 [WARN] agent.server.raft: Election timeout reached, restarting election 2023-03-01T09:43:05.552+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.41:8300 [Candidate]" term=8194 2023-03-01T09:43:05.553+0800 [DEBUG] agent.server.raft: voting for self: term=8194 id=866bbd62-7fe3-94e6-86ea-2e62a6feca60 2023-03-01T09:43:05.554+0800 [DEBUG] agent.server.raft: asking for vote: term=8194 from=3e21921f-911f-d8f1-f4fc-60f684adf1bd address=192.168.100.42:8300 2023-03-01T09:43:05.554+0800 [DEBUG] agent.server.raft: calculated votes needed: needed=2 term=8194 2023-03-01T09:43:05.554+0800 [DEBUG] agent.server.raft: vote granted: from=866bbd62-7fe3-94e6-86ea-2e62a6feca60 term=8194 tally=1 2023-03-01T09:43:05.554+0800 [WARN] agent.server.raft: unable to get address for server, using fallback address: id=3e21921f-911f-d8f1-f4fc-60f684adf1bd fallback=192.168.100.42:8300 error="Could not find address for server id 3e21921f-911f-d8f1-f4fc-60f684adf1bd" 2023-03-01T09:43:05.554+0800 [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" term=8194 2023-03-01T09:43:07.108+0800 [DEBUG] agent.server.memberlist.wan: memberlist: Stream connection from=192.168.100.42:18188 2023-03-01T09:43:07.109+0800 [INFO] agent.server.serf.wan: serf: EventMemberJoin: R01-P02RD-FGW-008-NM.dc1 192.168.100.42 2023-03-01T09:43:07.109+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=192.168.100.42:30566 
2023-03-01T09:43:07.109+0800 [INFO] agent.server: Handled event for server in area: event=member-join server=R01-P02RD-FGW-008-NM.dc1 area=wan 2023-03-01T09:43:07.110+0800 [INFO] agent.server.serf.lan: serf: EventMemberJoin: R01-P02RD-FGW-008-NM 192.168.100.42 2023-03-01T09:43:07.110+0800 [INFO] agent.server: Adding LAN server: server="R01-P02RD-FGW-008-NM (Addr: tcp/192.168.100.42:8300) (DC: dc1)" 2023-03-01T09:43:07.119+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Stream connection from=192.168.100.42:30570 2023-03-01T09:43:07.309+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.509+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.710+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:07.910+0800 [DEBUG] agent.server.serf.lan: serf: messageJoinType: R01-P02RD-FGW-008-NM 2023-03-01T09:43:12.358+0800 [WARN] agent.server.raft: Election timeout reached, restarting election 2023-03-01T09:43:12.358+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.41:8300 [Candidate]" term=8195 2023-03-01T09:43:12.359+0800 [DEBUG] agent.server.raft: voting for self: term=8195 id=866bbd62-7fe3-94e6-86ea-2e62a6feca60 2023-03-01T09:43:12.359+0800 [DEBUG] agent.server.raft: asking for vote: term=8195 from=3e21921f-911f-d8f1-f4fc-60f684adf1bd address=192.168.100.42:8300 2023-03-01T09:43:12.359+0800 [DEBUG] agent.server.raft: calculated votes needed: needed=2 term=8195 2023-03-01T09:43:12.359+0800 [DEBUG] agent.server.raft: vote granted: from=866bbd62-7fe3-94e6-86ea-2e62a6feca60 term=8195 tally=1 2023-03-01T09:43:12.434+0800 [DEBUG] agent.server.raft: vote granted: from=3e21921f-911f-d8f1-f4fc-60f684adf1bd term=8195 tally=2 2023-03-01T09:43:12.434+0800 [INFO] agent.server.raft: election won: term=8195 tally=2 2023-03-01T09:43:12.434+0800 [INFO] agent.server.raft: entering leader state: leader="Node at 
192.168.100.41:8300 [Leader]" 2023-03-01T09:43:12.434+0800 [INFO] agent.server.raft: added peer, starting replication: peer=3e21921f-911f-d8f1-f4fc-60f684adf1bd 2023-03-01T09:43:12.434+0800 [INFO] agent.server: cluster leadership acquired 2023-03-01T09:43:12.434+0800 [DEBUG] agent.hcp_manager: HCP triggering status update 2023-03-01T09:43:12.435+0800 [INFO] agent.server: New leader elected: payload=R01-P02RD-FGW-007-NM 2023-03-01T09:43:12.436+0800 [WARN] agent.server.raft: appendEntries rejected, sending older logs: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" next=33648 2023-03-01T09:43:12.450+0800 [INFO] agent.server.raft: pipelining replication: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" 2023-03-01T09:43:12.485+0800 [DEBUG] agent.server.xds_capacity_controller: updating drain rate limit: rate_limit=1 2023-03-01T09:43:12.550+0800 [INFO] agent.server: initializing acls 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="acl token reaping" 2023-03-01T09:43:12.574+0800 [INFO] agent.server.autopilot: reconciliation now enabled 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="federation state anti-entropy" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="federation state pruning" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="streaming peering resources" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="metrics for streaming peering resources" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="peering deferred deletion" 2023-03-01T09:43:12.574+0800 [INFO] connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="intermediate cert renew watch" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="CA root pruning" 
2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="CA root expiration metric" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="CA signing expiration metric" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="virtual IP version check" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: started routine: routine="config entry controllers" 2023-03-01T09:43:12.574+0800 [DEBUG] agent.server: successfully established leadership: duration=24.521758ms 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: stopping routine: routine="virtual IP version check" 2023-03-01T09:43:12.574+0800 [INFO] agent.leader: stopped routine: routine="virtual IP version check" 2023-03-01T09:43:12.575+0800 [INFO] agent.server: member joined, marking health alive: member=R01-P02RD-FGW-007-NM partition=default 2023-03-01T09:43:12.710+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:12.910+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:12.936+0800 [ERROR] agent.server.autopilot: Failed to reconcile current state with the desired state 2023-03-01T09:43:13.110+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:13.310+0800 [DEBUG] agent.server.serf.lan: serf: messageUserEventType: consul:new-leader 2023-03-01T09:43:13.523+0800 [DEBUG] agent.server.cert-manager: server management token watch fired - resetting leaf cert watch 2023-03-01T09:43:14.578+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=check-test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:14.578+0800 [DEBUG] agent.acl: dropping check from result due to ACLs: check=serfHealth 2023-03-01T09:43:14.579+0800 [WARN] agent: Node info update blocked by ACLs: node=866bbd62-7fe3-94e6-86ea-2e62a6feca60 accessorID="anonymous token" 2023-03-01T09:43:14.599+0800 [INFO] agent: Synced service: 
service=test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:14.599+0800 [DEBUG] agent: Check in sync: check=check-test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:14.599+0800 [DEBUG] agent: Node info in sync 2023-03-01T09:43:14.599+0800 [DEBUG] agent: Service in sync: service=test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:14.599+0800 [DEBUG] agent: Check in sync: check=check-test-192.168.100.38-8866-JjHzWXRY 2023-03-01T09:43:16.741+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Initiating push/pull sync with: R01-P02RD-FGW-008-NM 192.168.100.42:8301 2023-03-01T09:43:18.403+0800 [DEBUG] agent.server.cert-manager: got cache update event: correlationID=leaf error= 2023-03-01T09:43:18.403+0800 [DEBUG] agent.server.cert-manager: leaf certificate watch fired - updating auto TLS certificate: uri=spiffe://ac1c275b-df91-f852-07ef-9bf788760fcd.consul/agent/server/dc/dc1 2023-03-01T09:43:18.806+0800 [INFO] agent.server.raft: aborting pipeline replication: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" 2023-03-01T09:43:18.899+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error=EOF 2023-03-01T09:43:18.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error making call: stream closed" 2023-03-01T09:43:18.962+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:18.985+0800 [ERROR] agent.server.raft: failed to heartbeat to: peer=192.168.100.42:8300 backoff time=10ms error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.030+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 
192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.142+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.263+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.402+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.630+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.739+0800 [ERROR] agent.server.raft: failed to heartbeat to: peer=192.168.100.42:8300 backoff time=10ms error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:19.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:19.943+0800 [WARN] agent: Coordinate update blocked by ACLs: accessorID="anonymous token" 2023-03-01T09:43:20.016+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:20.433+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Failed UDP ping: R01-P02RD-FGW-008-NM (timeout reached) 2023-03-01T09:43:20.678+0800 [ERROR] agent.server.raft: failed to heartbeat to: peer=192.168.100.42:8300 backoff time=10ms error="dial 
tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:20.709+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:20.934+0800 [INFO] agent.server.memberlist.lan: memberlist: Suspect R01-P02RD-FGW-008-NM has failed, no acks received 2023-03-01T09:43:20.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:21.306+0800 [WARN] agent.server.raft: failed to contact: server-id=3e21921f-911f-d8f1-f4fc-60f684adf1bd time=2.500065422s 2023-03-01T09:43:21.306+0800 [WARN] agent.server.raft: failed to contact quorum of nodes, stepping down 2023-03-01T09:43:21.306+0800 [INFO] agent.server.raft: entering follower state: follower="Node at 192.168.100.41:8300 [Follower]" leader-address= leader-id= 2023-03-01T09:43:21.307+0800 [DEBUG] agent.hcp_manager: HCP triggering status update 2023-03-01T09:43:21.307+0800 [DEBUG] agent.server: shutting down leader loop 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="acl token reaping" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="intermediate cert renew watch" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="CA root pruning" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="streaming peering resources" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="peering deferred deletion" 2023-03-01T09:43:21.307+0800 [INFO] agent.server.peering_metrics: stopping peering metrics 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="metrics for streaming peering resources" 2023-03-01T09:43:21.307+0800 [INFO] 
agent.leader: stopped routine: routine="config entry controllers" 2023-03-01T09:43:21.307+0800 [INFO] agent.server.autopilot: reconciliation now disabled 2023-03-01T09:43:21.307+0800 [INFO] agent.server: cluster leadership lost 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="CA signing expiration metric" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="federation state pruning" 2023-03-01T09:43:21.307+0800 [INFO] agent.leader: stopped routine: routine="CA root expiration metric" 2023-03-01T09:43:21.434+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Failed UDP ping: R01-P02RD-FGW-008-NM (timeout reached) 2023-03-01T09:43:21.666+0800 [ERROR] agent.server.raft: failed to heartbeat to: peer=192.168.100.42:8300 backoff time=20ms error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:21.936+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:22.053+0800 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:22.935+0800 [INFO] agent.server.memberlist.lan: memberlist: Suspect R01-P02RD-FGW-008-NM has failed, no acks received 2023-03-01T09:43:22.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:22.941+0800 [INFO] agent: (LAN) joining: lan_addresses=["192.168.100.40", "192.168.100.42"] 2023-03-01T09:43:22.942+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Failed to join 192.168.100.40:8301: dial tcp 192.168.100.40:8301: connect: connection refused 2023-03-01T09:43:22.942+0800 [DEBUG] agent.server.memberlist.lan: 
memberlist: Failed to join 192.168.100.42:8301: dial tcp 192.168.100.42:8301: connect: connection refused 2023-03-01T09:43:22.942+0800 [WARN] agent: (LAN) couldn't join: number_of_nodes=0 error= 2 errors occurred: * Failed to join 192.168.100.40:8301: dial tcp 192.168.100.40:8301: connect: connection refused * Failed to join 192.168.100.42:8301: dial tcp 192.168.100.42:8301: connect: connection refused
2023-03-01T09:43:22.942+0800 [WARN] agent: Join cluster failed, will retry: cluster=LAN retry_interval=30s error= 2 errors occurred: * Failed to join 192.168.100.40:8301: dial tcp 192.168.100.40:8301: connect: connection refused * Failed to join 192.168.100.42:8301: dial tcp 192.168.100.42:8301: connect: connection refused

2023-03-01T09:43:23.938+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:24.433+0800 [DEBUG] agent.server.memberlist.lan: memberlist: Failed UDP ping: R01-P02RD-FGW-008-NM (timeout reached) 2023-03-01T09:43:24.934+0800 [INFO] agent.server.memberlist.lan: memberlist: Marking R01-P02RD-FGW-008-NM as failed, suspect timeout reached (0 peer confirmations) 2023-03-01T09:43:24.934+0800 [INFO] agent.server.serf.lan: serf: EventMemberFailed: R01-P02RD-FGW-008-NM 192.168.100.42 2023-03-01T09:43:24.934+0800 [INFO] agent.server: Removing LAN server: server="R01-P02RD-FGW-008-NM (Addr: tcp/192.168.100.42:8300) (DC: dc1)" 2023-03-01T09:43:24.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:25.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:26.933+0800 [INFO] agent.server.memberlist.lan: memberlist: Suspect R01-P02RD-FGW-008-NM has failed, no acks received 2023-03-01T09:43:26.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:27.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:28.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:29.716+0800 [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id= 
2023-03-01T09:43:29.716+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.41:8300 [Candidate]" term=8196 2023-03-01T09:43:29.717+0800 [DEBUG] agent.server.raft: voting for self: term=8196 id=866bbd62-7fe3-94e6-86ea-2e62a6feca60 2023-03-01T09:43:29.717+0800 [DEBUG] agent.server.raft: asking for vote: term=8196 from=3e21921f-911f-d8f1-f4fc-60f684adf1bd address=192.168.100.42:8300 2023-03-01T09:43:29.717+0800 [DEBUG] agent.server.raft: calculated votes needed: needed=2 term=8196 2023-03-01T09:43:29.718+0800 [DEBUG] agent.server.raft: vote granted: from=866bbd62-7fe3-94e6-86ea-2e62a6feca60 term=8196 tally=1 2023-03-01T09:43:29.718+0800 [WARN] agent.server.raft: unable to get address for server, using fallback address: id=3e21921f-911f-d8f1-f4fc-60f684adf1bd fallback=192.168.100.42:8300 error="Could not find address for server id 3e21921f-911f-d8f1-f4fc-60f684adf1bd" 2023-03-01T09:43:29.718+0800 [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" term=8196 2023-03-01T09:43:29.936+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:30.932+0800 [DEBUG] agent.server.memberlist.wan: memberlist: Failed UDP ping: R01-P02RD-FGW-008-NM.dc1 (timeout reached) 2023-03-01T09:43:30.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:31.938+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:32.932+0800 [INFO] agent.server.memberlist.wan: memberlist: Suspect R01-P02RD-FGW-008-NM.dc1 has failed, no acks received 
2023-03-01T09:43:32.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:33.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:34.936+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:35.936+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:36.802+0800 [WARN] agent.server.raft: Election timeout reached, restarting election 2023-03-01T09:43:36.802+0800 [INFO] agent.server.raft: entering candidate state: node="Node at 192.168.100.41:8300 [Candidate]" term=8197 2023-03-01T09:43:36.803+0800 [DEBUG] agent.server.raft: voting for self: term=8197 id=866bbd62-7fe3-94e6-86ea-2e62a6feca60 2023-03-01T09:43:36.803+0800 [DEBUG] agent.server.raft: asking for vote: term=8197 from=3e21921f-911f-d8f1-f4fc-60f684adf1bd address=192.168.100.42:8300 2023-03-01T09:43:36.803+0800 [DEBUG] agent.server.raft: calculated votes needed: needed=2 term=8197 2023-03-01T09:43:36.803+0800 [DEBUG] agent.server.raft: vote granted: from=866bbd62-7fe3-94e6-86ea-2e62a6feca60 term=8197 tally=1 2023-03-01T09:43:36.803+0800 [WARN] agent.server.raft: unable to get address for server, using fallback address: id=3e21921f-911f-d8f1-f4fc-60f684adf1bd fallback=192.168.100.42:8300 error="Could not find address for server id 3e21921f-911f-d8f1-f4fc-60f684adf1bd" 2023-03-01T09:43:36.804+0800 [ERROR] agent.server.raft: failed to make requestVote RPC: target="{Voter 3e21921f-911f-d8f1-f4fc-60f684adf1bd 192.168.100.42:8300}" error="dial tcp 
192.168.100.41:0->192.168.100.42:8300: connect: connection refused" term=8197 2023-03-01T09:43:36.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:37.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:38.937+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="rpc error getting client: failed to get conn: dial tcp 192.168.100.41:0->192.168.100.42:8300: connect: connection refused" 2023-03-01T09:43:39.936+0800 [WARN] agent: error getting server health from server: server=R01-P02RD-FGW-008-NM error="context deadline exceeded" 2023-03-01T09:43:40.932+0800 [DEBUG] agent.server.memberlist.wan: memberlist: Failed UDP ping: R01-P02RD-FGW-008-NM.dc1 (timeout reached)

ZzIris commented 1 year ago

However, if I restart the original leader node first and the follower second, the cluster becomes healthy and no panic occurs.
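
For context, the log above reports `calculated votes needed: needed=2` while the candidate only ever reaches `tally=1`. A minimal sketch of the majority rule Raft uses for elections follows; the function name `quorum` is illustrative and not taken from Consul's source.

```go
package main

import "fmt"

// quorum returns the number of votes a candidate needs to win an election
// among the given number of voting servers: a strict majority.
func quorum(voters int) int {
	return voters/2 + 1
}

func main() {
	// With two or three voters, two votes are required, so a server that
	// only has its own vote (tally=1) keeps restarting the election.
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voters=%d needed=%d\n", n, quorum(n))
	}
}
```

This is why a single running server cannot elect itself leader until at least one other voter is reachable again.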

huikang commented 1 year ago

then I close the leader node and 1 follower, and I just restart the follower , I find it panic,

It seems that either out.NodeServices or srv is a nil pointer:

https://github.com/hashicorp/consul/blob/4f2d9a91e515b2a54063d398054ca60a04d968e6/agent/checks/alias.go#L161-L165

Could you share your service registration file, including the checks? We will try to reproduce the panic. Thanks.
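
To illustrate the suspected failure mode, here is a minimal sketch of a guard that checks each pointer before dereferencing it. The types `service`, `nodeServices`, and `response` and the function `findService` are hypothetical stand-ins chosen for this example; they are not Consul's actual code and do not reproduce the linked lines.

```go
package main

import "fmt"

// service, nodeServices and response are hypothetical stand-ins for the
// structures mentioned above; they are not Consul's real types.
type service struct {
	ID string
}

type nodeServices struct {
	Services map[string]*service
}

type response struct {
	NodeServices *nodeServices // assumed to be nil when the node is unknown
}

// findService reports whether serviceID is present, checking every pointer
// before dereferencing it so that a nil value cannot cause a panic.
func findService(out *response, serviceID string) bool {
	if out == nil || out.NodeServices == nil {
		return false
	}
	for _, srv := range out.NodeServices.Services {
		if srv == nil {
			continue
		}
		if srv.ID == serviceID {
			return true
		}
	}
	return false
}

func main() {
	// An empty response has a nil NodeServices field; the guard handles it.
	fmt.Println(findService(&response{}, "web"))
}
```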

ZzIris commented 1 year ago

I register the service through the Consul API, like this:

    // Build the registration and attach a TTL health check to the service.
    reg := &consulapi.AgentServiceRegistration{
        ID:      registration.ID,
        Name:    registration.Name,
        Address: registration.Address,
        Port:    registration.Port,
        Tags:    registration.Tags,
        Meta:    m,
        Check: &consulapi.AgentServiceCheck{
            CheckID: fmt.Sprintf("%s%s", checkIdPrefix, registration.ID),
            Name:    fmt.Sprintf("%s%s", checkIdPrefix, registration.Name),
            TTL:     defaultCheckInterval.String(),
            Timeout: defaultCheckTimeout.String(),
            //DeregisterCriticalServiceAfter: defaultDeregisterInterval.String(),
        },
    }

    // Pick a random starting client; tmp remembers the index where the loop began.
    tmp := -1
    sel, cli := c.random()
    if cli == nil {
        return nil, xerror.ErrInvalidConsulClient
    }

    // Register the same service on each Consul node in turn.
    for {
        if tmp == -1 {
            tmp = sel
        }

        xLog.Info(" %s, %s register on node %s", requestId, registration, c.names[sel])
        if err = cli.Agent().ServiceRegister(reg); err != nil {
            xLog.Error("%s, %s register on node %s error: %s", requestId, registration, c.names[sel], err)
            // If the failure is not on the first node, deregister the service again.
            if tmp != sel {
                _ = c.DeregisterService(ctx, &entity.Deregistration{Name: registration.Name, ID: registration.ID})
            }

            break
        }

        // Move to the next client; stop when none is left or we are back at the start.
        if sel, cli = c.next(sel); cli == nil || tmp == sel {
            break
        }
    }
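
The loop above registers one service on several agents and deregisters it when a later registration fails. A self-contained sketch of that pattern follows, assuming a hypothetical `registrar` interface and an in-memory `fakeAgent` in place of the real Consul client; `registerAll` is also an illustrative name, and `start` is assumed to be a valid non-negative index.

```go
package main

import (
	"errors"
	"fmt"
)

// registrar is a hypothetical abstraction over one Consul agent client.
type registrar interface {
	Register(id string) error
	Deregister(id string) error
}

// registerAll registers id on every agent, starting at index start and
// wrapping around. If any registration fails, it deregisters id from the
// agents that already succeeded and returns the error.
func registerAll(agents []registrar, start int, id string) error {
	if len(agents) == 0 {
		return errors.New("no agents available")
	}
	done := make([]registrar, 0, len(agents))
	for i := 0; i < len(agents); i++ {
		a := agents[(start+i)%len(agents)]
		if err := a.Register(id); err != nil {
			for _, d := range done {
				_ = d.Deregister(id) // best-effort rollback
			}
			return fmt.Errorf("register %q: %w", id, err)
		}
		done = append(done, a)
	}
	return nil
}

// fakeAgent is an in-memory registrar used only for this demonstration.
type fakeAgent struct {
	fail     bool
	services map[string]bool
}

func (f *fakeAgent) Register(id string) error {
	if f.fail {
		return errors.New("agent unavailable")
	}
	f.services[id] = true
	return nil
}

func (f *fakeAgent) Deregister(id string) error {
	delete(f.services, id)
	return nil
}

func main() {
	a := &fakeAgent{services: map[string]bool{}}
	b := &fakeAgent{services: map[string]bool{}, fail: true}
	err := registerAll([]registrar{a, b}, 0, "web-1")
	fmt.Println("error:", err, "services left on first agent:", len(a.services))
}
```

Tracking the agents that succeeded keeps the rollback limited to registrations that actually happened, rather than deregistering everywhere.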