hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Unable to elect cluster leader when upgrading 1.7.2 to 1.8.4 #8827

Open ebocchi opened 4 years ago

ebocchi commented 4 years ago

Overview of the Issue

Consul was unable to elect the cluster leader when upgrading from 1.7.2 to 1.8.4 in a cluster made of 3 hosts. The upgrade was performed by installing the newer version of the consul binary and restarting the service, one host at a time.

The inability to elect a leader appeared after the first upgrade and restart. This made the Consul KV store unavailable, along with many administrative commands (e.g., consul operator raft list-peers, mentioned in the outage recovery guide). The consul members command alternated between returning an error and returning the list of peer servers and clients.

Rolling back the upgraded node to 1.7.2 did not fix the problem and caused the process to panic. The issue was resolved by upgrading all the server nodes to 1.8.4. At that stage, clients were still running 1.7.2 and working fine; they were then progressively upgraded to 1.8.4 as well.

Reproduction Steps

  1. Cluster of 3 consul servers (bootstrap-expect = 3) running 1.7.2.
  2. Stop one of the 3 nodes and restart with 1.8.4.
  3. Leader election fails.

This problem was observed on one active cluster. Attempts to reproduce it on a second, testing cluster were unsuccessful. The two clusters share the same configuration and software versions; the underlying infrastructure is separate.
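For reference, a minimal server configuration matching the reproduction setup might look like the following. Only server = true and bootstrap-expect = 3 are stated in the issue; the datacenter name and join address are taken from the logs below, and the data_dir path is a placeholder:

```hcl
# Illustrative Consul server config (HCL); paths are assumptions,
# not taken from the reporter's actual configuration.
server           = true
bootstrap_expect = 3
datacenter       = "DCmain"
data_dir         = "/opt/consul"
retry_join       = ["consul-dcmain"]
```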

Operating system and Environment details

CentOS 7.8.2003 on VMs. Consul 1.7.2 upgraded to 1.8.4.

Log Fragments

Starting new version on one of the three servers (here 'server1'):

Sep 30 15:55:17 __server1-hostname__ consul: ==> Starting Consul agent...
Sep 30 15:55:17 __server1-hostname__ consul: Version: '1.8.4'
Sep 30 15:55:17 __server1-hostname__ consul: Node ID: '63457415-63b5-c0b4-f5d3-3cc77517276b'
Sep 30 15:55:17 __server1-hostname__ consul: Node name: '__server1-hostname__'
Sep 30 15:55:17 __server1-hostname__ consul: Datacenter: 'DCmain' (Segment: '<all>')
Sep 30 15:55:17 __server1-hostname__ consul: Server: true (Bootstrap: false)
Sep 30 15:55:17 __server1-hostname__ consul: Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Sep 30 15:55:17 __server1-hostname__ consul: Cluster Addr: __server1-IPv6__ (LAN: 8301, WAN: 8302)
Sep 30 15:55:17 __server1-hostname__ consul: Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true, Auto-Encrypt-TLS: false
Sep 30 15:55:17 __server1-hostname__ consul: ==> Log data will now stream in as it occurs:
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.023+0200 [WARN]  agent: Node name "__server1-hostname__" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.023+0200 [WARN]  agent: bootstrap_expect > 0: expecting 3 servers
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.036+0200 [WARN]  agent.auto_config: Node name "__server1-hostname__" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.036+0200 [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.070+0200 [INFO]  agent.server.raft: restored from snapshot: id=3194-30500715-1601416277080
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.221+0200 [INFO]  agent.server.raft: initial configuration: index=30146542 servers="[{Suffrage:Voter ID:7d5462ce-8b23-8b0f-a537-5c26da1818f9 Address:[__server3-IPv6__]:8300} {Suffrage:Voter ID:63457415-63b5-c0b4-f5d3-3cc77517276b Address:[__server1-IPv6__]:8300} {Suffrage:Voter ID:91c3145d-1879-9332-0e5c-c90736797ef3 Address:[__server2-IPv6__]:8300}]"
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.221+0200 [INFO]  agent.server.raft: entering follower state: follower="Node at [__server1-IPv6__]:8300 [Follower]" leader=
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.223+0200 [INFO]  agent.server.serf.wan: serf: EventMemberJoin: __server1-hostname__.DCmain __server1-IPv6__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.224+0200 [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: __server1-hostname__.dc1: [__server1-IPv6__]:8302
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __server1-hostname__ __server1-IPv6__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.router: Initializing LAN area manager
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.server.serf.wan: serf: Re-joined to previously known node: __server1-hostname__.dc1: [__server1-IPv6__]:8302
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.server: Adding LAN server: server="__server1-hostname__ (Addr: tcp/[__server1-IPv6__]:8300) (DC: DCmain)"
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.server: Raft data found, disabling bootstrap mode
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.225+0200 [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: __workernode-044ed0bdd2__: [__IP-044ed0bdd2__]:8301
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.226+0200 [INFO]  agent.server: Handled event for server in area: event=member-join server=__server1-hostname__.DCmain area=wan
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.227+0200 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=udp
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.227+0200 [INFO]  agent: Started DNS server: address=127.0.0.1:8600 network=tcp
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.228+0200 [INFO]  agent: Started HTTP server: address=127.0.0.1:8500 network=tcp
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.228+0200 [INFO]  agent: started state syncer
Sep 30 15:55:17 __server1-hostname__ consul: ==> Consul agent running!

Refuting alive messages and detecting no cluster leader:

Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.228+0200 [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.228+0200 [INFO]  agent: Joining cluster...: cluster=LAN
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.228+0200 [INFO]  agent: (LAN) joining: lan_addresses=[consul-dcmain]
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-ac460f6b70__ __IP-ac460f6b70__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-fabc864860__ __IP-fabc864860__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-044ed0bdd2__ __IP-044ed0bdd2__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-1d701ccbee__ __IP-1d701ccbee__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-3c81b5e0c3__ __IP-3c81b5e0c3__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [WARN]  agent.server.memberlist.lan: memberlist: Refuting an alive message for '__server1-hostname__' (__server1-IPv6__:8301) meta:([255 142 167 118 115 110 95 109 105 110 161 50 166 101 120 112 101 99 116 161 51 167 118 115 110 95 109 97 120 161 51 164 112 111 114 116 164 56 51 48 48 167 117 115 101 95 116 108 115 161 49 167 115 101 103 109 101 110 116 160 165 98 117 105 108 100 174 49 46 55 46 50 58 57 101 97 49 97 50 48 52 162 100 99 166 109 101 121 114 105 110 162 105 100 218 0 36 54 51 52 53 55 52 49 53 45 54 51 98 53 45 99 48 98 52 45 102 53 100 51 45 51 99 99 55 55 53 49 55 50 55 54 98 163 118 115 110 161 50 168 114 97 102 116 95 118 115 110 161 51 164 97 99 108 115 161 48 173 119 97 110 95 106 111 105 110 95 112 111 114 116 164 56 51 48 50 164 114 111 108 101 166 99 111 110 115 117 108] VS [255 143 167 115 101 103 109 101 110 116 160 167 118 115 110 95 109 105 110 161 50 162 100 99 166 109 101 121 114 105 110 162 105 100 218 0 36 54 51 52 53 55 52 49 53 45 54 51 98 53 45 99 48 98 52 45 102 53 100 51 45 51 99 99 55 55 53 49 55 50 55 54 98 165 98 117 105 108 100 174 49 46 56 46 52 58 49 50 98 49 54 100 102 51 166 101 120 112 101 99 116 161 51 164 97 99 108 115 161 48 165 102 116 95 102 115 161 49 173 119 97 110 95 106 111 105 110 95 112 111 114 116 164 56 51 48 50 164 114 111 108 101 166 99 111 110 115 117 108 164 112 111 114 116 164 56 51 48 48 167 117 115 101 95 116 108 115 161 49 163 118 115 110 161 50 167 118 115 110 95 109 97 120 161 51 168 114 97 102 116 95 118 115 110 161 51]), vsn:([1 5 2 2 5 4] VS [1 5 2 2 5 4])
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.256+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-5a4b412a86__ __IP-5a4b412a86__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-315091d343__ __IP-315091d343__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-f250ed5924__ __IP-f250ed5924__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-a926962c19__ __IP-a926962c19__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __server2-hostname__ __server2-IPv6__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-bf50c609d1__ __IP-bf50c609d1__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-58511415b3__ __IP-58511415b3__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-3e1a55e60a__ __IP-3e1a55e60a__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-86c0367dce__ __IP-86c0367dce__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-d8a2425079__ __IP-d8a2425079__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-56f5a42d0c__ __IP-56f5a42d0c__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __server3-hostname__ __server3-IPv6__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.257+0200 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: __workernode-1800367aec__ __IP-1800367aec__
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.258+0200 [INFO]  agent.server.serf.lan: serf: Re-joined to previously known node: __workernode-044ed0bdd2__: [__IP-044ed0bdd2__]:8301
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.258+0200 [INFO]  agent.server: Adding LAN server: server="__server2-hostname__ (Addr: tcp/[__server2-IPv6__]:8300) (DC: DCmain)"
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.258+0200 [INFO]  agent.server: Adding LAN server: server="__server3-hostname__ (Addr: tcp/[__server3-IPv6__]:8300) (DC: DCmain)"
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.307+0200 [INFO]  agent: (LAN) joined: number_of_nodes=6
Sep 30 15:55:17 __server1-hostname__ consul: 2020-09-30T15:55:17.308+0200 [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=6
Sep 30 15:55:24 __server1-hostname__ consul: 2020-09-30T15:55:24.256+0200 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
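The "Refuting an alive message" warning above compares two msgpack-encoded node-metadata blobs (old VS new). Rendering the printable bytes of a small excerpt hand-copied from those arrays shows the conflict is between metadata advertising build 1.7.2 and build 1.8.4 for the same node:

```python
# Excerpt of the "build" field bytes from the two metadata arrays in the
# "Refuting an alive message" log line (old blob vs. new blob).
old_meta = bytes([165, 98, 117, 105, 108, 100, 174, 49, 46, 55, 46, 50,
                  58, 57, 101, 97, 49, 97, 50, 48, 52])
new_meta = bytes([165, 98, 117, 105, 108, 100, 174, 49, 46, 56, 46, 52,
                  58, 49, 50, 98, 49, 54, 100, 102, 51])

def printable(blob: bytes) -> str:
    """Replace non-printable bytes with '.' to expose embedded ASCII."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in blob)

print("old:", printable(old_meta))  # contains build 1.7.2:9ea1a204
print("new:", printable(new_meta))  # contains build 1.8.4:12b16df3
```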

The node then loops, electing itself leader but failing to sync with the other servers in the cluster:

Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.275+0200 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.276+0200 [INFO]  agent.server.raft: entering candidate state: node="Node at [__server1-IPv6__]:8300 [Candidate]" term=3195
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.327+0200 [INFO]  agent.server.raft: election won: tally=2
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.327+0200 [INFO]  agent.server.raft: entering leader state: leader="Node at [__server1-IPv6__]:8300 [Leader]"
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.328+0200 [INFO]  agent.server: cluster leadership acquired
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.328+0200 [INFO]  agent.server: New leader elected: payload=__server1-hostname__
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.328+0200 [INFO]  agent.server.raft: added peer, starting replication: peer=7d5462ce-8b23-8b0f-a537-5c26da1818f9
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.328+0200 [INFO]  agent.server.raft: added peer, starting replication: peer=91c3145d-1879-9332-0e5c-c90736797ef3
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.330+0200 [INFO]  agent.server.raft: pipelining replication: peer="{Voter 7d5462ce-8b23-8b0f-a537-5c26da1818f9 [__server3-IPv6__]:8300}"
Sep 30 15:55:25 __server1-hostname__ consul: 2020-09-30T15:55:25.408+0200 [INFO]  agent.server.raft: pipelining replication: peer="{Voter 91c3145d-1879-9332-0e5c-c90736797ef3 [__server2-IPv6__]:8300}"
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.985+0200 [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.985+0200 [INFO]  agent.leader: started routine: routine="federation state pruning"
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.985+0200 [INFO]  agent.leader: started routine: routine="CA root pruning"
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.985+0200 [INFO]  agent.server: member joined, marking health alive: member=__server1-hostname__
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.992+0200 [INFO]  agent.server: member joined, marking health alive: member=__server3-hostname__
Sep 30 15:55:26 __server1-hostname__ consul: 2020-09-30T15:55:26.995+0200 [INFO]  agent.server: federation state anti-entropy synced
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.001+0200 [INFO]  agent.server: member joined, marking health alive: member=__server2-hostname__
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.060+0200 [ERROR] agent.server.rpc: multiplex conn accept failed: conn=from=[__server3-IPv6__]:35754 error="read tcp [__server1-IPv6__]:8300->[__server3-IPv6__]:35754: read: connection reset by peer"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.060+0200 [ERROR] agent.server.raft: failed to heartbeat to: peer=[__server3-IPv6__]:8300 error="read tcp [__server1-IPv6__]:42258->[__server3-IPv6__]:8300: read: connection reset by peer"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.061+0200 [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 7d5462ce-8b23-8b0f-a537-5c26da1818f9 [__server3-IPv6__]:8300}"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.072+0200 [ERROR] agent.server.raft: failed to heartbeat to: peer=[__server3-IPv6__]:8300 error="dial tcp <nil>->[__server3-IPv6__]:8300: connect: connection refused"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.109+0200 [INFO]  agent.server.raft: aborting pipeline replication: peer="{Voter 91c3145d-1879-9332-0e5c-c90736797ef3 [__server2-IPv6__]:8300}"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.109+0200 [ERROR] agent.server.rpc: multiplex conn accept failed: conn=from=[__server2-IPv6__]:51250 error="read tcp [__server1-IPv6__]:8300->[__server2-IPv6__]:51250: read: connection reset by peer"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.158+0200 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 7d5462ce-8b23-8b0f-a537-5c26da1818f9 [__server3-IPv6__]:8300}" error="dial tcp <nil>->[__server3-IPv6__]:8300: connect: connection refused"
Sep 30 15:55:27 __server1-hostname__ consul: 2020-09-30T15:55:27.167+0200 [ERROR] agent.server.raft: failed to appendEntries to: peer="{Voter 91c3145d-1879-9332-0e5c-c90736797ef3 [__server2-IPv6__]:8300}" error=EOF

Reverting to 1.7.2 causes panic:

Sep 30 16:01:04 __server1-hostname__ consul: 2020-09-30T16:01:04.106+0200 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.024+0200 [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.024+0200 [INFO]  agent.server.raft: entering candidate state: node="Node at [__server1-IPv6__]:8300 [Candidate]" term=3196
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.083+0200 [INFO]  agent.server.raft: election won: tally=2
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.083+0200 [INFO]  agent.server.raft: entering leader state: leader="Node at [__server1-IPv6__]:8300 [Leader]"
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.083+0200 [INFO]  agent.server.raft: added peer, starting replication: peer=91c3145d-1879-9332-0e5c-c90736797ef3
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.083+0200 [INFO]  agent.server: cluster leadership acquired
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.083+0200 [INFO]  agent.server: New leader elected: payload=__server1-hostname__
Sep 30 16:01:05 __server1-hostname__ consul: 2020-09-30T16:01:05.092+0200 [INFO]  agent.server.raft: pipelining replication: peer="{Voter 91c3145d-1879-9332-0e5c-c90736797ef3 [__server2-IPv6__]:8300}"
Sep 30 16:01:06 __server1-hostname__ consul: panic: failed to apply request: []byte{0x1e, 0x84, 0xaa, 0x44, 0x61, 0x74, 0x61, 0x63, 0x65, 0x6e, 0x74, 0x65, 0x72, 0xa0, 0xa2, 0x4f, 0x70, 0xa6, 0x75, 0x70, 0x73, 0x65, 0x72, 0x74, 0xa5, 0x53, 0x74, 0x61, 0x74, 0x65, 0x86, 0xab, 0x43, 0x72, 0x65, 0x61, 0x74, 0x65, 0x49, 0x6e, 0x64, 0x65, 0x78, 0x0, 0xaa, 0x44, 0x61, 0x74, 0x61, 0x63, 0x65, 0x6e, 0x74, 0x65, 0x72, 0xa6, 0x6d, 0x65, 0x79, 0x72, 0x69, 0x6e, 0xac, 0x4d, 0x65, 0x73, 0x68, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x73, 0xc0, 0xab, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x49, 0x6e, 0x64, 0x65, 0x78, 0x0, 0xb2, 0x50, 0x72, 0x69, 0x6d, 0x61, 0x72, 0x79, 0x4d, 0x6f, 0x64, 0x69, 0x66, 0x79, 0x49, 0x6e, 0x64, 0x65, 0x78, 0x0, 0xa9, 0x55, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64, 0x41, 0x74, 0xaf, 0x1, 0x0, 0x0, 0x0, 0xe, 0xd7, 0x6, 0x85, 0x4e, 0x3b, 0x1e, 0x11, 0x87, 0xff, 0xff, 0xa5, 0x54, 0x6f, 0x6b, 0x65, 0x6e, 0xa0}
Sep 30 16:01:06 __server1-hostname__ consul: goroutine 49 [running]:
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/consul/agent/consul/fsm.(*FSM).Apply(0xc0007a5c20, 0xc001320370, 0xc000af48a0, 0x0)
Sep 30 16:01:06 __server1-hostname__ consul: /home/circleci/project/consul/agent/consul/fsm/fsm.go:129 +0x27d
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/go-raftchunking.(*ChunkingFSM).Apply(0xc0007f0600, 0xc001320370, 0x50b81a0, 0xbfd541c8a052adb7)
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/go-raftchunking@v0.6.1/fsm.go:66 +0x5b
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0007349b0)
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:90 +0x2aa
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc001e54a00, 0x40, 0x40)
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:113 +0x75
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/raft.(*Raft).runFSM(0xc00010e000)
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/fsm.go:219 +0x3a9
Sep 30 16:01:06 __server1-hostname__ consul: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc00010e000, 0xc0005d48f0)
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:146 +0x5d
Sep 30 16:01:06 __server1-hostname__ consul: created by github.com/hashicorp/raft.(*raftState).goFunc
Sep 30 16:01:06 __server1-hostname__ consul: /go/pkg/mod/github.com/hashicorp/raft@v1.1.2/state.go:144 +0x66
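The byte slice in the panic line is a msgpack-encoded raft log entry. Rendering the printable characters of its leading bytes (excerpted from the panic payload above) shows the field names of a federation-state upsert. Federation state routines are new in 1.8 (they appear in the 1.8.4 leader logs above), so this appears to be a log entry type the 1.7.2 FSM does not recognize, which would explain the panic on rollback:

```python
# Leading bytes of the payload from "panic: failed to apply request".
# The first byte is the raft message type; the rest is msgpack data.
payload = bytes([0x1e, 0x84, 0xaa, 0x44, 0x61, 0x74, 0x61, 0x63, 0x65,
                 0x6e, 0x74, 0x65, 0x72, 0xa0, 0xa2, 0x4f, 0x70, 0xa6,
                 0x75, 0x70, 0x73, 0x65, 0x72, 0x74, 0xa5, 0x53, 0x74,
                 0x61, 0x74, 0x65])

print("message type:", payload[0])  # 30: unknown to the 1.7.2 FSM
print("".join(chr(b) if 32 <= b < 127 else "." for b in payload))
# readable fields: Datacenter, Op "upsert", State
```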
t0klian commented 4 years ago

We have been experiencing the same issue with a 1.7.3 -> 1.8.3 upgrade when performing it in a rolling manner.

djackyn commented 4 years ago

Facing the same issue while upgrading 1.7.3 -> 1.8.3. We are unable to reproduce it in dev environments.

lucasvieirazup commented 3 years ago

I have the same problem. Can anyone help?

lucasvieirazup commented 3 years ago

up