hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

invalid memory address or nil pointer dereference #9566

Closed bert2002 closed 3 years ago

bert2002 commented 3 years ago


Overview of the Issue

I am trying to upgrade from 1.8.6 to 1.9.1 and am running into this nil pointer dereference.

Consul info for both Client and Server

OS: Debian 10
Consul: 1.9.1

Log Fragments

Jan 14 07:51:07 node-3 systemd[1]: Starting "HashiCorp Consul - A service mesh solution"...
Jan 14 07:51:08 node-3 consul[20951]: ==> Starting Consul agent...
Jan 14 07:51:08 node-3 consul[20951]:            Version: '1.9.1'
Jan 14 07:51:08 node-3 consul[20951]:            Node ID: '67bd22f1-b02d-ceaa-121a-e1b4c3c546d6'
Jan 14 07:51:08 node-3 consul[20951]:          Node name: 'node-3'
Jan 14 07:51:08 node-3 consul[20951]:         Datacenter: 'node' (Segment: '<all>')
Jan 14 07:51:08 node-3 consul[20951]:             Server: true (Bootstrap: false)
Jan 14 07:51:08 node-3 consul[20951]:        Client Addr: [127.0.0.1] (HTTP: -1, HTTPS: 8501, gRPC: 8502, DNS: 53)
Jan 14 07:51:08 node-3 consul[20951]:       Cluster Addr: XX.XXX.XXX.22 (LAN: 8301, WAN: 8302)
Jan 14 07:51:08 node-3 consul[20951]:            Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: false, Auto-Encrypt-TLS: false
Jan 14 07:51:08 node-3 consul[20951]: ==> Log data will now stream in as it occurs:
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: skipping file /etc/consul.d/consul-agent-ca-key.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: skipping file /etc/consul.d/consul-agent-ca.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: skipping file /etc/consul.d/node-server-consul-0-key.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: skipping file /etc/consul.d/node-server-consul-0.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: The 'ui' field is deprecated. Use the 'ui_config.enabled' field instead.
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.022Z [WARN]  agent: bootstrap_expect > 0: expecting 3 servers
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: skipping file /etc/consul.d/consul-agent-ca-key.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: skipping file /etc/consul.d/consul-agent-ca.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: skipping file /etc/consul.d/node-server-consul-0-key.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: skipping file /etc/consul.d/node-server-consul-0.pem, extension must be .hcl or .json, or config format must be set
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: The 'ui' field is deprecated. Use the 'ui_config.enabled' field instead.
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.041Z [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.054Z [INFO]  agent.server.raft: restored from snapshot: id=192-10076098-1610608541400
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.057Z [INFO]  agent.server.raft: initial configuration: index=10076757 servers="[{Suffrage:Voter ID:22559bf4-d913-cf98-6c2a-c6f0ff4d2f7d Address:XX.XXX.XXX.21:8300} {Suffrage:Voter ID:67bd22f1-b02d-ceaa-121a-e1b4c3c546d6 Address:XX.XXX.XXX.22:8300}]"
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.057Z [INFO]  agent.server.raft: entering follower state: follower="Node at XX.XXX.XXX.22:8300 [Follower]" leader=
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: node-3.node XX.XXX.XXX.22
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server.serf.wan: serf: Attempting re-join to previously known node: node-1.node: XX.XXX.XXX.20:8302
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: node-3 XX.XXX.XXX.22
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.router: Initializing LAN area manager
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server.serf.lan: serf: Attempting re-join to previously known node: node-1: XX.XXX.XXX.20:8301
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server: Adding LAN server: server="node-3 (Addr: tcp/XX.XXX.XXX.22:8300) (DC: node)"
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server: Raft data found, disabling bootstrap mode
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.058Z [INFO]  agent.server: Handled event for server in area: event=member-join server=node-3.node area=wan
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.059Z [WARN]  agent.server.memberlist.lan: memberlist: Refuting an alive message for 'node-3' (XX.XXX.XXX.22:8301) meta:([255 222 0 16 162 100 99 163 117 97 116 166 101 120 112 101 99 116 161 51 173 119 97 110 95 106 111 105 110 95 112 111 114 116 164 56 51 48 50 163 118 115 110 161 50 167 118 115 110 95 109 97 120 161 51 164 112 111 114 116 164 56 51 48 48 167 115 101 103 109 101 110 116 160 167 118 115 110 95 109 105 110 161 50 165 98 117 105 108 100 174 49 46 57 46 49 58 99 97 53 99 51 56 57 52 165 102 116 95 102 115 161 49 165 102 116 95 115 105 161 49 164 114 111 108 101 166 99 111 110 115 117 108 162 105 100 218 0 36 54 55 98 100 50 50 102 49 45 98 48 50 100 45 99 101 97 97 45 49 50 49 97 45 101 49 98 52 99 51 99 53 52 54 100 54 168 114 97 102 116 95 118 115 110 161 51 167 117 115 101 95 116 108 115 161 49 164 97 99 108 115 161 48] VS [255 222 0 16 167 118 115 110 95 109 97 120 161 51 164 112 111 114 116 164 56 51 48 48 167 117 115 101 95 116 108 115 161 49 164 97 99 108 115 161 48 165 102 116 95 102 115 161 49 167 115 101 103 109 101 110 116 160 163 118 115 110 161 50 162 100 99 163 117 97 116 166 101 120 112 101 99 116 161 51 173 119 97 110 95 106 111 105 110 95 112 111 114 116 164 56 51 48 50 165 98 117 105 108 100 174 49 46 57 46 49 58 99 97 53 99 51 56 57 52 167 118 115 110 95 109 105 110 161 50 168 114 97 102 116 95 118 115 110 161 51 165 102 116 95 115 105 161 49 164 114 111 108 101 166 99 111 110 115 117 108 162 105 100 218 0 36 54 55 98 100 50 50 102 49 45 98 48 50 100 45 99 101 97 97 45 49 50 49 97 45 101 49 98 52 99 51 99 53 52 54 100 54]), vsn:([1 5 2 2 5 4] VS [1 5 2 2 5 4])
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.059Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: node-2 XX.XXX.XXX.21
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.059Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: node-1 XX.XXX.XXX.20
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.059Z [INFO]  agent.server.serf.lan: serf: Re-joined to previously known node: node-1: XX.XXX.XXX.20:8301
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent.server: Adding LAN server: server="node-2 (Addr: tcp/XX.XXX.XXX.21:8300) (DC: node)"
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent.server: Adding LAN server: server="node-1 (Addr: tcp/XX.XXX.XXX.20:8300) (DC: node)"
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Started DNS server: address=XX.XXX.XXX.22:53 network=udp
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Started DNS server: address=XX.XXX.XXX.22:53 network=tcp
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Starting server: address=XX.XXX.XXX.22:8501 network=tcp protocol=https
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [WARN]  agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Started gRPC server: address=XX.XXX.XXX.22:8502 network=tcp
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: Joining cluster...: cluster=LAN
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.060Z [INFO]  agent: (LAN) joining: lan_addresses=[XX.XXX.XXX.20, XX.XXX.XXX.21]
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.061Z [INFO]  agent: started state syncer
Jan 14 07:51:08 node-3 consul[20951]: ==> Consul agent running!
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.062Z [INFO]  agent: (LAN) joined: number_of_nodes=2
Jan 14 07:51:08 node-3 consul[20951]:     2021-01-14T07:51:08.062Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=2
Jan 14 07:51:08 node-3 systemd[1]: Started "HashiCorp Consul - A service mesh solution".
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.278Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader=
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.278Z [INFO]  agent.server.raft: entering candidate state: node="Node at XX.XXX.XXX.22:8300 [Candidate]" term=340
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.287Z [INFO]  agent.server.raft: election won: tally=2
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.287Z [INFO]  agent.server.raft: entering leader state: leader="Node at XX.XXX.XXX.22:8300 [Leader]"
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.287Z [INFO]  agent.server.raft: added peer, starting replication: peer=22559bf4-d913-cf98-6c2a-c6f0ff4d2f7d
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.287Z [INFO]  agent.server: cluster leadership acquired
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.287Z [INFO]  agent.server: New leader elected: payload=node-3
Jan 14 07:51:09 node-3 consul[20951]:     2021-01-14T07:51:09.288Z [INFO]  agent.server.raft: pipelining replication: peer="{Voter 22559bf4-d913-cf98-6c2a-c6f0ff4d2f7d XX.XXX.XXX.21:8300}"
Jan 14 07:51:09 node-3 consul[20951]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 14 07:51:09 node-3 consul[20951]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x7ad532]
Jan 14 07:51:09 node-3 consul[20951]: goroutine 27 [running]:
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc000d416c0, 0xc000d2eca0, 0x0, 0x0, 0xc0006e55b0, 0x0, 0xffffffffffffffff)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/go-immutable-radix@v1.3.0/iter.go:178 +0xb2
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0006e5310, 0xc000d1a400, 0x99bff5)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/go-memdb@v1.3.0/txn.go:895 +0x2e
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/state.cleanupMeshTopology(0x38f5800, 0xc000d1a400, 0x99bff5, 0xc00076c900, 0x99bff5, 0xc00076ca70)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/state/catalog.go:3271 +0x36c
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/state.(*Store).deleteServiceTxn(0xc0005f19e0, 0x38f5800, 0xc000d1a400, 0x99bff5, 0xc0007cdc70, 0x5, 0xc00055dc00, 0x6d, 0xc00076ca70, 0x0, ...)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/state/catalog.go:1542 +0x8c5
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/state.(*Store).deleteNodeTxn(0xc0005f19e0, 0x38f5800, 0xc000d1a400, 0x99bff5, 0xc0007cdc70, 0x5, 0xb25ddc, 0xc00073d8c0)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/state/catalog.go:715 +0x62d
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/state.(*Store).DeleteNode(0xc0005f19e0, 0x99bff5, 0xc0007cdc70, 0x5, 0x0, 0x0)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/state/catalog.go:648 +0xbb
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/fsm.(*FSM).applyDeregister(0xc000985bc0, 0xc0006d5b01, 0x36, 0x36, 0x99bff5, 0x0, 0x0)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/fsm/commands_oss.go:171 +0x41a
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/fsm.NewFromDeps.func1(0xc0006d5b01, 0x36, 0x36, 0x99bff5, 0xc0006f40e0, 0xc000d07d00)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/fsm/fsm.go:99 +0x56
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/consul/agent/consul/fsm.(*FSM).Apply(0xc000985bc0, 0xc0003113b0, 0x0, 0x0)
Jan 14 07:51:09 node-3 consul[20951]:         /home/circleci/project/consul/agent/consul/fsm/fsm.go:133 +0x1b6
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/go-raftchunking.(*ChunkingFSM).Apply(0xc0005a6390, 0xc0003113b0, 0x5191aa0, 0xbff81bdb51ca58fb)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/go-raftchunking@v0.6.1/fsm.go:66 +0x5b
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc00007ddf0)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:90 +0x2c2
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000190200, 0x40, 0x40)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:113 +0x75
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/raft.(*Raft).runFSM(0xc0002f6f00)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:219 +0x3c4
Jan 14 07:51:09 node-3 consul[20951]: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc0002f6f00, 0xc0006feb80)
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:146 +0x55
Jan 14 07:51:09 node-3 consul[20951]: created by github.com/hashicorp/raft.(*raftState).goFunc
Jan 14 07:51:09 node-3 consul[20951]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:144 +0x66
Jan 14 07:51:09 node-3 systemd[1]: consul.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 14 07:51:09 node-3 systemd[1]: consul.service: Failed with result 'exit-code'.
Jan 14 07:51:09 node-3 systemd[1]: consul.service: Service RestartSec=100ms expired, scheduling restart.
Jan 14 07:51:09 node-3 systemd[1]: consul.service: Scheduled restart job, restart counter is at 36.
Jan 14 07:51:09 node-3 systemd[1]: Stopped "HashiCorp Consul - A service mesh solution".

Any idea which data it is choking on?

Cheers, bert

dnephin commented 3 years ago

Thank you for the bug report!

We had another report of this in #9482. I've closed that issue so we can track it here. It sounds like this bug exists in 1.9.0 as well.

We don't have much of a lead on this yet, so we'll need to do some more investigation.

Can you tell me more about how you use Consul (e.g. for KV, service discovery, and/or Connect service mesh)? Do you know if you might have multiple service instances with the same name on a single node? (That shouldn't be a problem, but I wonder if it might be a factor in triggering this bug.)

adrien-f commented 3 years ago

Greetings 👋! We've been hitting this panic over the last couple of days. We upgraded from 1.8.4 to 1.9.1, and our cluster has already crashed a few times:

Jan 17 18:37:39 consul-server-node consul[18915]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 17 18:37:39 consul-server-node consul[18915]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x7ad532]
Jan 17 18:37:39 consul-server-node consul[18915]: goroutine 36 [running]:
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc001b91120, 0x0, 0xc001b91240, 0x0, 0xc0013b2c00, 0x0, 0xffffffffffffffff)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/go-immutable-radix@v1.3.0/iter.go:178 +0xb2
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0013b2be0, 0xc001b51260, 0x5944f)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/go-memdb@v1.3.0/txn.go:895 +0x2e
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/state.cleanupGatewayWildcards(0x38f5800, 0xc001b51260, 0x5944f, 0xc00141a300, 0x0, 0x0)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/state/catalog.go:2783 +0xe8
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/state.(*Store).deleteServiceTxn(0xc00113a1b0, 0x38f5800, 0xc001b51260, 0x5944f, 0xc0020e5ce0, 0x10, 0xc001420500, 0x79, 0xc00141a470, 0x0, ...)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/state/catalog.go:1565 +0xcb0
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/state.(*Store).deleteNodeTxn(0xc00113a1b0, 0x38f5800, 0xc001b51260, 0x5944f, 0xc0020e5ce0, 0x10, 0xb25ddc, 0xc0020cd500)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/state/catalog.go:715 +0x62d
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/state.(*Store).DeleteNode(0xc00113a1b0, 0x5944f, 0xc0020e5ce0, 0x10, 0x0, 0x0)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/state/catalog.go:648 +0xbb
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/fsm.(*FSM).applyDeregister(0xc00053c240, 0xc001e0c0a1, 0x4b, 0x4b, 0x5944f, 0x0, 0x0)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/fsm/commands_oss.go:171 +0x41a
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/fsm.NewFromDeps.func1(0xc001e0c0a1, 0x4b, 0x4b, 0x5944f, 0xc00059e100, 0xc0020d96c0)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/fsm/fsm.go:99 +0x56
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/consul/agent/consul/fsm.(*FSM).Apply(0xc00053c240, 0xc00130bea0, 0x0, 0x0)
Jan 17 18:37:39 consul-server-node consul[18915]:         /home/circleci/project/consul/agent/consul/fsm/fsm.go:133 +0x1b6
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/go-raftchunking.(*ChunkingFSM).Apply(0xc0010570b0, 0xc00130bea0, 0x5191aa0, 0xbff93b58c077682e)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/go-raftchunking@v0.6.1/fsm.go:66 +0x5b
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc001570320)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:90 +0x2c2
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc0015e5a00, 0x40, 0x40)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:113 +0x75
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/raft.(*Raft).runFSM(0xc0002f3500)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/fsm.go:219 +0x3c4
Jan 17 18:37:39 consul-server-node consul[18915]: github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc0002f3500, 0xc00116a950)
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:146 +0x55
Jan 17 18:37:39 consul-server-node consul[18915]: created by github.com/hashicorp/raft.(*raftState).goFunc
Jan 17 18:37:39 consul-server-node consul[18915]:         /go/pkg/mod/github.com/hashicorp/raft@v1.2.0/state.go:144 +0x66
Jan 17 18:37:39 consul-server-node systemd[1]: consul.service: Main process exited, code=exited, status=2/INVALIDARGUMENT

We're using the cluster for a mix of service discovery and mesh, with around 60 nodes.

Let me know if we can help you with more debugging information 🙏 ! Thanks a lot.

bert2002 commented 3 years ago

> Can you tell me more about how you use Consul (e.g. for KV, service discovery, and/or Connect service mesh)? Do you know if you might have multiple service instances with the same name on a single node? (That shouldn't be a problem, but I wonder if it might be a factor in triggering this bug.)

We run a three-node cluster with Connect service mesh and service discovery. ACLs are disabled and TLS is enabled.

dnephin commented 3 years ago

Thank you to everyone who has reported and provided information about this panic! We have identified the problem and have a couple of patches to fix it. A 1.9.2 release that includes the fix should be out very soon.

Unfortunately, we haven't found any workarounds yet. The bug is triggered when a node is deleted, which is probably hard to avoid: any time an agent is restarted, it performs a node delete.