hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul binary is crashing when updating from v1.3.0 to v1.5.2 #6336

Open arun0704 opened 5 years ago

arun0704 commented 5 years ago

Overview of the Issue

The Consul binary crashes after updating from v1.3.0 to v1.5.2.

Reproduction Steps

I really don't understand how the problem occurred. The cluster had been running and working fine for the last 26 days in a test environment, and then it started failing with:

[root@bcmt1903-panch-control-02 ~]# kubectl logs ztsservicediscoveryserver-0
bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
==> Error starting agent: Failed to start Consul server: Failed to start Raft: failed to load any existing snapshots

So I decided to update the Consul image. One more thing: we do some KV read/write tasks in Consul. Could that cause a problem like this?

Operating system and Environment details

The latest Consul Docker image, pulled with "docker pull consul:latest" (it reports Version 'v1.5.2' in the logs below).

Log Fragments

NAME                          READY   STATUS             RESTARTS   AGE
ztsservicediscoveryserver-0   0/1     CrashLoopBackOff   3          104s
ztsservicediscoveryserver-1   0/1     CrashLoopBackOff   28         120m
ztsservicediscoveryserver-2   0/1     CrashLoopBackOff   165        13h

[root@bcmt1903-panch-control-02 ~]# kubectl logs ztsservicediscoveryserver-0
bootstrap_expect > 0: expecting 3 servers
==> Starting Consul agent...
       Version: 'v1.5.2'
       Node ID: '35037005-0f61-b084-025d-f150cd4803df'
     Node name: 'ztsservicediscoveryserver-0'
    Datacenter: 'dc1' (Segment: '<all>')
        Server: true (Bootstrap: false)
   Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
  Cluster Addr: 192.168.12.159 (LAN: 8301, WAN: 8302)
       Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2019/08/16 07:22:46 [DEBUG] tlsutil: Update with version 1
2019/08/16 07:22:46 [DEBUG] tlsutil: OutgoingRPCWrapper with version 1
2019/08/16 07:22:46 [INFO]  raft: Restored from snapshot 145-1016371-1565779380540
2019/08/16 07:22:46 [ERROR] raft: Failed to get log at 1016372: log not found
panic: log not found

goroutine 1 [running]:
github.com/hashicorp/raft.NewRaft(0xc0004c0000, 0x32d4340, 0xc000726c60, 0x330ad80, 0xc00043c840, 0x32ee680, 0xc000366c40, 0x32d6000, 0xc000548000, 0x3326700, ...)
            /go/pkg/mod/github.com/hashicorp/raft@v1.1.0/api.go:512 +0x1459
github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc0004b0700, 0x0, 0x0)
            /consul/agent/consul/server.go:727 +0x5f5
github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0004b0380, 0xc000315860, 0xc0004f6080, 0xc000102050, 0x0, 0x0, 0x0)
            /consul/agent/consul/server.go:427 +0xafe
github.com/hashicorp/consul/agent.(*Agent).Start(0xc00011a480, 0x0, 0x0)
            /consul/agent/agent.go:411 +0x56e
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000176000, 0xc0000fe020, 0x10, 0x10, 0x0)
            /consul/command/agent/agent.go:279 +0xf59
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000176000, 0xc0000fe020, 0x10, 0x10, 0xc0001b5b40)
            /consul/command/agent/agent.go:75 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc000258140, 0xc000258140, 0x80, 0xc0000ce840)
            /go/pkg/mod/github.com/mitchellh/cli@v1.0.0/cli.go:255 +0x1f1
main.realMain(0xc0000c2058)
            /consul/main.go:53 +0x393
main.main()
            /consul/main.go:20 +0x22

mkeeler commented 5 years ago

For anyone else looking into this, the panic is coming from here: https://github.com/hashicorp/raft/blob/ed6f22234e826c00c7e6e469c0acfc31a3d3c595/api.go#L507-L515