hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

consul can't start #318

Closed vgv closed 10 years ago

vgv commented 10 years ago

Hi

Consul 0.3.1, Ubuntu Linux x64

Steps:
1) I have a test Consul cluster (3 servers + 2 agents), all running in 5 virtual machines (VirtualBox).
2) I put 1 key into the KV store.
3) I changed this single key about 1,000,000 times.
4) Everything worked fine.
5) After that I hard-terminated all the virtual machines, as a sort of "power off" test.

After that I started all my VMs again and tried to start the first server like this:

vgv@ds1:~$ consul/consul agent -server -data-dir=/home/vgv/consul/consul-dir -config-dir=/home/vgv/consul/config-dir -bind=192.168.200.51 -client=0.0.0.0 -ui-dir=/home/vgv/consul/ui -bootstrap
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
    Node name: 'ds1'
    Datacenter: 'dc1'
    Server: true (bootstrap: true)
    Client Addr: 0.0.0.0 (HTTP: 8500, DNS: 8600, RPC: 8400)
    Cluster Addr: 192.168.200.51 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

2014/09/04 18:14:28 [ERR] snapshot: CRC checksum failed (stored: [67 214 101 86 72 179 89 11] computed: [0 0 0 0 0 0 0 0])
2014/09/04 18:14:28 [ERR] raft: Failed to open snapshot 37-857234-2014-09-04T15:01:37.085495752+04:00: CRC mismatch
2014/09/04 18:14:28 [INFO] raft: Restored from snapshot 37-796820-2014-09-04T14:59:32.99103404+04:00
2014/09/04 18:14:28 [INFO] serf: EventMemberJoin: ds1 192.168.200.51
2014/09/04 18:14:28 [INFO] serf: EventMemberJoin: ds1.dc1 192.168.200.51
2014/09/04 18:14:28 [INFO] raft: Node at 192.168.200.51:8300 [Follower] entering Follower state
2014/09/04 18:14:28 [WARN] serf: Failed to re-join any previously known node
2014/09/04 18:14:28 [WARN] serf: Failed to re-join any previously known node
2014/09/04 18:14:28 [INFO] consul: adding server ds1 (Addr: 192.168.200.51:8300) (DC: dc1)
2014/09/04 18:14:28 [INFO] consul: adding server ds1.dc1 (Addr: 192.168.200.51:8300) (DC: dc1)
2014/09/04 18:14:28 [ERR] agent: failed to sync remote state: No cluster leader
2014/09/04 18:14:30 [WARN] raft: Heartbeat timeout reached, starting election
2014/09/04 18:14:30 [INFO] raft: Node at 192.168.200.51:8300 [Candidate] entering Candidate state
2014/09/04 18:14:30 [INFO] raft: Election won. Tally: 1
2014/09/04 18:14:30 [INFO] raft: Node at 192.168.200.51:8300 [Leader] entering Leader state
2014/09/04 18:14:30 [INFO] consul: cluster leadership acquired
2014/09/04 18:14:30 [ERR] raft: Failed to get log at 796821: log not found

panic: log not found

goroutine 28 [running]:
runtime.panic(0x95bd60, 0xc208000dd0)
    /opt/go/src/pkg/runtime/panic.c:279 +0xf5
github.com/hashicorp/raft.(*Raft).processLogs(0xc208010380, 0xd1e83, 0xc208052500)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:1044 +0x350
github.com/hashicorp/raft.(*Raft).leaderLoop(0xc208010380)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:801 +0x365
github.com/hashicorp/raft.(*Raft).runLeader(0xc208010380)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:755 +0x58d
github.com/hashicorp/raft.(*Raft).run(0xc208010380)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:560 +0xba
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.run)·fm()
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:231 +0x26
github.com/hashicorp/raft.func·008()
    /opt/gopath/src/github.com/hashicorp/raft/state.go:152 +0x4e
created by github.com/hashicorp/raft.(*raftState).goFunc
    /opt/gopath/src/github.com/hashicorp/raft/state.go:153 +0x9c

goroutine 16 [select]:
github.com/hashicorp/consul/command/agent.(*Command).handleSignals(0xc20800e1b0, 0xc208055400, 0x2e)
    /opt/gopath/src/github.com/hashicorp/consul/command/agent/command.go:411 +0x853
github.com/hashicorp/consul/command/agent.(*Command).Run(0xc20800e1b0, 0xc20800e020, 0x7, 0x7, 0x0)
    /opt/gopath/src/github.com/hashicorp/consul/command/agent/command.go:400 +0x1646
github.com/mitchellh/cli.(*CLI).Run(0xc2080523c0, 0xc2080523c0, 0x0, 0x0)
    /opt/gopath/src/github.com/mitchellh/cli/cli.go:100 +0x3a1
main.realMain(0x411bbe)
    /opt/gopath/src/github.com/hashicorp/consul/main.go:37 +0x2fb
main.main()
    /opt/gopath/src/github.com/hashicorp/consul/main.go:12 +0x1e

goroutine 19 [finalizer wait]: runtime.park(0x4245d0, 0xf95030, 0xf814c9) /opt/go/src/pkg/runtime/proc.c:1369 +0x89 runtime.parkunlock(0xf95030, 0xf814c9) /opt/go/src/pkg/runtime/proc.c:1385 +0x3b runfinq() /opt/go/src/pkg/runtime/mgc0.c:2644 +0xcf runtime.goexit() /opt/go/src/pkg/runtime/proc.c:1445

goroutine 20 [syscall]: os/signal.loop() /opt/go/src/pkg/os/signal/signal_unix.go:21 +0x1e created by os/signal.init·1 /opt/go/src/pkg/os/signal/signal_unix.go:27 +0x32

goroutine 23 [select]: github.com/armon/go-metrics.(*InmemSignal).run(0xc208049840) /opt/gopath/src/github.com/armon/go-metrics/inmem_signal.go:63 +0xa8 created by github.com/armon/go-metrics.NewInmemSignal /opt/gopath/src/github.com/armon/go-metrics/inmem_signal.go:37 +0x168

goroutine 24 [sleep]: time.Sleep(0x3b9aca00) /opt/go/src/pkg/runtime/time.goc:39 +0x31 github.com/armon/go-metrics.(*Metrics).collectStats(0xc20801b450) /opt/gopath/src/github.com/armon/go-metrics/metrics.go:67 +0x2b created by github.com/armon/go-metrics.New /opt/gopath/src/github.com/armon/go-metrics/start.go:61 +0x9b

goroutine 25 [select]: github.com/hashicorp/consul/consul.(*ConnPool).reap(0xc20801b540) /opt/gopath/src/github.com/hashicorp/consul/consul/pool.go:360 +0x48a created by github.com/hashicorp/consul/consul.NewPool /opt/gopath/src/github.com/hashicorp/consul/consul/pool.go:154 +0xda

goroutine 17 [syscall]: runtime.goexit() /opt/go/src/pkg/runtime/proc.c:1445

goroutine 27 [select]:
github.com/hashicorp/consul/consul.(*RaftLayer).Accept(0xc208049900, 0x0, 0x0, 0x0, 0x0)
    /opt/gopath/src/github.com/hashicorp/consul/consul/raft_rpc.go:56 +0x14d
github.com/hashicorp/raft.(*NetworkTransport).listen(0xc208004960)
    /opt/gopath/src/github.com/hashicorp/raft/net_transport.go:339 +0x4f
created by github.com/hashicorp/raft.NewNetworkTransport
    /opt/gopath/src/github.com/hashicorp/raft/net_transport.go:135 +0x1c9

goroutine 29 [select]:
github.com/hashicorp/raft.(*Raft).runFSM(0xc208010380)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:474 +0xdbb
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runFSM)·fm()
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:232 +0x26
github.com/hashicorp/raft.func·008()
    /opt/gopath/src/github.com/hashicorp/raft/state.go:152 +0x4e
created by github.com/hashicorp/raft.(*raftState).goFunc
    /opt/gopath/src/github.com/hashicorp/raft/state.go:153 +0x9c

goroutine 30 [select]:
github.com/hashicorp/raft.(*Raft).runSnapshots(0xc208010380)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:1525 +0x3bd
github.com/hashicorp/raft.*Raft.(github.com/hashicorp/raft.runSnapshots)·fm()
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:233 +0x26
github.com/hashicorp/raft.func·008()
    /opt/gopath/src/github.com/hashicorp/raft/state.go:152 +0x4e
created by github.com/hashicorp/raft.(*raftState).goFunc
    /opt/gopath/src/github.com/hashicorp/raft/state.go:153 +0x9c

goroutine 31 [runnable]:
github.com/hashicorp/consul/consul.(*Server).monitorLeadership(0xc20804e640)
    /opt/gopath/src/github.com/hashicorp/consul/consul/leader.go:31 +0x19d
created by github.com/hashicorp/consul/consul.(*Server).setupRaft
    /opt/gopath/src/github.com/hashicorp/consul/consul/server.go:328 +0x908

goroutine 33 [runnable]: sync.runtime_Semacquire(0xc20801b4f4) /opt/go/src/pkg/runtime/sema.goc:199 +0x30 sync.(_Mutex).Lock(0xc20801b4f0) /opt/go/src/pkg/sync/mutex.go:66 +0xd6 log.(_Logger).Output(0xc20801b4f0, 0x2, 0xc2080f1860, 0x26, 0x0, 0x0) /opt/go/src/pkg/log/log.go:134 +0x9b log.(_Logger).Printf(0xc20801b4f0, 0xaefb70, 0x25, 0x7fac1f50fe88, 0x1, 0x1) /opt/go/src/pkg/log/log.go:160 +0x84 github.com/hashicorp/consul/consul.(_Server).localEvent(0xc20804e640, 0x2, 0xaa7f70, 0x11, 0xc2080e6da0, 0x3, 0x8, 0x0) /opt/gopath/src/github.com/hashicorp/consul/consul/serf.go:107 +0x1c9 github.com/hashicorp/consul/consul.(*Server).lanEventHandler(0xc20804e640) /opt/gopath/src/github.com/hashicorp/consul/consul/serf.go:35 +0x34e created by github.com/hashicorp/consul/consul.NewServer /opt/gopath/src/github.com/hashicorp/consul/consul/server.go:191 +0x70d

goroutine 34 [select]: github.com/hashicorp/consul/consul.(*Server).wanEventHandler(0xc20804e640) /opt/gopath/src/github.com/hashicorp/consul/consul/serf.go:51 +0x281 created by github.com/hashicorp/consul/consul.NewServer /opt/gopath/src/github.com/hashicorp/consul/consul/server.go:192 +0x725

goroutine 35 [select]: github.com/hashicorp/serf/serf.(*serfQueries).stream(0xc2080f8ba0) /opt/gopath/src/github.com/hashicorp/serf/serf/internal_query.go:80 +0x238 created by github.com/hashicorp/serf/serf.newSerfQueries /opt/gopath/src/github.com/hashicorp/serf/serf/internal_query.go:73 +0x9f

goroutine 36 [select]: github.com/hashicorp/serf/serf.(*Snapshotter).stream(0xc2080a9340) /opt/gopath/src/github.com/hashicorp/serf/serf/snapshot.go:174 +0x890 created by github.com/hashicorp/serf/serf.NewSnapshotter /opt/gopath/src/github.com/hashicorp/serf/serf/snapshot.go:122 +0x510

goroutine 37 [IO wait]: net.runtime_pollWait(0x7fac1f6a9e20, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080eeca0, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080eeca0, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc2080eec40, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc2080381a8, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d github.com/hashicorp/memberlist.(*Memberlist).tcpListen(0xc208056700) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:173 +0x2b created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:121 +0xa3b

goroutine 38 [IO wait]: net.runtime_pollWait(0x7fac1f6a9d70, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080eed10, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080eed10, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).readFrom(0xc2080eecb0, 0xc208126000, 0x10000, 0x10000, 0x0, 0x0, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:259 +0x3db net.(_UDPConn).ReadFromUDP(0xc2080381b0, 0xc208126000, 0x10000, 0x10000, 0x436a2a, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/udpsock_posix.go:67 +0x129 net.(_UDPConn).ReadFrom(0xc2080381b0, 0xc208126000, 0x10000, 0x10000, 0x10000, 0x0, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/udpsock_posix.go:82 +0x142 github.com/hashicorp/memberlist.(_Memberlist).udpListen(0xc208056700) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:235 +0x2a4 created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:122 +0xa59

goroutine 39 [select]: github.com/hashicorp/memberlist.(*Memberlist).udpHandler(0xc208056700) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:320 +0x2d8 created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:123 +0xa77

goroutine 40 [select]: github.com/hashicorp/memberlist.(_Memberlist).triggerFunc(0xc208056700, 0x3b9aca00, 0xc2080ef180, 0xc208005440, 0xc2080f7be0) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:104 +0x109 created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:70 +0x151

goroutine 41 [select]: github.com/hashicorp/memberlist.(_Memberlist).pushPullTrigger(0xc208056700, 0xc208005440) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:122 +0x196 created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:76 +0x269

goroutine 42 [select]: github.com/hashicorp/memberlist.(_Memberlist).triggerFunc(0xc208056700, 0xbebc200, 0xc2080ef1f0, 0xc208005440, 0xc2080f7bf0) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:104 +0x109 created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:82 +0x31a

goroutine 43 [select]: github.com/hashicorp/serf/serf.(*Serf).handleReap(0xc20802e4e0) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1333 +0x1b4 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:364 +0x1289

goroutine 44 [select]: github.com/hashicorp/serf/serf.(*Serf).handleReconnect(0xc20802e4e0) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1349 +0xc7 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:365 +0x12a7

goroutine 45 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802e4e0, 0xa43190, 0x6, 0xc2080f8fc0) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:366 +0x12e5

goroutine 46 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802e4e0, 0xa41870, 0x5, 0xc2080f8ff0) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:367 +0x1326

goroutine 47 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802e4e0, 0xa469d0, 0x5, 0xc2080f9020) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:368 +0x1367

goroutine 49 [select]: github.com/hashicorp/serf/serf.(*serfQueries).stream(0xc2080f9890) /opt/gopath/src/github.com/hashicorp/serf/serf/internal_query.go:80 +0x238 created by github.com/hashicorp/serf/serf.newSerfQueries /opt/gopath/src/github.com/hashicorp/serf/serf/internal_query.go:73 +0x9f

goroutine 50 [select]: github.com/hashicorp/serf/serf.(*Snapshotter).stream(0xc2080a93f0) /opt/gopath/src/github.com/hashicorp/serf/serf/snapshot.go:174 +0x890 created by github.com/hashicorp/serf/serf.NewSnapshotter /opt/gopath/src/github.com/hashicorp/serf/serf/snapshot.go:122 +0x510

goroutine 51 [IO wait]: net.runtime_pollWait(0x7fac1f6a9cc0, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080ef3a0, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080ef3a0, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc2080ef340, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc208038210, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d github.com/hashicorp/memberlist.(*Memberlist).tcpListen(0xc2080568c0) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:173 +0x2b created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:121 +0xa3b

goroutine 52 [IO wait]: net.runtime_pollWait(0x7fac1f6a9c10, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080ef410, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080ef410, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).readFrom(0xc2080ef3b0, 0xc208136000, 0x10000, 0x10000, 0x0, 0x0, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:259 +0x3db net.(_UDPConn).ReadFromUDP(0xc208038218, 0xc208136000, 0x10000, 0x10000, 0x436a2a, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/udpsock_posix.go:67 +0x129 net.(_UDPConn).ReadFrom(0xc208038218, 0xc208136000, 0x10000, 0x10000, 0x10000, 0x0, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/udpsock_posix.go:82 +0x142 github.com/hashicorp/memberlist.(_Memberlist).udpListen(0xc2080568c0) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:235 +0x2a4 created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:122 +0xa59

goroutine 53 [select]: github.com/hashicorp/memberlist.(*Memberlist).udpHandler(0xc2080568c0) /opt/gopath/src/github.com/hashicorp/memberlist/net.go:320 +0x2d8 created by github.com/hashicorp/memberlist.newMemberlist /opt/gopath/src/github.com/hashicorp/memberlist/memberlist.go:123 +0xa77

goroutine 54 [select]: github.com/hashicorp/memberlist.(_Memberlist).triggerFunc(0xc2080568c0, 0x12a05f200, 0xc2080ef6c0, 0xc208005980, 0xc2080e60d0) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:98 +0x14a created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:70 +0x151

goroutine 55 [select]: github.com/hashicorp/memberlist.(_Memberlist).pushPullTrigger(0xc2080568c0, 0xc208005980) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:122 +0x196 created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:76 +0x269

goroutine 56 [select]: github.com/hashicorp/memberlist.(_Memberlist).triggerFunc(0xc2080568c0, 0x1dcd6500, 0xc2080ef730, 0xc208005980, 0xc2080e60e0) /opt/gopath/src/github.com/hashicorp/memberlist/state.go:104 +0x109 created by github.com/hashicorp/memberlist.(_Memberlist).schedule /opt/gopath/src/github.com/hashicorp/memberlist/state.go:82 +0x31a

goroutine 57 [select]: github.com/hashicorp/serf/serf.(*Serf).handleReap(0xc20802eb60) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1333 +0x1b4 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:364 +0x1289

goroutine 58 [select]: github.com/hashicorp/serf/serf.(*Serf).handleReconnect(0xc20802eb60) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1349 +0xc7 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:365 +0x12a7

goroutine 59 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802eb60, 0xa43190, 0x6, 0xc2080f05d0) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:366 +0x12e5

goroutine 60 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802eb60, 0xa41870, 0x5, 0xc2080f0600) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:367 +0x1326

goroutine 61 [select]: github.com/hashicorp/serf/serf.(*Serf).checkQueueDepth(0xc20802eb60, 0xa469d0, 0x5, 0xc2080f0630) /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:1439 +0x3f1 created by github.com/hashicorp/serf/serf.Create /opt/gopath/src/github.com/hashicorp/serf/serf/serf.go:368 +0x1367

goroutine 71 [select]:
github.com/hashicorp/raft.(*Raft).Barrier(0xc208010380, 0x0, 0x0, 0x0)
    /opt/gopath/src/github.com/hashicorp/raft/raft.go:305 +0x350
github.com/hashicorp/consul/consul.(*Server).leaderLoop(0xc20804e640, 0xc2081609c0)
    /opt/gopath/src/github.com/hashicorp/consul/consul/leader.go:68 +0x1f6
created by github.com/hashicorp/consul/consul.(*Server).monitorLeadership
    /opt/gopath/src/github.com/hashicorp/consul/consul/leader.go:35 +0xc7

goroutine 63 [IO wait]: net.runtime_pollWait(0x7fac1f6a9ed0, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc20802c6f0, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc20802c6f0, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc20802c690, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc208038120, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d net.(_TCPListener).Accept(0xc208038120, 0x0, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:244 +0x4b github.com/hashicorp/consul/consul.(_Server).listen(0xc20804e640) /opt/gopath/src/github.com/hashicorp/consul/consul/rpc.go:47 +0x55 created by github.com/hashicorp/consul/consul.NewServer /opt/gopath/src/github.com/hashicorp/consul/consul/server.go:211 +0x9f3

goroutine 64 [IO wait]: net.runtime_pollWait(0x7fac1f6a9b60, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080ef870, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080ef870, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc2080ef810, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc208038248, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d net.(_TCPListener).Accept(0xc208038248, 0x0, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:244 +0x4b github.com/hashicorp/consul/command/agent.(_AgentRPC).listen(0xc2080fa000) /opt/gopath/src/github.com/hashicorp/consul/command/agent/rpc.go:251 +0x58 created by github.com/hashicorp/consul/command/agent.NewAgentRPC /opt/gopath/src/github.com/hashicorp/consul/command/agent/rpc.go:219 +0x1c4

goroutine 65 [IO wait]: net.runtime_pollWait(0x7fac1f6a9ab0, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc2080ef8e0, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc2080ef8e0, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc2080ef880, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc208038250, 0xc208049828, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d net.(_TCPListener).Accept(0xc208038250, 0x0, 0x0, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:244 +0x4b net/http.(_Server).Serve(0xc208005e60, 0x7fac1f6a9fb0, 0xc208038250, 0x0, 0x0) /opt/go/src/pkg/net/http/server.go:1698 +0x91 net/http.Serve(0x7fac1f6a9fb0, 0xc208038250, 0x7fac1f6aabd0, 0xc2080f0e40, 0x0, 0x0) /opt/go/src/pkg/net/http/server.go:1576 +0x7c created by github.com/hashicorp/consul/command/agent.NewHTTPServer /opt/gopath/src/github.com/hashicorp/consul/command/agent/http.go:50 +0x298

goroutine 66 [syscall]: syscall.Syscall(0x2f, 0x13, 0x7fac1cdbda68, 0x0, 0x1, 0x0, 0x42bc1b) /opt/go/src/pkg/syscall/asm_linux_amd64.s:21 +0x5 syscall.recvmsg(0x13, 0x7fac1cdbda68, 0x0, 0xc2080f0030, 0x0, 0x0) /opt/go/src/pkg/syscall/zsyscall_linux_amd64.go:1902 +0x56 syscall.Recvmsg(0x13, 0xc208162000, 0xffff, 0xffff, 0xc2080f0000, 0x28, 0x28, 0x0, 0x7fac1f698000, 0x0, ...) /opt/go/src/pkg/syscall/syscall_linux.go:517 +0x193 net.(_netFD).readMsg(0xc20802c850, 0xc208162000, 0xffff, 0xffff, 0xc2080f0000, 0x28, 0x28, 0x0, 0x0, 0x0, ...) /opt/go/src/pkg/net/fd_unix.go:282 +0x37e net.(_UDPConn).ReadMsgUDP(0xc2080384d0, 0xc208162000, 0xffff, 0xffff, 0xc2080f0000, 0x28, 0x28, 0x1, 0x436a2a, 0x88d780, ...) /opt/go/src/pkg/net/udpsock_posix.go:96 +0x18a github.com/miekg/dns.readFromSessionUDP(0xc2080384d0, 0xc208162000, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/udp.go:48 +0xcb github.com/miekg/dns.(_Server).readUDP(0xc208005aa0, 0xc2080384d0, 0x77359400, 0x0, 0x0, 0x0, 0x1e, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/server.go:395 +0xb9 github.com/miekg/dns.(_Server).serveUDP(0xc208005aa0, 0xc2080384d0, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/server.go:294 +0x10c github.com/miekg/dns.(*Server).ListenAndServe(0xc208005aa0, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/server.go:250 +0x4ec github.com/hashicorp/consul/command/agent.func·008() /opt/gopath/src/github.com/hashicorp/consul/command/agent/dns.go:85 +0x43 created by github.com/hashicorp/consul/command/agent.NewDNSServer /opt/gopath/src/github.com/hashicorp/consul/command/agent/dns.go:88 +0x6a7

goroutine 67 [IO wait]: net.runtime_pollWait(0x7fac1f6a9950, 0x72, 0x0) /opt/go/src/pkg/runtime/netpoll.goc:146 +0x66 net.(_pollDesc).Wait(0xc20802c920, 0x72, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:84 +0x46 net.(_pollDesc).WaitRead(0xc20802c920, 0x0, 0x0) /opt/go/src/pkg/net/fd_poll_runtime.go:89 +0x42 net.(_netFD).accept(0xc20802c8c0, 0xb87538, 0x0, 0x7fac1f6a8440, 0xb) /opt/go/src/pkg/net/fd_unix.go:409 +0x343 net.(_TCPListener).AcceptTCP(0xc2080384f8, 0x1, 0x0, 0x0) /opt/go/src/pkg/net/tcpsock_posix.go:234 +0x5d github.com/miekg/dns.(_Server).serveTCP(0xc208005b00, 0xc2080384f8, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/server.go:268 +0xda github.com/miekg/dns.(_Server).ListenAndServe(0xc208005b00, 0x0, 0x0) /opt/gopath/src/github.com/miekg/dns/server.go:236 +0x233 github.com/hashicorp/consul/command/agent.func·009() /opt/gopath/src/github.com/hashicorp/consul/command/agent/dns.go:92 +0x43 created by github.com/hashicorp/consul/command/agent.NewDNSServer /opt/gopath/src/github.com/hashicorp/consul/command/agent/dns.go:95 +0x730

goroutine 70 [select]: github.com/hashicorp/consul/command/agent.(_localState).antiEntropy(0xc20802b210, 0xc2080045a0) /opt/gopath/src/github.com/hashicorp/consul/command/agent/local.go:250 +0x613 created by github.com/hashicorp/consul/command/agent.(_Agent).StartSync /opt/gopath/src/github.com/hashicorp/consul/command/agent/agent.go:361 +0x47

armon commented 10 years ago

So the critical lines are here:

2014/09/04 18:14:28 [ERR] snapshot: CRC checksum failed (stored: [67 214 101 86 72 179 89 11] computed: [0 0 0 0 0 0 0 0])
2014/09/04 18:14:28 [ERR] raft: Failed to open snapshot 37-857234-2014-09-04T15:01:37.085495752+04:00: CRC mismatch
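
For readers wondering what an all-zero computed checksum might mean, here is a minimal Go sketch of a checksum comparison. It assumes the 8-byte values in the log are a CRC-64 and that the ECMA polynomial is used; neither is confirmed in this thread, and `verify` is a hypothetical helper, not raft's code. With Go's `hash/crc64`, an empty input produces a checksum of zero, so one possible (unconfirmed) reading is that the snapshot's state file was empty or truncated after the power-off.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc64"
)

// verify is a hypothetical helper: it computes the CRC-64 (ECMA polynomial,
// an assumption) of data and compares it with the stored checksum.
// It also returns the computed bytes so a mismatch can be reported.
func verify(data, stored []byte) (bool, []byte) {
	h := crc64.New(crc64.MakeTable(crc64.ECMA))
	h.Write(data)
	computed := h.Sum(nil) // 8 bytes, big-endian
	return bytes.Equal(stored, computed), computed
}

func main() {
	// Stored checksum copied from the error line above.
	stored := []byte{67, 214, 101, 86, 72, 179, 89, 11}

	// nil models an empty state file; this is a guess about what was on disk.
	ok, computed := verify(nil, stored)
	fmt.Printf("ok=%v stored=%v computed=%v\n", ok, stored, computed)
}
```

If the file had contained arbitrary garbage rather than nothing, the computed value would most likely not be all zeros, which is why the empty-file guess seems worth checking on a reproduced data dir.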

It looks like it failed to load the latest snapshot and fell back to an older one, and then there were missing logs (i.e. the logs after the older snapshot but before the latest one). The big question is why the CRC failed.
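
The failure sequence can be sketched with a toy model in Go. It assumes the snapshot names in the log have the shape `<term>-<index>-<timestamp>`, and that log entries up to the newer snapshot had already been compacted away. The `logStore` type, `replayStart`, and the retained range are hypothetical illustrations, not raft's actual data structures.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errLogNotFound = errors.New("log not found")

// snapshotIndex extracts the index from a name shaped like
// "<term>-<index>-<timestamp>" (an assumed naming scheme).
func snapshotIndex(name string) (uint64, error) {
	parts := strings.SplitN(name, "-", 3)
	if len(parts) < 3 {
		return 0, fmt.Errorf("unexpected snapshot name %q", name)
	}
	return strconv.ParseUint(parts[1], 10, 64)
}

// logStore is a toy log that only retains entries in [first, last].
type logStore struct{ first, last uint64 }

// get returns errLogNotFound when the entry was compacted or never written.
func (s logStore) get(index uint64) error {
	if index < s.first || index > s.last {
		return errLogNotFound
	}
	return nil
}

// replayStart returns the first log index needed after restoring the given
// snapshot, plus an error if that entry is no longer in the log store.
func replayStart(restored string, logs logStore) (uint64, error) {
	idx, err := snapshotIndex(restored)
	if err != nil {
		return 0, err
	}
	next := idx + 1
	return next, logs.get(next)
}

func main() {
	// Hypothetical retained range: everything covered by the newer
	// snapshot (index 857234) is assumed to have been compacted.
	logs := logStore{first: 857235, last: 859779}

	// The older snapshot is the one that was actually restored.
	next, err := replayStart("37-796820-2014-09-04T14:59:32.99103404+04:00", logs)
	fmt.Println(next, err)
}
```

Under these assumptions the first entry needed after the older snapshot is 796821, which matches the index in the "Failed to get log" line of the report.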

Do you have your data dir by any chance?

vgv commented 10 years ago

Unfortunately not. I bootstrapped a Consul server on another node, dropped the "raft" dir on the failed server, and joined it to the new leader; everything is working fine now.

We are considering using Consul in production, so I will definitely try to reproduce this problem. Thank you.

vgv commented 10 years ago

Just to be clear: if I reproduce this error, do you need the whole Consul data-dir?

armon commented 10 years ago

That would be useful if possible. I guess only the raft/ folder is of particular interest, however.

armon commented 10 years ago

Closing this, since it looked like an issue with data corruption. Please re-open if it pops up!