Open fcddk opened 4 years ago
Hello fcddk, thank you for reporting this problem! From the stack trace it looks like this is a panic from boltdb. I took a look around and found the following:
I was not able to find a clear solution to the problem, but I found some information that might help. First, it seems the issue is related to the filesystem. One report suggested that the mount options used for the filesystem may have been causing the problem. A few other reports suggested that the problem was corrupt db files, and that removing the files fixed it.
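Since some reports mention removing the db files, here is a minimal sketch of moving a suspect file aside instead of deleting it, so it can still be inspected or restored later. It assumes the agent is stopped first and that the file in question is something like <data-dir>/raft/raft.db; that path and the helper name move_aside are assumptions for illustration, not something confirmed in this thread. Note also that a server started without its Raft log would most likely have to resync from its peers, so this is probably best tried on one server at a time.

```python
import os
import time


def move_aside(db_path, suffix=None):
    """Rename db_path to a timestamped backup name and return the new path.

    Hypothetical helper: nothing here is Consul-specific, it only renames a file.
    """
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    if suffix is None:
        # Default suffix is the local time, e.g. 20200703T214141
        suffix = time.strftime("%Y%m%dT%H%M%S")
    backup_path = f"{db_path}.corrupt-{suffix}"
    if os.path.exists(backup_path):
        # Refuse to overwrite an earlier backup
        raise FileExistsError(backup_path)
    os.rename(db_path, backup_path)
    return backup_path
```

For example, move_aside("/consul/data/raft/raft.db") would leave a renamed copy next to the original location (the path here is only an example).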
If you run consul agent with a different value for -data-dir, do you encounter the same problem? If the filesystem that contains the data dir has any special mount options, you may want to try a different filesystem without those mount options.
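To see which mount options apply to the data dir, one option on Linux is to look it up in /proc/mounts. The sketch below is only an illustration: the function names are made up for this example, and it takes the mounts text as an argument so it can be tried on any input.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type, options) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # /proc/mounts escapes spaces inside paths as \040
        mount_point = fields[1].replace("\\040", " ")
        entries.append((mount_point, fields[2], fields[3].split(",")))
    return entries


def mount_for_path(path, mounts_text):
    """Return the entry whose mount point is the longest prefix of path, or None."""
    path = os.path.abspath(path)
    best = None
    for mount_point, fs_type, options in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same mount point win
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type, options)
    return best
```

Calling mount_for_path(data_dir, open("/proc/mounts").read()) with the directory passed to -data-dir would then show the filesystem type and options to compare against a plain local filesystem.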
I hope that helps. Please do let us know if it worked, and if you have any more questions!
@dnephin I run a Consul cluster on Kubernetes with version 1.7.1 and also hit this problem, as shown below:
From https://github.com/boltdb/bolt/releases I also found that boltdb is no longer being updated. So if the problem originates in boltdb, who can solve this boltdb storage problem?
I'm having a similar issue on my home Kubernetes cluster:
2020-07-03T21:41:41.722824738Z bootstrap_expect > 0: expecting 3 servers
2020-07-03T21:41:41.722871382Z ==> Starting Consul agent...
2020-07-03T21:41:41.722969222Z Version: 'v1.8.0'
2020-07-03T21:41:41.722976474Z Node ID: '8fa6500c-0eac-9d7e-2540-4986953195fd'
2020-07-03T21:41:41.722980032Z Node name: 'consul-consul-server-0'
2020-07-03T21:41:41.722982969Z Datacenter: 'home1' (Segment: '<all>')
2020-07-03T21:41:41.722986267Z Server: true (Bootstrap: false)
2020-07-03T21:41:41.722989297Z Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
2020-07-03T21:41:41.722993135Z Cluster Addr: 10.38.0.9 (LAN: 8301, WAN: 8302)
2020-07-03T21:41:41.722996225Z Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
2020-07-03T21:41:41.722999253Z
2020-07-03T21:41:41.723002035Z ==> Log data will now stream in as it occurs:
2020-07-03T21:41:41.723008404Z
2020-07-03T21:41:41.725907687Z panic: page 3949 already freed
2020-07-03T21:41:41.725917433Z
2020-07-03T21:41:41.725920835Z goroutine 1 [running]:
2020-07-03T21:41:41.725923425Z github.com/boltdb/bolt.(*freelist).free(0xc000559560, 0x10d66a, 0x7f7975d3c000)
2020-07-03T21:41:41.725925512Z /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/freelist.go:121 +0x2a0
2020-07-03T21:41:41.725927679Z github.com/boltdb/bolt.(*Tx).Commit(0xc0003fec40, 0x51e0150, 0x4)
2020-07-03T21:41:41.725929682Z /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/tx.go:176 +0x1b5
2020-07-03T21:41:41.725931660Z github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc000553fa0, 0x0, 0x0)
2020-07-03T21:41:41.725934572Z /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:105 +0x143
2020-07-03T21:41:41.725936675Z github.com/hashicorp/raft-boltdb.New(0xc000054f80, 0x19, 0x0, 0xc000054f00, 0x19, 0x0, 0xc0002adee8)
2020-07-03T21:41:41.725944660Z /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:81 +0xf7
2020-07-03T21:41:41.725948074Z github.com/hashicorp/raft-boltdb.NewBoltStore(...)
2020-07-03T21:41:41.725950107Z /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:60
2020-07-03T21:41:41.725952708Z github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc0003b8300, 0x0, 0x0)
2020-07-03T21:41:41.725954726Z /home/circleci/project/consul/agent/consul/server.go:702 +0xa4a
2020-07-03T21:41:41.725956784Z github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0000e0e00, 0x38ada20, 0xc00083d590, 0xc000582000, 0xc000499a40, 0x0, 0x0, 0x0)
2020-07-03T21:41:41.725959460Z /home/circleci/project/consul/agent/consul/server.go:499 +0x10a9
2020-07-03T21:41:41.725974130Z github.com/hashicorp/consul/agent.(*Agent).Start(0xc00039a000, 0x0, 0x0)
2020-07-03T21:41:41.725978917Z /home/circleci/project/consul/agent/agent.go:449 +0x7d7
2020-07-03T21:41:41.725996571Z github.com/hashicorp/consul/command/agent.(*cmd).run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0x0)
2020-07-03T21:41:41.726000697Z /home/circleci/project/consul/command/agent/agent.go:287 +0xf08
2020-07-03T21:41:41.726003909Z github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0xc000205d80)
2020-07-03T21:41:41.726007952Z /home/circleci/project/consul/command/agent/agent.go:76 +0x4d
2020-07-03T21:41:41.726011529Z github.com/mitchellh/cli.(*CLI).Run(0xc0002183c0, 0xc000218300, 0x80, 0xc0005a86a0)
2020-07-03T21:41:41.726014926Z /go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x1da
2020-07-03T21:41:41.726018112Z main.realMain(0xc0000a0058)
2020-07-03T21:41:41.726021454Z /home/circleci/project/consul/main.go:50 +0x397
2020-07-03T21:41:41.726024683Z main.main()
2020-07-03T21:41:41.726028355Z /home/circleci/project/consul/main.go:22 +0x22
I think this happened after a power outage (at least that's when I noticed it, but I haven't looked into my Consul setup in a little while, so it may have happened before then). I wonder if the bolt database got corrupted somehow?
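For anyone comparing traces across reports, here is a rough sketch of pulling the panic message and the package of the top stack frame out of a log like the one above. It assumes each line carries the timestamp prefix shown in this log, and the function names are hypothetical.

```python
import re

# Matches a leading timestamp such as "2020-07-03T21:41:41.725907687Z "
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\S+Z ?")


def package_of(frame):
    """Return the Go package path of a stack frame line."""
    paren = frame.find("(")
    slash = frame.rfind("/", 0, paren if paren != -1 else len(frame))
    # The package ends at the first dot after the last slash of the import path
    dot = frame.find(".", slash + 1)
    return frame if dot == -1 else frame[:dot]


def panic_origin(log_text):
    """Return (panic message, package of the top frame); either may be None."""
    lines = [TIMESTAMP.sub("", line).strip() for line in log_text.splitlines()]
    message = None
    for i, line in enumerate(lines):
        if line.startswith("panic: "):
            message = line[len("panic: "):]
        elif message is not None and line.startswith("goroutine "):
            # The first non-empty line after the goroutine header is the top frame
            for frame in lines[i + 1:]:
                if frame:
                    return message, package_of(frame)
    return message, None
```

On the log in this comment, the top frame is in github.com/boltdb/bolt, which matches the earlier reading that the panic comes from boltdb rather than from Consul's own code.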
@sarahhodne yes, I share your suspicion
bootstrap_expect > 0: expecting 3 servers