hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

panic: invalid page type #7471

Open fcddk opened 4 years ago

fcddk commented 4 years ago

bootstrap_expect > 0: expecting 3 servers

==> Starting Consul agent...
           Version: 'v1.6.0-rc1 (71f98661d)'
           Node ID: 'cbdab978-4df9-b604-0553-5cd6a00cf812'
         Node name: 'consul-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 100.101.219.34 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

panic: invalid page type: 3404: 10

goroutine 1 [running]:
github.com/boltdb/bolt.(*Cursor).search(0xc0007d0108, 0x4f860f0, 0x4, 0x4, 0xd4c)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/cursor.go:256 +0x354
github.com/boltdb/bolt.(*Cursor).seek(0xc0007d0108, 0x4f860f0, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, 0xc0007d01b8, 0x146aeda, ...)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/cursor.go:159 +0x7e
github.com/boltdb/bolt.(*Bucket).CreateBucket(0xc00004c1d8, 0x4f860f0, 0x4, 0x4, 0xc0007d0228, 0x42c48f, 0xc0000fc300)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/bucket.go:172 +0xf0
github.com/boltdb/bolt.(*Bucket).CreateBucketIfNotExists(0xc00004c1d8, 0x4f860f0, 0x4, 0x4, 0x0, 0x20, 0x29bb4c0)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/bucket.go:206 +0x4d
github.com/boltdb/bolt.(*Tx).CreateBucketIfNotExists(...)
    /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/tx.go:115
github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc000105ec0, 0x0, 0x0)
    /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:98 +0xba
github.com/hashicorp/raft-boltdb.New(0xc000376780, 0x19, 0x0, 0xc000376700, 0x19, 0x0, 0x0)
    /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:81 +0xfe
github.com/hashicorp/raft-boltdb.NewBoltStore(...)
    /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:60
github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc00054e380, 0x0, 0x0)
    /home/circleci/project/consul/agent/consul/server.go:630 +0xa36
github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0001ae000, 0xc0002bf4a0, 0xc00045c800, 0xc000452050, 0x0, 0x0, 0x0)
    /home/circleci/project/consul/agent/consul/server.go:431 +0xafe
github.com/hashicorp/consul/agent.(*Agent).Start(0xc0004146c0, 0x0, 0x0)
    /home/circleci/project/consul/agent/agent.go:380 +0x56e
github.com/hashicorp/consul/command/agent.(*cmd).run(0xc000125200, 0xc0000f8020, 0xc, 0xc, 0x0)
    /home/circleci/project/consul/command/agent/agent.go:280 +0xf5b
github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc000125200, 0xc0000f8020, 0xc, 0xc, 0xc000105b00)
    /home/circleci/project/consul/command/agent/agent.go:75 +0x4d
github.com/mitchellh/cli.(*CLI).Run(0xc00014af00, 0xc00014af00, 0x80, 0xc000105b80)
    /go/pkg/mod/github.com/mitchellh/cli@v1.0.0/cli.go:255 +0x1f1
main.realMain(0xc0000ce058)
    /home/circleci/project/consul/main.go:53 +0x393
main.main()
    /home/circleci/project/consul/main.go:20 +0x22

dnephin commented 4 years ago

Hello fcddk, thank you for reporting this problem! From the stack trace it looks like this is a panic from boltdb. I took a look around and found the following:

I was not able to find a clear solution to the problem, but I found some information that might help. First, it seems the issue is related to the filesystem. One report suggested that the mount options used for the filesystem may have been causing the problem. A few other reports suggested the problem was corrupt db files, and that removing the files fixed it.
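To make the panic message easier to read, here is a minimal Go sketch that decodes it. It assumes the message has the shape `invalid page type: <page id>: <flags in hex>` and that the flag values match the constants in boltdb/bolt v1.3.1 (`page.go`); `parseInvalidPageType` and `flagName` are hypothetical helpers written for this illustration, not part of bolt or Consul.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Page flag values, assumed to match boltdb/bolt v1.3.1 (page.go).
const (
	branchPageFlag   = 0x01
	leafPageFlag     = 0x02
	metaPageFlag     = 0x04
	freelistPageFlag = 0x10
)

// parseInvalidPageType extracts the page id (decimal) and the page flags
// (assumed to be printed in hex) from the panic line. Hypothetical helper.
func parseInvalidPageType(msg string) (uint64, uint16, error) {
	const prefix = "panic: invalid page type: "
	if !strings.HasPrefix(msg, prefix) {
		return 0, 0, fmt.Errorf("unexpected message: %q", msg)
	}
	parts := strings.Split(strings.TrimPrefix(msg, prefix), ": ")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected message: %q", msg)
	}
	id, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("page id: %w", err)
	}
	flags, err := strconv.ParseUint(parts[1], 16, 16)
	if err != nil {
		return 0, 0, fmt.Errorf("page flags: %w", err)
	}
	return id, uint16(flags), nil
}

// flagName maps a flags value to a human-readable page type.
func flagName(flags uint16) string {
	switch {
	case flags&branchPageFlag != 0:
		return "branch"
	case flags&leafPageFlag != 0:
		return "leaf"
	case flags&metaPageFlag != 0:
		return "meta"
	case flags&freelistPageFlag != 0:
		return "freelist"
	default:
		return "unknown"
	}
}

func main() {
	id, flags, err := parseInvalidPageType("panic: invalid page type: 3404: 10")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("page %d has flags 0x%x (%s)\n", id, flags, flagName(flags))
}
```

Under those assumptions, the cursor expected a branch or leaf page at page 3404 but found one marked as a freelist page, which would be consistent with a damaged db file rather than a logic error in Consul itself.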

If you run consul agent with a different value for -data-dir, do you encounter the same problem? If the filesystem that contains the data dir has any special mount options, you may want to try a different filesystem without those mount options.

I hope that helps. Please do let us know if it worked, and if you have any more questions!

like-inspur commented 4 years ago

@dnephin I run a Consul cluster on Kubernetes with version 1.7.1 and also hit this problem.

like-inspur commented 4 years ago

From https://github.com/boltdb/bolt/releases I also found that boltdb is no longer being updated. So if the problem originates in boltdb, who can fix boltdb's storage problem?

sarahhodne commented 4 years ago

I'm having a similar issue on my home Kubernetes cluster:

2020-07-03T21:41:41.722824738Z bootstrap_expect > 0: expecting 3 servers
2020-07-03T21:41:41.722871382Z ==> Starting Consul agent...
2020-07-03T21:41:41.722969222Z            Version: 'v1.8.0'
2020-07-03T21:41:41.722976474Z            Node ID: '8fa6500c-0eac-9d7e-2540-4986953195fd'
2020-07-03T21:41:41.722980032Z          Node name: 'consul-consul-server-0'
2020-07-03T21:41:41.722982969Z         Datacenter: 'home1' (Segment: '<all>')
2020-07-03T21:41:41.722986267Z             Server: true (Bootstrap: false)
2020-07-03T21:41:41.722989297Z        Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
2020-07-03T21:41:41.722993135Z       Cluster Addr: 10.38.0.9 (LAN: 8301, WAN: 8302)
2020-07-03T21:41:41.722996225Z            Encrypt: Gossip: true, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
2020-07-03T21:41:41.722999253Z 
2020-07-03T21:41:41.723002035Z ==> Log data will now stream in as it occurs:
2020-07-03T21:41:41.723008404Z 
2020-07-03T21:41:41.725907687Z panic: page 3949 already freed
2020-07-03T21:41:41.725917433Z 
2020-07-03T21:41:41.725920835Z goroutine 1 [running]:
2020-07-03T21:41:41.725923425Z github.com/boltdb/bolt.(*freelist).free(0xc000559560, 0x10d66a, 0x7f7975d3c000)
2020-07-03T21:41:41.725925512Z  /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/freelist.go:121 +0x2a0
2020-07-03T21:41:41.725927679Z github.com/boltdb/bolt.(*Tx).Commit(0xc0003fec40, 0x51e0150, 0x4)
2020-07-03T21:41:41.725929682Z  /go/pkg/mod/github.com/boltdb/bolt@v1.3.1/tx.go:176 +0x1b5
2020-07-03T21:41:41.725931660Z github.com/hashicorp/raft-boltdb.(*BoltStore).initialize(0xc000553fa0, 0x0, 0x0)
2020-07-03T21:41:41.725934572Z  /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:105 +0x143
2020-07-03T21:41:41.725936675Z github.com/hashicorp/raft-boltdb.New(0xc000054f80, 0x19, 0x0, 0xc000054f00, 0x19, 0x0, 0xc0002adee8)
2020-07-03T21:41:41.725944660Z  /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:81 +0xf7
2020-07-03T21:41:41.725948074Z github.com/hashicorp/raft-boltdb.NewBoltStore(...)
2020-07-03T21:41:41.725950107Z  /go/pkg/mod/github.com/hashicorp/raft-boltdb@v0.0.0-20171010151810-6e5ba93211ea/bolt_store.go:60
2020-07-03T21:41:41.725952708Z github.com/hashicorp/consul/agent/consul.(*Server).setupRaft(0xc0003b8300, 0x0, 0x0)
2020-07-03T21:41:41.725954726Z  /home/circleci/project/consul/agent/consul/server.go:702 +0xa4a
2020-07-03T21:41:41.725956784Z github.com/hashicorp/consul/agent/consul.NewServerLogger(0xc0000e0e00, 0x38ada20, 0xc00083d590, 0xc000582000, 0xc000499a40, 0x0, 0x0, 0x0)
2020-07-03T21:41:41.725959460Z  /home/circleci/project/consul/agent/consul/server.go:499 +0x10a9
2020-07-03T21:41:41.725974130Z github.com/hashicorp/consul/agent.(*Agent).Start(0xc00039a000, 0x0, 0x0)
2020-07-03T21:41:41.725978917Z  /home/circleci/project/consul/agent/agent.go:449 +0x7d7
2020-07-03T21:41:41.725996571Z github.com/hashicorp/consul/command/agent.(*cmd).run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0x0)
2020-07-03T21:41:41.726000697Z  /home/circleci/project/consul/command/agent/agent.go:287 +0xf08
2020-07-03T21:41:41.726003909Z github.com/hashicorp/consul/command/agent.(*cmd).Run(0xc00029c000, 0xc00004c140, 0xf, 0x10, 0xc000205d80)
2020-07-03T21:41:41.726007952Z  /home/circleci/project/consul/command/agent/agent.go:76 +0x4d
2020-07-03T21:41:41.726011529Z github.com/mitchellh/cli.(*CLI).Run(0xc0002183c0, 0xc000218300, 0x80, 0xc0005a86a0)
2020-07-03T21:41:41.726014926Z  /go/pkg/mod/github.com/mitchellh/cli@v1.1.0/cli.go:260 +0x1da
2020-07-03T21:41:41.726018112Z main.realMain(0xc0000a0058)
2020-07-03T21:41:41.726021454Z  /home/circleci/project/consul/main.go:50 +0x397
2020-07-03T21:41:41.726024683Z main.main()
2020-07-03T21:41:41.726028355Z  /home/circleci/project/consul/main.go:22 +0x22

I think this happened after a power outage. At least that's when I noticed it, but I haven't looked into my Consul setup in a little while, so it may have happened before then. I wonder if the bolt database got corrupted somehow?
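For context on what "already freed" means, here is a simplified Go model of a freelist that refuses to free the same page twice. This is only a sketch of the idea, not bolt's actual code: the `freelist` type and its `free` method are written for illustration, and where this model returns an error, the trace above suggests bolt panics instead.

```go
package main

import "fmt"

// freelist is a simplified model that tracks which page ids are free.
type freelist struct {
	freed map[uint64]bool
}

func newFreelist() *freelist {
	return &freelist{freed: make(map[uint64]bool)}
}

// free marks a page as free. Freeing a page that is already marked free
// means the bookkeeping is inconsistent, so it is reported as an error.
func (f *freelist) free(id uint64) error {
	if f.freed[id] {
		return fmt.Errorf("page %d already freed", id)
	}
	f.freed[id] = true
	return nil
}

func main() {
	f := newFreelist()
	fmt.Println(f.free(3949)) // <nil>
	fmt.Println(f.free(3949)) // page 3949 already freed
}
```

In this model the second call can only fail if the freelist and the pages referencing it disagree, which is the kind of inconsistency an interrupted write could plausibly leave behind.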

like-inspur commented 4 years ago

@sarahhodne yes, I have the same suspicion.

jsosulska commented 4 years ago

Hello all!

To update this thread: I have created a top-level issue to track upgrading BoltDB to bbolt here. Please follow that work as a precursor to the issues mentioned here.

Thank you all for your patience!