lni / dragonboat

A feature complete and high performance multi-group Raft library in Go.
Apache License 2.0

Random panics #259

Closed: hulucc closed this issue 1 year ago

hulucc commented 1 year ago

I believe this is caused by my own misuse, but I'm not sure how to fix it. I have three NodeHosts and one shard. After a reboot, the program calls StartCluster to recover the shard and then panics. Any ideas would be appreciated.

Shard config

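    import "github.com/lni/dragonboat/v3/config"

    // shardConfig builds the Raft config used by each replica of the shard.
    func shardConfig(shardID, nodeID uint64) config.Config {
        return config.Config{
            NodeID:              nodeID,
            ClusterID:           shardID,
            CheckQuorum:         true,
            ElectionRTT:         10,
            HeartbeatRTT:        2,
            SnapshotEntries:     100, // auto-snapshot every 100 applied entries
            CompactionOverhead:  0,   // keep no extra entries after log compaction
            OrderedConfigChange: false,
        }
    }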
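For this config the startup log below prints `ElectionRTT is not a magnitude larger than HeartbeatRTT` (ElectionRTT 10 vs HeartbeatRTT 2). That warning is most likely separate from the panic, but it is easy to check before starting a shard. The following plain-Go sketch uses a hypothetical helper, `rttWarning`, and assumes "a magnitude larger" means at least 10x. The exact threshold dragonboat applies is an assumption here, not taken from its source.

```go
package main

import "fmt"

// rttWarning is a hypothetical helper mimicking the kind of sanity check
// behind the "ElectionRTT is not a magnitude larger than HeartbeatRTT" log line.
// Assumption: "a magnitude larger" means ElectionRTT >= 10*HeartbeatRTT.
func rttWarning(electionRTT, heartbeatRTT uint64) (string, bool) {
	if electionRTT < 10*heartbeatRTT {
		return fmt.Sprintf("ElectionRTT %d is less than 10x HeartbeatRTT %d",
			electionRTT, heartbeatRTT), true
	}
	return "", false
}

func main() {
	// Values from the shard config in this issue: triggers the warning.
	if msg, warn := rttWarning(10, 2); warn {
		fmt.Println("warning:", msg)
	}
	// A ratio that would not trigger it under the same assumption.
	if _, warn := rttWarning(20, 2); !warn {
		fmt.Println("ok: ElectionRTT 20, HeartbeatRTT 2")
	}
}
```

Under this assumption, raising ElectionRTT to 20 (or lowering HeartbeatRTT to 1) would silence the warning. It would not change the commit-index check that actually panicked.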

Dragonboat version

v3.3.5

Expected behavior

No panics.

Actual behavior

The program randomly panics after a restart.

[INFO] DataVersion: 60abf29917f0816141585d494a856b8aae121c1cd26aa1c81b5ec55147c85f0e
[INFO] CodeVersion: 60abf29917f0816141585d494a856b8aae121c1cd26aa1c81b5ec55147c85f0e
2022-11-28 06:29:13.197228 I | dragonboat: go version: go1.19, linux/amd64
2022-11-28 06:29:13.197348 I | dragonboat: dragonboat version: 3.3.5 (Rel)
2022-11-28 06:29:13.197400 I | config: using default EngineConfig
2022-11-28 06:29:13.197571 I | config: using default LogDBConfig
2022-11-28 06:29:13.197770 I | dragonboat: DeploymentID set to 1
2022-11-28 06:29:13.207918 I | dragonboat: LogDB info received, shard 0, busy false
2022-11-28 06:29:13.218901 I | dragonboat: LogDB info received, shard 1, busy false
2022-11-28 06:29:13.228163 I | dragonboat: LogDB info received, shard 2, busy false
2022-11-28 06:29:13.237630 I | dragonboat: LogDB info received, shard 3, busy false
2022-11-28 06:29:13.246292 I | dragonboat: LogDB info received, shard 4, busy false
2022-11-28 06:29:13.254084 I | dragonboat: LogDB info received, shard 5, busy false
2022-11-28 06:29:13.268796 I | dragonboat: LogDB info received, shard 6, busy false
2022-11-28 06:29:13.279942 I | dragonboat: LogDB info received, shard 7, busy false
2022-11-28 06:29:13.288599 I | dragonboat: LogDB info received, shard 8, busy false
2022-11-28 06:29:13.301422 I | dragonboat: LogDB info received, shard 9, busy false
2022-11-28 06:29:13.311992 I | dragonboat: LogDB info received, shard 10, busy false
2022-11-28 06:29:13.326616 I | dragonboat: LogDB info received, shard 11, busy false
2022-11-28 06:29:13.344879 I | dragonboat: LogDB info received, shard 12, busy false
2022-11-28 06:29:13.363999 I | dragonboat: LogDB info received, shard 13, busy false
2022-11-28 06:29:13.374470 I | dragonboat: LogDB info received, shard 14, busy false
2022-11-28 06:29:13.385537 I | logdb: using plain logdb
2022-11-28 06:29:13.385605 I | dragonboat: LogDB info received, shard 15, busy false
2022-11-28 06:29:13.386022 I | dragonboat: logdb memory limit: 8192 MBytes
2022-11-28 06:29:13.386047 I | dragonboat: NodeHost ID: nhid-14008783967022962185
2022-11-28 06:29:13.386051 I | dragonboat: using regular node registry
2022-11-28 06:29:13.386058 I | dragonboat: filesystem error injection mode enabled: false
2022-11-28 06:29:13.386509 I | transport: transport type: go-tcp-transport
2022-11-28 06:29:13.388798 I | dragonboat: transport type: go-tcp-transport
2022-11-28 06:29:13.388811 I | dragonboat: logdb type: sharded-pebble
2022-11-28 06:29:13.388816 I | dragonboat: nodehost address: moxa-2.moxa-headless.temp.svc.cluster.local:63000
[INFO] CodeVersion match with DataVersion, skip migration
[INFO] shardmanager: decided to recovery shard 0
2022-11-28 06:29:13.394448 I | dragonboat: [00000:62185] replaying raft logs
2022-11-28 06:29:13.395031 I | dragonboat: [00000:62185] has logdb entries size 0 commit 27 term 75
2022-11-28 06:29:13.395161 I | raft: [00000:62185] created, initial: false, new: false
2022-11-28 06:29:13.395247 W | config: ElectionRTT is not a magnitude larger than HeartbeatRTT
2022-11-28 06:29:13.395346 I | raft: [00000:62185] raft log rate limit enabled: false, 0
2022-11-28 06:29:13.395445 I | raft: [f:28,l:27,t:74,c:27,a:27] [00000:62185] t75 became follower
2022-11-28 06:29:13.402036 I | dragonboat: [00000:62185] recovered from <00000:62185:27>
2022-11-28 06:29:13.402199 I | dragonboat: [00000:62185] initialized using <00000:62185:27>
2022-11-28 06:29:13.402214 I | dragonboat: [00000:62185] initial index set to 27
2022-11-28 06:29:14.397591 W | dragonboat: StaleRead called, linearizability not guaranteed for stale read
[INFO] OnSubShardsUpdating shards version 0 -> 2
[INFO] ShardSpecChangingWorker 0 shard changes detected
2022-11-28 06:29:14.511861 W | raft: [f:28,l:27,t:74,c:27,a:27] [00000:62185] t75 received Heartbeat with higher term (76) from n97714
2022-11-28 06:29:14.511985 W | raft: [f:28,l:27,t:74,c:27,a:27] [00000:62185] t75 become follower after receiving higher term from n97714
2022-11-28 06:29:14.512013 I | raft: [f:28,l:27,t:74,c:27,a:27] [00000:62185] t76 became follower
2022-11-28 06:29:14.512020 C | raft: invalid commitTo index 28, lastIndex() 27
panic: invalid commitTo index 28, lastIndex() 27

goroutine 730 [running]:
github.com/lni/goutils/logutil/capnslog.(*PackageLogger).Panicf(0xc000204000?, {0xe1edcf?, 0xc00019e880?}, {0xc0010f6ba0?, 0xc00021a8d0?, 0xc000731e30?})
        github.com/lni/goutils@v1.3.0/logutil/capnslog/pkg_logger.go:88 +0xbb
github.com/lni/dragonboat/v3/logger.(*capnsLog).Panicf(0xc00021a8d0?, {0xe1edcf?, 0x40d947?}, {0xc0010f6ba0?, 0xca5220?, 0x1?})
        github.com/lni/dragonboat/v3@v3.3.5/logger/capnslogger.go:74 +0x26
github.com/lni/dragonboat/v3/logger.(*dragonboatLogger).Panicf(0xc000192410?, {0xe1edcf, 0x29}, {0xc0010f6ba0, 0x2, 0x2})
        github.com/lni/dragonboat/v3@v3.3.5/logger/logger.go:132 +0x57
github.com/lni/dragonboat/v3/internal/raft.(*entryLog).commitTo(0xc00036d2d0, 0x1c)
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/logentry.go:328 +0x102
github.com/lni/dragonboat/v3/internal/raft.(*raft).handleHeartbeatMessage(_, {0x11, 0xc26932cbd9975609, 0xfe93de3932274132, 0x0, 0x4c, 0x0, 0x0, 0x1c, 0x0, ...})
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/raft.go:1317 +0x45
github.com/lni/dragonboat/v3/internal/raft.(*raft).handleFollowerHeartbeat(_, {0x11, 0xc26932cbd9975609, 0xfe93de3932274132, 0x0, 0x4c, 0x0, 0x0, 0x1c, 0x0, ...})
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/raft.go:1933 +0x85
github.com/lni/dragonboat/v3/internal/raft.defaultHandle(_, {0x11, 0xc26932cbd9975609, 0xfe93de3932274132, 0x0, 0x4c, 0x0, 0x0, 0x1c, 0x0, ...})
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/raft.go:2098 +0x95
github.com/lni/dragonboat/v3/internal/raft.(*raft).Handle(_, {0x11, 0xc26932cbd9975609, 0xfe93de3932274132, 0x0, 0x4c, 0x0, 0x0, 0x1c, 0x0, ...})
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/raft.go:1483 +0x27f
github.com/lni/dragonboat/v3/internal/raft.(*Peer).Handle(_, {0x11, 0xc26932cbd9975609, 0xfe93de3932274132, 0x0, 0x4c, 0x0, 0x0, 0x1c, 0x0, ...})
        github.com/lni/dragonboat/v3@v3.3.5/internal/raft/peer.go:195 +0x185
github.com/lni/dragonboat/v3.(*node).handleReceivedMessages(0xc000310200)
        github.com/lni/dragonboat/v3@v3.3.5/node.go:1275 +0x358
github.com/lni/dragonboat/v3.(*node).handleEvents(0xc000310200)
        github.com/lni/dragonboat/v3@v3.3.5/node.go:1133 +0x73
github.com/lni/dragonboat/v3.(*node).stepNode(_)
        github.com/lni/dragonboat/v3@v3.3.5/node.go:1111 +0x150
github.com/lni/dragonboat/v3.(*engine).processSteps(0xc000329360, 0xc000733da8?, 0xc000733e38?, 0xc0012c5920, {0x1533328, 0x1?, 0x0}, 0xc0000d9bc0?)
        github.com/lni/dragonboat/v3@v3.3.5/engine.go:1279 +0x265
github.com/lni/dragonboat/v3.(*engine).stepWorkerMain(0xc000329360, 0x1)
        github.com/lni/dragonboat/v3@v3.3.5/engine.go:1215 +0x2be
github.com/lni/dragonboat/v3.newExecEngine.func1()
        github.com/lni/dragonboat/v3@v3.3.5/engine.go:1017 +0x68
github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
        github.com/lni/goutils@v1.3.0/syncutil/stopper.go:80 +0xc5
created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
        github.com/lni/goutils@v1.3.0/syncutil/stopper.go:75 +0xea

Steps to reproduce the behavior

Restart the program repeatedly.

hulucc commented 1 year ago

The panic was caused by incorrect use of tools.ImportSnapshot. Never mind.