lni / dragonboat

A feature complete and high performance multi-group Raft library in Go.

RSM close called twice #229

Open uber42 opened 2 years ago

uber42 commented 2 years ago

A flag is set in the RSM close handler, and the close sometimes happens twice.
On the second close, we raise a panic:

github.com/lni/dragonboat/v3/internal/rsm.(*OnDiskStateMachine).Close(0xc00032c000)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/adapter.go:338 +0x43
github.com/lni/dragonboat/v3/internal/rsm.(*NativeSM).Close(0xc0004260d0)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/managed.go:150 +0x4a
github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).Close(...)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/statemachine.go:233
github.com/lni/dragonboat/v3.(*node).destroy(...)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/node.go:520
github.com/lni/dragonboat/v3.(*closeWorker).handle(...)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:777
github.com/lni/dragonboat/v3.(*closeWorker).workerMain(0xc000139520)
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:766 +0x86
github.com/lni/dragonboat/v3.newCloseWorker.func1()
    /go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:755 +0x31
github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
    /go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:79 +0x12f
created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
    /go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:74 +0x19

Dragonboat version

v3.3.1

Steps to reproduce the behavior

It happens intermittently when closing the RSM.

lni commented 2 years ago

This is pretty strange, as it has never happened on my setup.

Any extra info on this?

uber42 commented 2 years ago

Unfortunately, there is nothing more.

lni commented 2 years ago

Any chance you can provide the full log? That will help to identify the problem. Thanks.

lni commented 2 years ago

I think there is a bug introduced when closeWorkerPool was added. Will address that in the next few days.

uber42 commented 2 years ago

I've hit a new problem; maybe it has something to do with this one?

panic: close of closed channel

goroutine 70 [running]:
github.com/lni/dragonboat/v3/internal/rsm.(*OffloadedStatus).SetDestroyed(0xc00010e3e0)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/offload.go:45 +0x34
github.com/lni/dragonboat/v3/internal/rsm.(*NativeSM).Close(0xc00010e340)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/managed.go:153 +0x5a
github.com/lni/dragonboat/v3/internal/rsm.(*StateMachine).Close(0xc00061a180)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/internal/rsm/statemachine.go:233 +0x2b
github.com/lni/dragonboat/v3.(*node).destroy(0xc00048ec00)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/node.go:520 +0x2c
github.com/lni/dragonboat/v3.(*closeWorker).handle(0xc000118700, {0xc00048ec00})
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:777 +0x26
github.com/lni/dragonboat/v3.(*closeWorker).workerMain(0xc000118700)
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:766 +0x109
github.com/lni/dragonboat/v3.newCloseWorker.func1()
        /home/user/go/pkg/mod/github.com/lni/dragonboat/v3@v3.3.1/engine.go:755 +0x25
github.com/lni/goutils/syncutil.(*Stopper).runWorker.func1()
        /home/user/go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:79 +0x173
created by github.com/lni/goutils/syncutil.(*Stopper).runWorker
        /home/user/go/pkg/mod/github.com/lni/goutils@v1.3.0/syncutil/stopper.go:74 +0x133

lni commented 2 years ago

Both crashes reported above were caused by the same issue.

I think it is now fixed in the master branch. I will test it a bit more before backporting it to v3.3. Thanks for reporting the issue, @uber42; I will keep you updated.
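
For readers hitting the second trace: in Go, closing a channel that is already closed panics with exactly the message "close of closed channel", so any code path that runs Close twice will trip over it. The sketch below only illustrates that failure mode and one common guard (sync.Once). It is not the fix applied in dragonboat, and the `status` type, its `destroyedC` field, and the `doubleClose` helper are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// status is a hypothetical stand-in for a type that signals destruction by
// closing a channel. It is NOT dragonboat's OffloadedStatus.
type status struct {
	once       sync.Once
	destroyedC chan struct{}
}

func newStatus() *status {
	return &status{destroyedC: make(chan struct{})}
}

// SetDestroyed closes destroyedC at most once, so repeated calls are safe.
func (s *status) SetDestroyed() {
	s.once.Do(func() { close(s.destroyedC) })
}

// Destroyed reports whether SetDestroyed has been called.
func (s *status) Destroyed() bool {
	select {
	case <-s.destroyedC:
		return true
	default:
		return false
	}
}

// doubleClose closes a fresh channel twice and returns the recovered panic
// value as a string.
func doubleClose() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	c := make(chan struct{})
	close(c)
	close(c) // the second close panics
	return ""
}

func main() {
	// Unguarded double close reproduces the panic message from the trace.
	fmt.Println(doubleClose())

	// Guarded version: calling SetDestroyed twice is harmless.
	s := newStatus()
	s.SetDestroyed()
	s.SetDestroyed()
	fmt.Println(s.Destroyed())
}
```

A guard like this makes a close path idempotent, but it can also hide the underlying problem of Close being scheduled twice, so it is better treated as a defensive measure than as a root-cause fix.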