Closed rickardgranberg closed 4 years ago
We're seeing an intermittent problem with a panic when a trigger is fired from a callback (in a separate goroutine) The call stack is as follows:
panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6c78a1] goroutine 711 [running]: container/list.(*List).insert(...) /usr/local/go/src/container/list/list.go:96 container/list.(*List).insertValue(...) /usr/local/go/src/container/list/list.go:104 container/list.(*List).PushBack(...) /usr/local/go/src/container/list/list.go:155 github.com/qmuntal/stateless.(*StateMachine).internalFireQueued(0xc00004c4e0, 0xebfb40, 0xc00019e010, 0xc589e0, 0xe9cd30, 0x0, 0x0, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/qmuntal/stateless@v1.1.3/statemachine.go:296 +0x4c1 github.com/qmuntal/stateless.(*StateMachine).internalFire(0xc00004c4e0, 0xebfb40, 0xc00019e010, 0xc589e0, 0xe9cd30, 0x0, 0x0, 0x0, 0x0, 0x0) /go/pkg/mod/github.com/qmuntal/stateless@v1.1.3/statemachine.go:289 +0x89 github.com/qmuntal/stateless.(*StateMachine).FireCtx(...) /go/pkg/mod/github.com/qmuntal/stateless@v1.1.3/statemachine.go:227 github.com/qmuntal/stateless.(*StateMachine).Fire(...) /go/pkg/mod/github.com/qmuntal/stateless@v1.1.3/statemachine.go:220 ... our code omitted...
We've been seeing this on and off when we run our unit tests so I decided to investigate. I believe theres a concurrency issue in the internalFireQueued function.
internalFireQueued
Here's my analysis: Our code is running some code in the OnEntry func as it has just transitioned to a new state. The callback mentioned above fires a new trigger, which is adding the next trigger to the sm.eventQueue in https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L296
OnEntry
sm.eventQueue
At the same time as the new trigger is being added, the OnEntry func returns, and the internalFireQueued resumes execution on https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L308
The problematic lines of code at this point are:
https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L311 https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L312 https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L318
All of these is accessing the sm.eventQueue which is being mutated by the callback on line https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L296
Any access to the sm.eventQueue must be done within the scope of sm.firingMutex to ensure "thread" safety.
sm.firingMutex
With the changes in PR #13, we're not seeing the issue any more
Fixed in v1.1.4
We're seeing an intermittent problem with a panic when a trigger is fired from a callback (in a separate goroutine) The call stack is as follows:
We've been seeing this on and off when we run our unit tests so I decided to investigate. I believe theres a concurrency issue in the
internalFireQueued
function.Here's my analysis: Our code is running some code in the
OnEntry
func as it has just transitioned to a new state. The callback mentioned above fires a new trigger, which is adding the next trigger to thesm.eventQueue
in https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L296At the same time as the new trigger is being added, the
OnEntry
func returns, and theinternalFireQueued
resumes execution on https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L308The problematic lines of code at this point are:
https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L311 https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L312 https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L318
All of these is accessing the
sm.eventQueue
which is being mutated by the callback on line https://github.com/qmuntal/stateless/blob/3ead642f6c11905eb551a5076e4b40169fba3626/statemachine.go#L296Any access to the
sm.eventQueue
must be done within the scope ofsm.firingMutex
to ensure "thread" safety.