Closed yichengq closed 10 years ago
@unihorn Can you provide more details? In which case would two goroutines call this function?
@unihorn This is an awesome bug catch, but I have to close this. This pull request does not fully solve the problem.
I have found the root cause of the problem.
Here is why there is a deadlock:
When the leader calls removePeer, it is holding the log lock, since it enters removePeer via setCommitIndex. The leader then sends a stop signal to the peer and waits for it to be received.
If the peer is in flush() at that moment, it also needs to acquire the log lock, in p.server.log.getEntriesAfter.
Neither side can make progress, so a deadlock happens.
@unihorn Can you create an issue for this problem, so that we remember to solve it?
It could happen now if two threads call stopHeartbeat at the same time.
Fix the test error in etcd