goraft / raft

UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
MIT License

fix(peer/heartbeat): avoid unlimited wait on stopHeartbeat #186

Closed yichengq closed 10 years ago

yichengq commented 10 years ago

An unlimited wait can currently happen if two goroutines call stopHeartbeat at the same time.

This fixes the test error in etcd.

xiang90 commented 10 years ago

@unihorn Can you provide more details? In which case would two goroutines call this function?

xiang90 commented 10 years ago

@unihorn This is an awesome bug catch, but I have to close this. This pull request does not fully solve the problem. I have found the root cause. Here is why there is a deadlock: when the leader calls removePeer, it is holding the log lock, since it enters removePeer via setCommitIndex. The leader then sends a stop signal and waits for it to be received.

If the peer is in flush() at that moment, it also needs to acquire the log lock, in p.server.log.getEntriesAfter.

So a deadlock happens.

xiang90 commented 10 years ago

@unihorn Can you create an issue for this problem, so that we remember to solve it?