goraft / raft

UNMAINTAINED: A Go implementation of the Raft distributed consensus protocol.
MIT License

Avoid unlimited wait in stopHeartbeat #187

Closed: philips closed this issue 10 years ago

philips commented 10 years ago

We need to fix a possible unlimited wait in stopHeartbeat. An initial attempt was made in https://github.com/goraft/raft/pull/186

Via @xiangli-cmu, I have found the root cause of the problem. Here is why there is a deadlock: when the leader calls removePeer, it is holding the log lock, because it enters removePeer via setCommitIndex. The leader then sends a stop signal and waits for the peer to receive it.

If the peer is currently inside flush(), it also needs to acquire the log lock, in p.server.log.getEntriesAfter.

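The peer cannot receive the stop signal until flush() obtains the log lock, and the leader will not release the log lock until the stop signal is received, so a deadlock happens.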
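The interleaving described above can be reproduced with a small model. This is a sketch under stated assumptions, not goraft code and not necessarily what #195 did: `server`, `peer`, `flush`, `heartbeat` and `stopHeartbeat` here are hypothetical stand-ins that model only the log lock and the stop signal. It also swaps in one alternative technique: the stop is requested by closing a channel, which never blocks, so the leader can request it while still holding the log lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// server is a hypothetical stand-in; logMu plays the role of the log lock.
type server struct {
	logMu sync.Mutex
}

// peer is a hypothetical heartbeat peer.
type peer struct {
	srv      *server
	stop     chan struct{} // closed (not sent on) to request a stop
	stopOnce sync.Once     // a second close would panic
	entered  chan struct{} // demo only: signals that flush has started
	done     chan struct{} // closed when the heartbeat goroutine exits
}

// flush mimics the call into getEntriesAfter, which needs the log lock.
func (p *peer) flush() {
	select {
	case p.entered <- struct{}{}:
	default:
	}
	p.srv.logMu.Lock()
	defer p.srv.logMu.Unlock()
}

// heartbeat flushes, then waits for either a stop request or the next tick.
func (p *peer) heartbeat() {
	defer close(p.done)
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		p.flush()
		select {
		case <-p.stop:
			return
		case <-ticker.C:
		}
	}
}

// stopHeartbeat never blocks: closing a channel needs no receiver, so it
// is safe to call while the log lock is held.
func (p *peer) stopHeartbeat() {
	p.stopOnce.Do(func() { close(p.stop) })
}

// runDemo replays the leader/peer interleaving from the issue and reports
// whether the heartbeat goroutine exited after the lock was released.
func runDemo() bool {
	srv := &server{}
	p := &peer{
		srv:     srv,
		stop:    make(chan struct{}),
		entered: make(chan struct{}, 1),
		done:    make(chan struct{}),
	}

	srv.logMu.Lock() // leader holds the log lock (setCommitIndex -> removePeer)
	go p.heartbeat()
	<-p.entered // peer is inside flush and will block on the log lock

	p.stopHeartbeat() // returns at once even though the peer is stuck
	srv.logMu.Unlock()

	select {
	case <-p.done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("heartbeat stopped:", runDemo())
}
```

The trade-off is that the leader no longer learns, at the moment of the call, that the heartbeat has actually stopped. If that confirmation matters, the model above gets it by waiting on `done` only after the log lock has been released, which keeps the wait out of the lock cycle.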

xiang90 commented 10 years ago

closed via #195

philips commented 10 years ago

thanks!