willemt / raft

C implementation of the Raft Consensus protocol, BSD licensed
Other
1.13k stars 271 forks source link

AE error response not properly handled. #97

Closed yossigo closed 6 years ago

yossigo commented 6 years ago

Errors were ignored if node's match_idx==next_idx-1, I assume (??) as a precaution against silently feeding a node that is not monotonic.

However this does not consider the case where send_appendentries() relies on snapshot_last_idx which may be greater than the node's index.

@willemt can you think of unwanted side effects for this?

willemt commented 6 years ago

This is a good fix. It matches the semantics of the raft paper better. Confirmed in virtraft that nothing breaks.

tangruize commented 3 years ago

Hello! I found this fix may cause raft_recv_appendentries_response() to send an empty retry appendentries. The main reason is that stale msgs are not all filtered out.

Consider this trace in a 3-server cluster: S1 becomes Leader (term 1) -> S1 sends heartbeat msg m1 to S2 -> S1 restarts and become Follower -> S1 times out and starts a new election (term 2) -> S2 votes for S1 and updates term to 2 -> S1 becomes Leader again (term 2) -> S2 receives msg m1 and replies msg m2: [term:2, success: false, current_index: 0] because of smaller term -> S1 receives msg m2, but m2 is not filtered out as a stale msg, and S1 replies an empty retry appendentries.

However, this is not even a bug. I just think it is inconsistent with the semantics that the program wants to express.