willemt / raft

C implementation of the Raft Consensus protocol, BSD licensed

Leader goes into infinite send_snapshot loop #91

Open yossigo opened 6 years ago

yossigo commented 6 years ago

When preparing to send AppendEntries (AE), the leader sends a snapshot instead of AE to any follower that lags behind last_snapshot_idx. However, this state persists: because no AE is sent, no AE response comes back, so the leader has no way to track the follower's current index and keeps sending the snapshot.

Currently the way to deal with this is for the application to manually call raft_node_set_next_idx() after the snapshot has been installed. I think ideally the library should deal with it, although this may be good enough if documented.
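To make the loop concrete, here is a minimal toy model of it. This is a sketch, not the library's code: the `toy_*` types and functions are hypothetical stand-ins, and the exact comparison the library uses against last_snapshot_idx may differ. The only real library call it refers to is raft_node_set_next_idx(), and only in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only, NOT the library's real types. */
typedef struct { long last_snapshot_idx; } toy_leader_t;
typedef struct { long next_idx; } toy_follower_t;

typedef enum { TOY_SEND_APPENDENTRIES, TOY_SEND_SNAPSHOT } toy_action_t;

/* Decide what the leader sends on a heartbeat. Entries up to and including
 * last_snapshot_idx are assumed to be compacted away, so a follower whose
 * next_idx falls in that range can only be served by a snapshot. */
static toy_action_t toy_next_action(const toy_leader_t *l,
                                    const toy_follower_t *f)
{
    return f->next_idx <= l->last_snapshot_idx ? TOY_SEND_SNAPSHOT
                                               : TOY_SEND_APPENDENTRIES;
}

/* Run `rounds` heartbeats and count how many of them send a snapshot.
 * With app_updates_next_idx set, the application advances next_idx after
 * the first install, which is the role raft_node_set_next_idx() plays in
 * the workaround described above. */
static int toy_count_snapshot_sends(const toy_leader_t *l, toy_follower_t *f,
                                    int rounds, bool app_updates_next_idx)
{
    assert(rounds >= 0);
    int sends = 0;
    for (int i = 0; i < rounds; i++) {
        if (toy_next_action(l, f) != TOY_SEND_SNAPSHOT)
            continue;
        sends++;
        if (app_updates_next_idx)
            f->next_idx = l->last_snapshot_idx + 1;
    }
    return sends;
}
```

In this model, a follower whose next_idx is never advanced receives a snapshot on every heartbeat, while one whose next_idx is moved past last_snapshot_idx after the install receives exactly one and then goes back to AE.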

liw commented 6 years ago

It's worth considering the alternative where snapshot chunks are sent, processed, and responded inside the library. That gives Raft term checking, matchIndex/nextIndex updating, election timer updating, etc. (E.g., https://github.com/daos-stack/raft/commit/67db859988010c4986af60430e8d6a3e9e44f4d8#diff-857b5d8f957ff267859c82da4461e650R781)
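As a rough illustration of what handling the response inside the library buys, here is a hedged sketch of a leader-side response handler. It is not the code from the linked commit; every `resp_*` name and field is an assumption made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from any real header. */
typedef struct { long term; long last_idx; int complete; } resp_msg_t;
typedef struct { long current_term; int is_leader; } resp_leader_t;
typedef struct { long next_idx; long match_idx; } resp_peer_t;

/* Handle a follower's reply to an installsnapshot message.
 * Returns -1 if the leader had to step down, 0 otherwise. */
static int resp_handle(resp_leader_t *l, resp_peer_t *p, const resp_msg_t *r)
{
    assert(l != NULL && p != NULL && r != NULL);

    /* Term check: a newer term means this node is no longer leader. */
    if (r->term > l->current_term) {
        l->current_term = r->term;
        l->is_leader = 0;
        return -1;
    }

    /* A reply from an older term is stale and is ignored. */
    if (r->term < l->current_term)
        return 0;

    /* Once the snapshot is fully installed, advance matchIndex/nextIndex
     * so the next heartbeat sends AE rather than another snapshot. */
    if (r->complete && r->last_idx > p->match_idx) {
        p->match_idx = r->last_idx;
        p->next_idx = r->last_idx + 1;
    }
    return 0;
}
```

With a handler along these lines the index bookkeeping happens as a side effect of the normal message flow, so the application does not need a separate call to break the loop.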

yossigo commented 6 years ago

@liw It's not always possible or correct to serialize the FSM state and deliver it in chunks through the library, so I don't think this should completely replace the current load snapshot mechanism.

willemt commented 6 years ago

Originally I had the library manage the sending of the snapshot. I felt the library ended up prescribing the snapshot's transport mechanism too much, which is a no-no for this project.

This is probably a situation where it makes more sense for the library to be responsible for sending snapshots.

However, I think we just need to add a new function for the user to call once the snapshot has been delivered, something like:

int raft_confirm_delivered_snapshot(raft_t*, raft_node_t*);

This function might be useful for blocking raft_begin_snapshot while there's a snapshot being sent.
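A minimal sketch of how such a confirmation call could behave, assuming the library tracks whether a snapshot is in flight per node. The function proposed above does not exist yet, so everything here (the `sketch_*` names, fields, and return codes) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for illustration; not the library's real structs. */
typedef struct { long next_idx; int snapshot_in_flight; } sketch_node_t;
typedef struct { long last_snapshot_idx; int snapshots_in_flight; } sketch_raft_t;

/* Record that a snapshot was handed to the application's transport. */
static void sketch_mark_snapshot_sent(sketch_raft_t *r, sketch_node_t *n)
{
    assert(r != NULL && n != NULL);
    if (!n->snapshot_in_flight) {
        n->snapshot_in_flight = 1;
        r->snapshots_in_flight++;
    }
}

/* Stand-in for the proposed confirmation call: the application reports
 * that the follower has installed the snapshot, so AE can resume from the
 * entry after it. Returns 0 on success, -1 if nothing was pending. */
static int sketch_confirm_delivered_snapshot(sketch_raft_t *r, sketch_node_t *n)
{
    assert(r != NULL && n != NULL);
    if (!n->snapshot_in_flight)
        return -1;
    n->snapshot_in_flight = 0;
    r->snapshots_in_flight--;
    n->next_idx = r->last_snapshot_idx + 1;
    return 0;
}

/* Refuse to start a new snapshot while one is still being delivered.
 * Returns 0 if a snapshot may begin, -1 otherwise. */
static int sketch_begin_snapshot(const sketch_raft_t *r)
{
    assert(r != NULL);
    return r->snapshots_in_flight > 0 ? -1 : 0;
}
```

Blocking new snapshots while one is in flight also keeps last_snapshot_idx stable between send and confirmation, which is what makes setting next_idx to last_snapshot_idx + 1 safe in this sketch.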

liw commented 6 years ago

The installsnapshot messages don't prescribe if or how a snapshot is chunked, or how it is transferred. E.g., a caller may choose to transfer a snapshot into a file on a follower and send an installsnapshot message plus the path to the file. Another caller may want to send a snapshot in chunks but transfer only an RDMA descriptor with the installsnapshot message to a follower. A caller who would like to have multiple chunks in flight may want to transfer a sequence number and other info with each installsnapshot message.
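One way to picture this is a message that carries only Raft-level fields plus an opaque, caller-defined payload. The layout below is an assumption for illustration, not the actual message definition; the `demo_*` names, the file path, and the chunk header are all made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message: Raft metadata plus a payload the library never
 * interprets. */
typedef struct {
    long term;
    long last_idx;
    long last_term;
    const void *payload;   /* opaque to the library */
    size_t payload_len;
} demo_installsnapshot_t;

/* One possible caller-defined payload: a header for chunked transfer. */
typedef struct {
    uint32_t seq;    /* position of this chunk */
    uint32_t total;  /* number of chunks in the snapshot */
} demo_chunk_hdr_t;

static demo_installsnapshot_t demo_make_msg(long term, long last_idx,
                                            long last_term,
                                            const void *payload, size_t len)
{
    demo_installsnapshot_t m = { term, last_idx, last_term, payload, len };
    return m;
}

/* Follower-side Raft check that stays in the library regardless of the
 * payload: reject messages from a stale term. Returns 1 if accepted. */
static int demo_accept(long current_term, const demo_installsnapshot_t *m)
{
    assert(m != NULL);
    return m->term >= current_term;
}
```

The same message type can then carry a file path for one caller and a chunk header (or an RDMA descriptor) for another, while term checking stays identical in both cases.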