tikv / raft-rs

Raft distributed consensus algorithm implemented in Rust.
Apache License 2.0
2.93k stars 394 forks source link

Why is MsgSnapStatus considered a local message? #502

Open gmoshkin opened 1 year ago

gmoshkin commented 1 year ago

I was wondering, why raft-rs doesn't support sending snapshot status messages via the buit-in message channel. The doc-comments state that the application must report back to the leader about the snapshot status, but there doesn't seem to be a way to do it by means of the provided api. Looks like I have to implement my own separate way to pass a message from a follower to the leader that the snapshot was processed (or failed to be processed).

If I try sending a message with type MsgSnapStatus to the leader, it will be ignored because it is considered a "local" message: https://github.com/tikv/raft-rs/blob/master/src/raw_node.rs#L402-L411

So my question is, what is the reason for MsgSnapStatus being a local message? Why can't it be sent from a follower to the leader?

BusyJay commented 1 year ago

the application must report back to the leader about the snapshot status

raft-rs doesn't control how the snapshot is sent. Instead, it's up to application to choose the best way to send out a snapshot (via a different connection than the log replication, for example). The flow control in raft-rs will stop sending any logs to follower as follower can't process any of them until snapshot is restored. So application needs to tell raft-rs explicitly the snapshot is sent by calling report_snapshot. After that, raft-rs will start detect if follower has fully processed the snapshot and then decide sending a snapshot again or sending logs.

gmoshkin commented 1 year ago

Thanks for the reply!

So am I correct to understand that the only reason RawNode::step rejects the MsgSnapReport is because you want the snapshot to be handled more explicitly than any other message? And RawNode::report_snapshot must be called instead of step to make it more explicit, that snapshot applying is a special operation?

BusyJay commented 1 year ago

It's not just about explicitly. Raft message should be a blackbox to application, any fields inside the messages are implementation details. is_local_message is just a helper to detect programming errors.