hashicorp / raft

Golang implementation of the Raft consensus protocol
Mozilla Public License 2.0
8.15k stars 987 forks source link

Add `transportTimeout` in `r.replicateTo` #610

Open lalalalatt opened 3 weeks ago

lalalalatt commented 3 weeks ago

Recently, I am learning how Raft operate and its implementation. However, in my own test, I found the transportTimeout is needed for r.trans.AppendEntries, otherwise the leader may step-down itself frequently while checking the leader-lease frequently under unreliable network.

https://github.com/hashicorp/raft/blob/185ae2ea91f09c04428c4293641db142b130a631/replication.go#L229-L236

And I think we should drop a request once it takes too long to response. For example

func (rf *Raft) replicateTo(fr *followerReplication, lastLogIndex uint64) (shouldStop bool) {
        start := time.Now()
    errr := nil
    transportTimeout := rf.config().HeartbeatInterval
    appendEntriesCh := make(chan error, 1)
    installSnapshotCh := make(chan error, 1)
START:
    ...

    // send AppendEntries RPC
    start = time.Now()
    rf.goFunc(func() {
        appendEntriesCh <- err := r.trans.AppendEntries(peer.ID, peer.Address, &req, &resp);
    })

    select {    
        case err = <-appendResponseCh:
    case <-time.After(transportTimeout):
        err = ErrTransportTimout
    }

    ...
}

@maintainer Do you guys think this is rational? Would like to hear your thoughts on this issue, Thx!

banks commented 2 weeks ago

Thanks for the issue @lalalalatt, would you be able to explain the issue you were seeing in a little more detail please? I agree that timeouts are generally a good option on requests, however it's a bit more subtle here than your example because most replication happens through pipelineReplicate which is using a pipeline rather than discreet RPC for each append entries. We also have a separate goroutine sending heartbeats over a separate TCP connection in most use-cases (e.g. if you use the built in NetworkTransport) which probably mitigates some issues you might see with a custom transport.

There are also downsides to timeouts, especially implemented as you suggest - they could leak goroutines very fast in some cases or they could artificially disrupt pipelline replication often and cause flapping between pipeline and non-pipeline modes.

There may be benefits too, but we'd need to know a little more detail about what issues are being observed to make a good decision there I think!

lalalalatt commented 2 weeks ago

@banks Thanks for replying this~ Oh😅, I miss the pipelineReplicate, yes it is unblocked request within maxInFlight.

Maybe the reason on my case is the discreet AppendEntries request sent to other members blocked and cause leader lease timeout right after leader step up (heartbeat also blocked by network).

So I guess sending each heartbeat in separate goroutine would make it safer for the leader lease, in addition, limit max inflight num for heartbeats prevent goroutine leak.

However, this introduce additional code complexity and may not intuitive as paper said.

Would like to seek your opinion, thx~