sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0
3.57k stars 1.14k forks source link

Sometimes voting ping-pong in leader election #583

Closed Excpt0r closed 3 years ago

Excpt0r commented 3 years ago

Hi,

@killme2008 this is a follow up of #552 which I was not able to re-open. I have set up an automated test, to reproduce the rare problem - logs are attached.

Scenario as follows:

There was no load on the system, the purpose of the test is only to check the behaviour of a silently gone node/pod (using virsh destroy on KVM). I'm using the default NodeOptions (electionTimeoutMs = 1000, electionHeartbeatFactor = 10) Thanks for your help

Logs: 07.05.2021_van-device-sub-0_jraft.log 07.05.2021_van-device-sub-2_jraft.log

killme2008 commented 3 years ago

@Excpt0r thank you for the report, we will look into it.

killme2008 commented 3 years ago

Please attach the metrics log in these two nodes, you can get the metric logs by kill -3 <pid>.

It seems that these nodes processed RequestVote request too slow(greater than electionTimeoutMs), and the candidate node reached vote timedout everytime.

killme2008 commented 3 years ago

hi, i think i found the root cause.

First, we look into such case:

| 2021-05-07T14:29:39,166 | JRaft-RPC-Processor-0 | com.alipay.sofa.jraft.core.NodeImpl | INFO | Node 
<DeviceSubLeaderElection/van-device-sub-2.van-device-sub-headless:8888> start vote and grant vote self, term=1.
| 2021-05-07T14:29:39,166 | JRaft-RPC-Processor-0 | com.alipay.sofa.jraft.core.NodeImpl | DEBUG | Node 
<DeviceSubLeaderElection/van-device-sub-2.van-device-sub-headless:8888> stop election timer, term=1.
| 2021-05-07T14:29:39,166 | JRaft-RPC-Processor-0 | com.alipay.sofa.jraft.core.NodeImpl | DEBUG | Node 
<DeviceSubLeaderElection/van-device-sub-2.van-device-sub-headless:8888> start vote timer, term=2 .

1. van-device-sub-2.van-device try to vote self at 14:29:39,166, but van-device-sub-0.van-device receive the vote request at 14:29:40,219:

| 2021-05-07T14:29:40,219 | Bolt-default-executor-6-thread-6 | com.alipay.sofa.jraft.core.NodeImpl | INFO | Node 
<DeviceSubLeaderElection/van-device-sub-0.van-device-sub-headless:8888> received RequestVoteRequest from 
van-device-sub-2.van-device-sub-headless:8888, term=2, currTerm=2.

2. There was almost one second delay for this request to reach van-device-sub-0.van-device.

The problem is creating connection to dead peer van-device-sub-1.van-device timed out in van-device-sub-2.van-device:

| 2021-05-07T14:29:40,174 | JRaft-RPC-Processor-0 | com.alipay.sofa.jraft.rpc.impl.AbstractClientService | ERROR | Fail to 
connect van-device-sub-1.van-device-sub-headless:8888, remoting exception: 
com.alipay.remoting.exception.RemotingException: Create connection failed. The address is van-device-sub-1.van-device-
sub-headless:8888.

The default connect timeout is one second (in RpcOptions#rpcConnectTimeoutMs), and when van-device-sub-2.van-device tried to connect the dead node, it cost 1 second and reached out the vote timeout which is election timeout plus some random value of RaftOptions#maxElectionDelayMs, so the candidate van-device-sub-2.van-device stepped down and started the election again.

I think you can change the default election timeout(in NodeOptioins) or rpc connect timeout(inRpcOptions`) value to solve this problem. We may change these default values in next release.

fengjiachun commented 3 years ago

The problem is creating connection to dead peer van-device-sub-1.van-device timed out in van-device-sub-2.van-device

We can check the connection first and fail-fast (asynchronously to create the connection) on pre-vote and vote