sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0
3.59k stars 1.15k forks source link

index不连续的问题 #860

Closed superleo-cn closed 2 years ago

superleo-cn commented 2 years ago

Your question

每次当leader重启后,通过Iterator#getIndex的索引就会出现一个不连续的情况;如果是follower启动,则没有这个问题,请问这是什么原因呢?

# leader log

2022-07-01 14:19:16.313  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=16, data=hello, term=1
2022-07-01 14:19:18.317  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=17, data=hello, term=1
2022-07-01 14:19:20.320  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=18, data=hello, term=1
2022-07-01 14:19:22.325  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=19, data=hello, term=1
2022-07-01 14:19:24.268  INFO    77 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : [seq_123] dump snapshot to file /Users/xxx/crypto-base/./data/raft1/snapshot/temp/123.json ok. sha256 = 5a6a5fe47511eeb5e5f87f446972c96fea96c92ddea3633ab10ab7e96aeddc88
2022-07-01 14:19:24.312  INFO   167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft1/snapshot/snapshot_19.
2022-07-01 14:19:24.313  INFO   231 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Renaming ./data/raft1/snapshot/temp to ./data/raft1/snapshot/snapshot_19.
2022-07-01 14:19:24.335  INFO   167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft1/snapshot/snapshot_14.
2022-07-01 14:19:24.336  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=20, data=hello, term=1
2022-07-01 14:19:24.337  INFO   180 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine      : snapshot status=Status[OK]
2022-07-01 14:19:24.337  INFO   609 --- [ure-Executor-11] c.a.s.j.storage.impl.RocksDBLogStorage   : Truncated prefix logs in data path: ./data/raft1/log from log index 10 to 15, cost 0 ms.
2022-07-01 14:19:26.334  INFO    55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine      : 当前的值为: index=21, data=hello, term=1
2022-07-01 14:19:26.342  INFO   188 --- [      Thread-12] com.moonxbt.tfsm.SimpleStateMachine      : 关闭服务。。。
# new leader log

2022-07-01 14:19:12.139  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:12.307  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=14, data=hello, term=1
2022-07-01 14:19:14.123  INFO    75 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : [seq_123] dump snapshot to file /xxx./data/raft3/snapshot/temp/123.json ok. sha256 = 3947ffa8da2838c5812ad8e57d98c356c8f3f39e51a807d1b14484e61328974c
2022-07-01 14:19:14.143  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:14.167  INFO   167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft3/snapshot/snapshot_14.
2022-07-01 14:19:14.167  INFO   231 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Renaming ./data/raft3/snapshot/temp to ./data/raft3/snapshot/snapshot_14.
2022-07-01 14:19:14.191  INFO   167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft3/snapshot/snapshot_9.
2022-07-01 14:19:14.197  INFO   177 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : snapshot status=Status[OK]
2022-07-01 14:19:14.200  INFO   609 --- [ure-Executor-10] c.a.s.j.storage.impl.RocksDBLogStorage   : Truncated prefix logs in data path: ./data/raft3/log from log index 1 to 10, cost 0 ms.
2022-07-01 14:19:14.412  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=15, data=hello, term=1
2022-07-01 14:19:16.143  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:16.407  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=16, data=hello, term=1
2022-07-01 14:19:18.148  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:18.403  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=17, data=hello, term=1
2022-07-01 14:19:20.150  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:20.400  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=18, data=hello, term=1
2022-07-01 14:19:22.153  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:22.391  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=19, data=hello, term=1
2022-07-01 14:19:24.128  INFO    75 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : [seq_123] dump snapshot to file /Users/xxx/./data/raft3/snapshot/temp/123.json ok. sha256 = 5a6a5fe47511eeb5e5f87f446972c96fea96c92ddea3633ab10ab7e96aeddc88
2022-07-01 14:19:24.155  INFO   160 --- [ure-Executor-11] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:24.180  INFO   167 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft3/snapshot/snapshot_19.
2022-07-01 14:19:24.181  INFO   231 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage   : Renaming ./data/raft3/snapshot/temp to ./data/raft3/snapshot/snapshot_19.
2022-07-01 14:19:24.204  INFO   167 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage   : Deleting snapshot ./data/raft3/snapshot/snapshot_14.
2022-07-01 14:19:24.206  INFO   177 --- [ure-Executor-11] com.moonxbt.tfsm.SimpleStateMachine3     : snapshot status=Status[OK]
2022-07-01 14:19:24.207  INFO   609 --- [ure-Executor-10] c.a.s.j.storage.impl.RocksDBLogStorage   : Truncated prefix logs in data path: ./data/raft3/log from log index 10 to 15, cost 0 ms.
2022-07-01 14:19:24.396  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=20, data=hello, term=1
2022-07-01 14:19:26.160  INFO   160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3     : status=Status[EPERM<1008>: Is not leader.]
2022-07-01 14:19:26.737  INFO   279 --- [tor-10-thread-1] c.a.sofa.jraft.rpc.RpcRequestProcessor   : Destroyed peer request context for simple/PeerPair[localhost:7006 -> localhost:7004]
2022-07-01 14:19:27.676  INFO    84 --- [ler-Disruptor-0] c.a.sofa.jraft.core.StateMachineAdapter  : onStopFollowing: LeaderChangeContext [leaderId=localhost:7004, term=1, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader localhost:7004.]].
2022-07-01 14:19:27.676  INFO  2648 --- [ocalhost:7006>0] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> term 1 start preVote.
2022-07-01 14:19:27.696 ERROR   157 --- [ocalhost:7006>0] c.a.s.j.rpc.impl.AbstractClientService   : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004.
2022-07-01 14:19:27.697  WARN  2680 --- [ocalhost:7006>0] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> channel init failed, address=localhost:7004.
2022-07-01 14:19:27.703  INFO    50 --- [utor-5-thread-1] .c.ClientServiceConnectionEventProcessor : Peer localhost:7005 is connected
2022-07-01 14:19:27.750  INFO  2602 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> received PreVoteResponse from localhost:7005, term=1, granted=true.
2022-07-01 14:19:27.751  INFO  1125 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> start vote and grant vote self, term=1.
2022-07-01 14:19:27.756 ERROR   157 --- [RPC-Processor-0] c.a.s.j.rpc.impl.AbstractClientService   : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004.
2022-07-01 14:19:27.756  WARN  1161 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> channel init failed, address=localhost:7004.
2022-07-01 14:19:27.858  INFO   132 --- [RPC-Processor-0] c.a.s.j.s.impl.LocalRaftMetaStorage      : Save raft meta, path=./data/raft3/raft_meta, term=2, votedFor=localhost:7006, cost time=101 ms
2022-07-01 14:19:27.961  INFO  1222 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.NodeImpl      : Node <simple/localhost:7006> become leader of group, term=2, conf=localhost:7004,localhost:7005,localhost:7006, oldConf=.
2022-07-01 14:19:28.005 ERROR   157 --- [RPC-Processor-1] c.a.s.j.rpc.impl.AbstractClientService   : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004.
2022-07-01 14:19:28.006 ERROR   887 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.Replicator    : Fail to init sending channel to localhost:7004.
2022-07-01 14:19:28.006 ERROR   132 --- [RPC-Processor-1] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to start replicator to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:28.008 ERROR  1236 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.NodeImpl      : Fail to add a replicator, peer=localhost:7004.
2022-07-01 14:19:28.010  INFO   908 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.Replicator    : Replicator=Replicator [state=Created, statInfo=<running=null, firstLogIndex=0, lastLogIncluded=0, lastLogIndex=0, lastTermIncluded=0>, peerId=localhost:7005, type=Follower]@localhost:7005 is started
2022-07-01 14:19:28.043  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=21, data=hello, term=1
2022-07-01 14:19:28.047  INFO    62 --- [ler-Disruptor-0] c.a.sofa.jraft.core.StateMachineAdapter  : onLeaderStart: term=2.
2022-07-01 14:19:28.169  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=23, data=hello, term=2
2022-07-01 14:19:28.527 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:29.029 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:29.533 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:30.036 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:30.170  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=24, data=hello, term=2
2022-07-01 14:19:30.539 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:31.040 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:31.543 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:32.045 ERROR   125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl  : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower.
2022-07-01 14:19:32.172  INFO    53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3     : 当前的值为: index=25, data=hello, term=2

就是说 当前的值为: index=22, data=hello, term=? 这个数据没有了,不知道这是不是正常现象。

Your scenes

Describe your use scenes (why need this feature)

Your advice

Describe the advice or solution you'd like

Environment

killme2008 commented 2 years ago

@superleo-cn 正常的,成为 leader 的节点会提交一条 no-op 日志,state machine 不会感知到。

killme2008 commented 2 years ago

观察的很仔细,点个赞,有兴趣可以去阅读 raft 论文 5.4.2 一节。

superleo-cn commented 2 years ago

观察的很仔细,点个赞,有兴趣可以去阅读 raft 论文 5.4.2 一节。

明白了,多谢。