Closed superleo-cn closed 2 years ago
每次当leader重启后,通过Iterator#getIndex的索引就会出现一个不连续的情况;如果是follower启动,则没有这个问题,请问这是什么原因呢?
# leader log 2022-07-01 14:19:16.313 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=16, data=hello, term=1 2022-07-01 14:19:18.317 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=17, data=hello, term=1 2022-07-01 14:19:20.320 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=18, data=hello, term=1 2022-07-01 14:19:22.325 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=19, data=hello, term=1 2022-07-01 14:19:24.268 INFO 77 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : [seq_123] dump snapshot to file /Users/xxx/crypto-base/./data/raft1/snapshot/temp/123.json ok. sha256 = 5a6a5fe47511eeb5e5f87f446972c96fea96c92ddea3633ab10ab7e96aeddc88 2022-07-01 14:19:24.312 INFO 167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft1/snapshot/snapshot_19. 2022-07-01 14:19:24.313 INFO 231 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Renaming ./data/raft1/snapshot/temp to ./data/raft1/snapshot/snapshot_19. 2022-07-01 14:19:24.335 INFO 167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft1/snapshot/snapshot_14. 2022-07-01 14:19:24.336 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=20, data=hello, term=1 2022-07-01 14:19:24.337 INFO 180 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine : snapshot status=Status[OK] 2022-07-01 14:19:24.337 INFO 609 --- [ure-Executor-11] c.a.s.j.storage.impl.RocksDBLogStorage : Truncated prefix logs in data path: ./data/raft1/log from log index 10 to 15, cost 0 ms. 2022-07-01 14:19:26.334 INFO 55 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine : 当前的值为: index=21, data=hello, term=1 2022-07-01 14:19:26.342 INFO 188 --- [ Thread-12] com.moonxbt.tfsm.SimpleStateMachine : 关闭服务。。。
# new leader log 2022-07-01 14:19:12.139 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:12.307 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=14, data=hello, term=1 2022-07-01 14:19:14.123 INFO 75 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : [seq_123] dump snapshot to file /xxx./data/raft3/snapshot/temp/123.json ok. sha256 = 3947ffa8da2838c5812ad8e57d98c356c8f3f39e51a807d1b14484e61328974c 2022-07-01 14:19:14.143 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:14.167 INFO 167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft3/snapshot/snapshot_14. 2022-07-01 14:19:14.167 INFO 231 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Renaming ./data/raft3/snapshot/temp to ./data/raft3/snapshot/snapshot_14. 2022-07-01 14:19:14.191 INFO 167 --- [ure-Executor-11] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft3/snapshot/snapshot_9. 2022-07-01 14:19:14.197 INFO 177 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : snapshot status=Status[OK] 2022-07-01 14:19:14.200 INFO 609 --- [ure-Executor-10] c.a.s.j.storage.impl.RocksDBLogStorage : Truncated prefix logs in data path: ./data/raft3/log from log index 1 to 10, cost 0 ms. 2022-07-01 14:19:14.412 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=15, data=hello, term=1 2022-07-01 14:19:16.143 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:16.407 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=16, data=hello, term=1 2022-07-01 14:19:18.148 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:18.403 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=17, data=hello, term=1 2022-07-01 14:19:20.150 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:20.400 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=18, data=hello, term=1 2022-07-01 14:19:22.153 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:22.391 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=19, data=hello, term=1 2022-07-01 14:19:24.128 INFO 75 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : [seq_123] dump snapshot to file /Users/xxx/./data/raft3/snapshot/temp/123.json ok. sha256 = 5a6a5fe47511eeb5e5f87f446972c96fea96c92ddea3633ab10ab7e96aeddc88 2022-07-01 14:19:24.155 INFO 160 --- [ure-Executor-11] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:24.180 INFO 167 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft3/snapshot/snapshot_19. 2022-07-01 14:19:24.181 INFO 231 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage : Renaming ./data/raft3/snapshot/temp to ./data/raft3/snapshot/snapshot_19. 2022-07-01 14:19:24.204 INFO 167 --- [ure-Executor-10] c.a.s.j.s.s.local.LocalSnapshotStorage : Deleting snapshot ./data/raft3/snapshot/snapshot_14. 2022-07-01 14:19:24.206 INFO 177 --- [ure-Executor-11] com.moonxbt.tfsm.SimpleStateMachine3 : snapshot status=Status[OK] 2022-07-01 14:19:24.207 INFO 609 --- [ure-Executor-10] c.a.s.j.storage.impl.RocksDBLogStorage : Truncated prefix logs in data path: ./data/raft3/log from log index 10 to 15, cost 0 ms. 2022-07-01 14:19:24.396 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=20, data=hello, term=1 2022-07-01 14:19:26.160 INFO 160 --- [ure-Executor-10] com.moonxbt.tfsm.SimpleStateMachine3 : status=Status[EPERM<1008>: Is not leader.] 2022-07-01 14:19:26.737 INFO 279 --- [tor-10-thread-1] c.a.sofa.jraft.rpc.RpcRequestProcessor : Destroyed peer request context for simple/PeerPair[localhost:7006 -> localhost:7004] 2022-07-01 14:19:27.676 INFO 84 --- [ler-Disruptor-0] c.a.sofa.jraft.core.StateMachineAdapter : onStopFollowing: LeaderChangeContext [leaderId=localhost:7004, term=1, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader localhost:7004.]]. 2022-07-01 14:19:27.676 INFO 2648 --- [ocalhost:7006>0] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> term 1 start preVote. 2022-07-01 14:19:27.696 ERROR 157 --- [ocalhost:7006>0] c.a.s.j.rpc.impl.AbstractClientService : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004. 2022-07-01 14:19:27.697 WARN 2680 --- [ocalhost:7006>0] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> channel init failed, address=localhost:7004. 2022-07-01 14:19:27.703 INFO 50 --- [utor-5-thread-1] .c.ClientServiceConnectionEventProcessor : Peer localhost:7005 is connected 2022-07-01 14:19:27.750 INFO 2602 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> received PreVoteResponse from localhost:7005, term=1, granted=true. 2022-07-01 14:19:27.751 INFO 1125 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> start vote and grant vote self, term=1. 2022-07-01 14:19:27.756 ERROR 157 --- [RPC-Processor-0] c.a.s.j.rpc.impl.AbstractClientService : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004. 2022-07-01 14:19:27.756 WARN 1161 --- [RPC-Processor-0] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> channel init failed, address=localhost:7004. 2022-07-01 14:19:27.858 INFO 132 --- [RPC-Processor-0] c.a.s.j.s.impl.LocalRaftMetaStorage : Save raft meta, path=./data/raft3/raft_meta, term=2, votedFor=localhost:7006, cost time=101 ms 2022-07-01 14:19:27.961 INFO 1222 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.NodeImpl : Node <simple/localhost:7006> become leader of group, term=2, conf=localhost:7004,localhost:7005,localhost:7006, oldConf=. 2022-07-01 14:19:28.005 ERROR 157 --- [RPC-Processor-1] c.a.s.j.rpc.impl.AbstractClientService : Fail to connect localhost:7004, remoting exception: com.alipay.remoting.exception.RemotingException: Create connection failed. The address is localhost:7004. 2022-07-01 14:19:28.006 ERROR 887 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.Replicator : Fail to init sending channel to localhost:7004. 2022-07-01 14:19:28.006 ERROR 132 --- [RPC-Processor-1] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to start replicator to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:28.008 ERROR 1236 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.NodeImpl : Fail to add a replicator, peer=localhost:7004. 2022-07-01 14:19:28.010 INFO 908 --- [RPC-Processor-1] com.alipay.sofa.jraft.core.Replicator : Replicator=Replicator [state=Created, statInfo=<running=null, firstLogIndex=0, lastLogIncluded=0, lastLogIndex=0, lastTermIncluded=0>, peerId=localhost:7005, type=Follower]@localhost:7005 is started 2022-07-01 14:19:28.043 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=21, data=hello, term=1 2022-07-01 14:19:28.047 INFO 62 --- [ler-Disruptor-0] c.a.sofa.jraft.core.StateMachineAdapter : onLeaderStart: term=2. 2022-07-01 14:19:28.169 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=23, data=hello, term=2 2022-07-01 14:19:28.527 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:29.029 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:29.533 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:30.036 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:30.170 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=24, data=hello, term=2 2022-07-01 14:19:30.539 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:31.040 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:31.543 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:32.045 ERROR 125 --- [ocalhost:7006>0] c.a.sofa.jraft.core.ReplicatorGroupImpl : Fail to check replicator connection to peer=localhost:7004, replicatorType=Follower. 2022-07-01 14:19:32.172 INFO 53 --- [ler-Disruptor-0] com.moonxbt.tfsm.SimpleStateMachine3 : 当前的值为: index=25, data=hello, term=2
就是说 当前的值为: index=22, data=hello, term=? 这个数据没有了,不知道这是不是正常现象。
当前的值为: index=22, data=hello, term=?
Describe your use scenes (why need this feature)
Describe the advice or solution you'd like
java -version
uname -a
@superleo-cn 正常的,成为 leader 的节点会提交一条 no-op 日志,state machine 不会感知到。
观察的很仔细,点个赞,有兴趣可以去阅读 raft 论文 5.4.2 一节。
明白了,多谢。
Your question
每次当leader重启后,通过Iterator#getIndex的索引就会出现一个不连续的情况;如果是follower启动,则没有这个问题,请问这是什么原因呢?
就是说
当前的值为: index=22, data=hello, term=?
这个数据没有了,不知道这是不是正常现象。Your scenes
Describe your use scenes (why need this feature)
Your advice
Describe the advice or solution you'd like
Environment
java -version
):uname -a
):