sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0
3.53k stars 1.12k forks source link

com.alipay.sofa.jraft.rhea.errors.LeaderNotAvailableException: The leader is not available. #298

Closed zhangjun050754 closed 4 years ago

zhangjun050754 commented 4 years ago

CompletableFuture<List> scan = rheaKVStore.scan(startKey, endKey, readOnlySafe, returnValue); 并发查询会报错。。

Caused by: com.alipay.sofa.jraft.rhea.errors.LeaderNotAvailableException: The leader is not available. java.util.concurrent.ExecutionException: com.alipay.sofa.jraft.rhea.errors.LeaderNotAvailableException: The leader is not available. at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) at cn.recallcode.iot.mqtt.server.store.cache.SubscribeNotWildcardCache.searchFutureList(SubscribeNotWildcardCache.java:78) at cn.recallcode.iot.mqtt.server.store.service.impl.SubscribeStoreService.sendPublishMessage(SubscribeStoreService.java:108) at cn.recallcode.iot.mqtt.server.broker.protocol.Publish.sendPublishMessage(Publish.java:111) at cn.recallcode.iot.mqtt.server.broker.protocol.Publish.processPublish(Publish.java:65) at cn.recallcode.iot.mqtt.server.broker.handler.BrokerHandler.channelRead0(BrokerHandler.java:44) at cn.recallcode.iot.mqtt.server.broker.handler.BrokerHandler.channelRead0(BrokerHandler.java:1) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at io.netty.handler.codec.http.websocketx.Utf8FrameValidator.channelRead(Utf8FrameValidator.java:77) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.http.websocketx.WebSocketServerProtocolHandler$1.channelRead(WebSocketServerProtocolHandler.java:211) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:284) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:628) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:563) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.lang.Thread.run(Thread.java:748) Caused by: com.alipay.sofa.jraft.rhea.errors.LeaderNotAvailableException: The leader is not available.

fengjiachun commented 4 years ago

服务端切主了,要看下服务端的日志 是不是 1.2.6 版本? 是的话, 使用 kill -s SIGUSR2 pid 可看到节点状态和性能指标统计数据,可以参考下文档: https://www.sofastack.tech/projects/sofa-jraft/jraft-user-guide/ 章节:10. 排查故障工具

zhangjun050754 commented 4 years ago

服务端切主了,要看下服务端的日志 是不是 1.2.6 版本? 是的话, 使用 kill -s SIGUSR2 pid 可看到节点状态和性能指标统计数据,可以参考下文档: https://www.sofastack.tech/projects/sofa-jraft/jraft-user-guide/ 章节:10. 排查故障工具

服务端什么情况下会切主? 测试并发scan查询的时候就会报错。。。 数据量也就5万

zhangjun050754 commented 4 years ago

一个应用里面启动了三个节点 想先测试用的 跟这个有关系吗 scan查询几十次主节点就挂了? 然后后面查询的响应速度极慢。。 10-16 18:46:44.873 INFO [com.alipay.sofa.jraft.core.StateMachineAdapter] - onStopFollowing: LeaderChangeContext [leaderId=127.0.0.1:7183, term=204, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 127.0.0.1:7183.]] 10-16 18:46:44.874 INFO [com.alipay.sofa.jraft.core.NodeImpl] - Node <rhea_mqtt--1/127.0.0.1:7182> term 204 start preVote. 10-16 18:46:44.874 INFO [com.alipay.sofa.jraft.core.StateMachineAdapter] - onStopFollowing: LeaderChangeContext [leaderId=127.0.0.1:7181, term=208, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 127.0.0.1:7181.]] 10-16 18:46:44.873 INFO [com.alipay.sofa.jraft.core.NodeImpl] - Node <rhea_mqtt--1/127.0.0.1:7181> term 204 start preVote. 10-16 18:46:44.874 INFO [com.alipay.sofa.jraft.core.StateMachineAdapter] - onStopFollowing: LeaderChangeContext [leaderId=127.0.0.1:7183, term=204, status=Status[ERAFTTIMEDOUT<10001>: Lost connection from leader 127.0.0.1:7183.]] 10-16 18:46:44.874 WARN [com.alipay.sofa.jraft.core.NodeImpl] - Node <mqttRpcGroup/127.0.0.1:7181> steps down when alive nodes don't satisfy quorum, term=208, deadNodes=127.0.0.1:7182,127.0.0.1:7183, conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183. 10-16 18:46:44.873 INFO [com.alipay.sofa.jraft.core.NodeImpl] - Node <mqttRpcGroup/127.0.0.1:7182> term 208 start preVote. 10-16 18:46:44.874 INFO [c.a.s.j.rhea.util.concurrent.NamedThreadFactory] - Creates new Thread[rheakv-raft-state-trigger #3,5,main]. 10-16 18:46:44.874 INFO [c.a.s.j.rhea.util.concurrent.NamedThreadFactory] - Creates new Thread[rheakv-raft-state-trigger #1,5,main]. 10-16 18:46:44.875 INFO [c.a.s.j.rhea.util.concurrent.NamedThreadFactory] - Creates new Thread[rheakv-raft-rpc-executor #8,5,main]. 10-16 18:46:44.875 INFO [com.alipay.sofa.jraft.core.StateMachineAdapter] - onLeaderStop: status=Status[ERAFTTIMEDOUT<10001>: Majority of the group dies: 2/3] 10-16 18:46:44.875 INFO [c.a.s.j.rhea.util.concurrent.NamedThreadFactory] - Creates new Thread[rheakv-raft-rpc-executor #8,5,main]. 10-16 18:46:44.875 INFO [c.a.s.j.rhea.util.concurrent.NamedThreadFactory] - Creates new Thread[rheakv-raft-rpc-executor #8,5,main].

fengjiachun commented 4 years ago

onLeaderStop: status=Status[ERAFTTIMEDOUT<10001>: Majority of the group dies: 2/3]

从这行日志看,总共三个节点,挂掉两个了,半数以上节点挂掉后集群是不可用的,需要看下另外两台为什么挂掉

killme2008 commented 4 years ago

建议观察下 GC 情况

fengjiachun commented 4 years ago

deadNodes=127.0.0.1:7182,127.0.0.1:7183

这两个节点进程是否还在? 在的话 'kill -s SIGUSR2 pid' 看一下并把 jraft 生成的状态信息文件发一下看看 (文件名以 node_describe.log 和 node_metrics.log 以及 rheakv_metrics.log 为前缀,每个进程三个文件)

zhangjun050754 commented 4 years ago

10-17 13:43:57.457 WARN [com.alipay.sofa.jraft.core.NodeImpl] - Node <rhea_mqtt--1/127.0.0.1:7181> PreVote to 127.0.0.1:7182 error: Status[ENOENT<1012>: Peer id not found: 127.0.0.1:7182]. 10-17 13:43:57.466 ERROR [CommonDefault] - Future task has not completed! com.alipay.remoting.util.FutureTaskNotCompleted: null at com.alipay.remoting.util.RunStateRecordedFutureTask.getAfterRun(RunStateRecordedFutureTask.java:49) at com.alipay.remoting.util.FutureTaskUtil.getFutureTaskResult(FutureTaskUtil.java:43) at com.alipay.remoting.DefaultConnectionManager.getConnectionPool(DefaultConnectionManager.java:566) at com.alipay.remoting.DefaultConnectionManager.get(DefaultConnectionManager.java:274) at com.alipay.remoting.rpc.RpcClient.checkConnection(RpcClient.java:796) at com.alipay.sofa.jraft.rpc.impl.AbstractBoltClientService.isConnected(AbstractBoltClientService.java:77) at com.alipay.sofa.jraft.rpc.impl.AbstractBoltClientService.connect(AbstractBoltClientService.java:131) at com.alipay.sofa.jraft.RouteTable.refreshLeader(RouteTable.java:243) at com.alipay.sofa.jraft.rhea.client.pd.AbstractPlacementDriverClient.getLeader(AbstractPlacementDriverClient.java:247) at com.alipay.sofa.jraft.rhea.client.pd.AbstractPlacementDriverClient.getLeader(AbstractPlacementDriverClient.java:229) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVRpcService.getLeader(DefaultRheaKVRpcService.java:107) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore$1.getLeader(DefaultRheaKVStore.java:254) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVRpcService.getRpcEndpoint(DefaultRheaKVRpcService.java:117) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVRpcService.callAsyncWithRpc(DefaultRheaKVRpcService.java:100) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVRpcService.callAsyncWithRpc(DefaultRheaKVRpcService.java:93) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore.internalPut(DefaultRheaKVStore.java:865) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore.put(DefaultRheaKVStore.java:844) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore.access$800(DefaultRheaKVStore.java:185) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore$PutBatchingHandler.onEvent(DefaultRheaKVStore.java:1603) at com.alipay.sofa.jraft.rhea.client.DefaultRheaKVStore$PutBatchingHandler.onEvent(DefaultRheaKVStore.java:1583) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) at java.lang.Thread.run(Thread.java:748)

zhangjun050754 commented 4 years ago

7181: -- rheakv 10/17/19 1:50:35 PM ============================================================

-- rheakv -- Counters -------------------------------------------------------------------- rhea-region-bytes-read-1 count = 212618003 rhea-region-bytes-written-1 count = 9106876 rhea-region-keys-read-1 count = 2801815 rhea-region-keys-written-1 count = 120002

-- rheakv -- Histograms ------------------------------------------------------------------ rhea-st-batch-write_-1 count = 19673 min = 1 max = 70 mean = 3.51 stddev = 6.21 median = 2.00 75% <= 3.00 95% <= 12.00 98% <= 22.00 99% <= 36.00 99.9% <= 70.00 send_batching_get_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_put_bytes count = 2 min = 149 max = 189 mean = 164.43 stddev = 19.47 median = 149.00 75% <= 189.00 95% <= 189.00 98% <= 189.00 99% <= 189.00 99.9% <= 189.00 send_batching_put_keys count = 2 min = 1 max = 1 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00

-- rheakv -- Meters ---------------------------------------------------------------------- rhea-st-apply-qps-1 count = 69059 mean rate = 166.30 events/second 1-minute rate = 7.43 events/second 5-minute rate = 83.59 events/second 15-minute rate = 54.73 events/second rhea-st-apply-qps-1PUT count = 63767 mean rate = 159.53 events/second 1-minute rate = 7.47 events/second 5-minute rate = 78.61 events/second 15-minute rate = 50.96 events/second rhea-st-apply-qps-1_PUTLIST count = 5291 mean rate = 15.58 events/second 1-minute rate = 1.21 events/second 5-minute rate = 66.24 events/second 15-minute rate = 130.42 events/second rhea-st-apply-qps-1_SCAN count = 1 mean rate = 0.00 events/second 1-minute rate = 0.00 events/second 5-minute rate = 0.05 events/second 15-minute rate = 0.13 events/second

-- rheakv -- Timers ---------------------------------------------------------------------- rhea-db-timer_BATCH_PUT count = 10221 mean rate = 30.10 calls/second 1-minute rate = 1.39 calls/second 5-minute rate = 28.39 calls/second 15-minute rate = 41.37 calls/second min = 0.01 milliseconds max = 0.39 milliseconds mean = 0.06 milliseconds stddev = 0.05 milliseconds median = 0.05 milliseconds 75% <= 0.07 milliseconds 95% <= 0.14 milliseconds 98% <= 0.24 milliseconds 99% <= 0.29 milliseconds 99.9% <= 0.39 milliseconds rhea-db-timer_PUT count = 5291 mean rate = 13.24 calls/second 1-minute rate = 0.61 calls/second 5-minute rate = 6.55 calls/second 15-minute rate = 4.34 calls/second min = 0.01 milliseconds max = 0.16 milliseconds mean = 0.03 milliseconds stddev = 0.02 milliseconds median = 0.03 milliseconds 75% <= 0.04 milliseconds 95% <= 0.06 milliseconds 98% <= 0.07 milliseconds 99% <= 0.08 milliseconds 99.9% <= 0.13 milliseconds rhea-db-timer_PUT_LIST count = 5291 mean rate = 15.58 calls/second 1-minute rate = 1.21 calls/second 5-minute rate = 66.24 calls/second 15-minute rate = 130.42 calls/second min = 0.01 milliseconds max = 1.59 milliseconds mean = 0.05 milliseconds stddev = 0.05 milliseconds median = 0.04 milliseconds 75% <= 0.06 milliseconds 95% <= 0.13 milliseconds 98% <= 0.17 milliseconds 99% <= 0.19 milliseconds 99.9% <= 0.23 milliseconds rhea-db-timerSCAN count = 155 mean rate = 0.39 calls/second 1-minute rate = 0.03 calls/second 5-minute rate = 0.23 calls/second 15-minute rate = 0.25 calls/second min = 0.15 milliseconds max = 245185.71 milliseconds mean = 105981.36 milliseconds stddev = 121408.78 milliseconds median = 92.55 milliseconds 75% <= 245185.71 milliseconds 95% <= 245185.71 milliseconds 98% <= 245185.71 milliseconds 99% <= 245185.71 milliseconds 99.9% <= 245185.71 milliseconds rhea-rpc-request-timer-1 count = 69228 mean rate = 166.78 calls/second 1-minute rate = 7.43 calls/second 5-minute rate = 83.73 calls/second 15-minute rate = 54.85 calls/second min = 1.12 milliseconds max = 245188.83 milliseconds mean = 17798.37 milliseconds stddev = 63592.45 milliseconds median = 8.12 milliseconds 75% <= 15.77 milliseconds 95% <= 245188.83 milliseconds 98% <= 245188.83 milliseconds 99% <= 245188.83 milliseconds 99.9% <= 245188.83 milliseconds rhea-rpc-request-timer_-1PUT count = 63767 mean rate = 159.39 calls/second 1-minute rate = 6.92 calls/second 5-minute rate = 77.41 calls/second 15-minute rate = 50.70 calls/second min = 2.28 milliseconds max = 118.27 milliseconds mean = 11.94 milliseconds stddev = 14.35 milliseconds median = 7.05 milliseconds 75% <= 12.82 milliseconds 95% <= 42.29 milliseconds 98% <= 64.87 milliseconds 99% <= 83.19 milliseconds 99.9% <= 118.27 milliseconds rhea-rpc-request-timer-1_PUTLIST count = 5291 mean rate = 15.58 calls/second 1-minute rate = 1.20 calls/second 5-minute rate = 66.11 calls/second 15-minute rate = 130.15 calls/second min = 2.45 milliseconds max = 112.01 milliseconds mean = 18.36 milliseconds stddev = 17.59 milliseconds median = 11.68 milliseconds 75% <= 26.07 milliseconds 95% <= 54.16 milliseconds 98% <= 65.90 milliseconds 99% <= 74.94 milliseconds 99.9% <= 112.01 milliseconds rhea-rpc-request-timer-1_SCAN count = 170 mean rate = 0.42 calls/second 1-minute rate = 0.03 calls/second 5-minute rate = 0.24 calls/second 15-minute rate = 0.26 calls/second min = 0.64 milliseconds max = 245188.85 milliseconds mean = 94939.56 milliseconds stddev = 119383.54 milliseconds median = 64.28 milliseconds 75% <= 245188.85 milliseconds 95% <= 245188.85 milliseconds 98% <= 245188.85 milliseconds 99% <= 245188.85 milliseconds 99.9% <= 245188.85 milliseconds

nodeId: <mqttRpcGroup/127.0.0.1:7181> state: STATE_FOLLOWER term: 3 conf: ConfigurationEntry [id=LogId [index=2, term=3], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@59a3bf71, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@5697375, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 81] diskId: LogId [index=81, term=3] appliedId: LogId [index=81, term=3] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 81 pendingIndex: 0 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 3 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [] failureReplicators: [] nodeId: <rhea_mqtt--1/127.0.0.1:7181> state: STATE_FOLLOWER term: 2 conf: ConfigurationEntry [id=LogId [index=69061, term=2], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@6830a277, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@7d1030da, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 69061] diskId: LogId [index=69061, term=2] appliedId: LogId [index=69061, term=2] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 69061 pendingIndex: 0 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 2 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [] failureReplicators: []

-- <rhea_mqtt--1/127.0.0.1:7181> 10/17/19 1:50:40 PM ============================================================

-- <rhea_mqtt--1/127.0.0.1:7181> -- Gauges ---------------------------------------------------------------------- jraft-fsm-caller-disruptor.buffer-size value = 16384 jraft-fsm-caller-disruptor.remaining-capacity value = 16384 jraft-log-manager-disruptor.buffer-size value = 16384 jraft-log-manager-disruptor.remaining-capacity value = 16384 jraft-node-impl-disruptor.buffer-size value = 16384 jraft-node-impl-disruptor.remaining-capacity value = 16384 jraft-read-only-service-disruptor.buffer-size value = 16384 jraft-read-only-service-disruptor.remaining-capacity value = 16384 raft-rpc-client-thread-pool.active value = 0 raft-rpc-client-thread-pool.completed value = 22 raft-rpc-client-thread-pool.pool-size value = 22 raft-rpc-client-thread-pool.queued value = 0 raft-utils-closure-thread-pool.active value = 0 raft-utils-closure-thread-pool.completed value = 105881 raft-utils-closure-thread-pool.pool-size value = 4 raft-utils-closure-thread-pool.queued value = 0 replicator-rhea_mqtt--1/127.0.0.1:7182.append-entries-times value = 1 replicator-rhea_mqtt--1/127.0.0.1:7182.heartbeat-times value = 1238 replicator-rhea_mqtt--1/127.0.0.1:7182.install-snapshot-times value = 0 replicator-rhea_mqtt--1/127.0.0.1:7182.log-lags value = 1 replicator-rhea_mqtt--1/127.0.0.1:7182.next-index value = 69061 replicator-rhea_mqtt--1/127.0.0.1:7183.append-entries-times value = 3 replicator-rhea_mqtt--1/127.0.0.1:7183.heartbeat-times value = 1167 replicator-rhea_mqtt--1/127.0.0.1:7183.install-snapshot-times value = 0 replicator-rhea_mqtt--1/127.0.0.1:7183.log-lags value = 1 replicator-rhea_mqtt--1/127.0.0.1:7183.next-index value = 69061

-- <rhea_mqtt--1/127.0.0.1:7181> -- Histograms ------------------------------------------------------------------ append-logs-bytes count = 6673 min = 0 max = 45314 mean = 2264.56 stddev = 3861.17 median = 1008.00 75% <= 1764.00 95% <= 8205.00 98% <= 13077.00 99% <= 21288.00 99.9% <= 45314.00 append-logs-count count = 6673 min = 1 max = 91 mean = 10.36 stddev = 12.55 median = 6.00 75% <= 10.00 95% <= 40.00 98% <= 53.00 99% <= 70.00 99.9% <= 91.00 fsm-apply-tasks-count count = 14830 min = 1 max = 168 mean = 5.08 stddev = 10.68 median = 2.00 75% <= 4.00 95% <= 21.00 98% <= 38.00 99% <= 50.00 99.9% <= 168.00 handle-append-entries-count count = 175 min = 0 max = 1 mean = 0.00 stddev = 0.01 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 handle-read-index-entries count = 170 min = 1 max = 1 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00 replicate-entries-bytes count = 105162 min = 122 max = 34133 mean = 327.68 stddev = 1591.80 median = 133.00 75% <= 133.00 95% <= 630.00 98% <= 1444.00 99% <= 4256.00 99.9% <= 34133.00 replicate-entries-count count = 105162 min = 1 max = 44 mean = 1.36 stddev = 2.39 median = 1.00 75% <= 1.00 95% <= 2.00 98% <= 5.00 99% <= 8.00 99.9% <= 44.00 replicate-inflights-count count = 105166 min = 1 max = 205 mean = 24.12 stddev = 22.81 median = 15.00 75% <= 31.00 95% <= 67.00 98% <= 92.00 99% <= 113.00 99.9% <= 185.00

-- <rhea_mqtt--1/127.0.0.1:7181> -- Timers ---------------------------------------------------------------------- append-logs count = 6673 mean rate = 16.64 calls/second 1-minute rate = 0.71 calls/second 5-minute rate = 8.22 calls/second 15-minute rate = 5.67 calls/second min = 1.00 milliseconds max = 59.00 milliseconds mean = 4.70 milliseconds stddev = 5.63 milliseconds median = 3.00 milliseconds 75% <= 4.00 milliseconds 95% <= 14.00 milliseconds 98% <= 23.00 milliseconds 99% <= 30.00 milliseconds 99.9% <= 50.00 milliseconds fsm-apply-tasks count = 14830 mean rate = 37.01 calls/second 1-minute rate = 1.61 calls/second 5-minute rate = 18.11 calls/second 15-minute rate = 12.02 calls/second min = 0.00 milliseconds max = 2.00 milliseconds mean = 0.08 milliseconds stddev = 0.28 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 2.00 milliseconds fsm-commit count = 14831 mean rate = 37.02 calls/second 1-minute rate = 1.61 calls/second 5-minute rate = 18.11 calls/second 15-minute rate = 12.02 calls/second min = 0.00 milliseconds max = 1.00 milliseconds mean = 0.09 milliseconds stddev = 0.29 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds fsm-leader-stop count = 1 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.08 calls/second 15-minute rate = 0.15 calls/second min = 1.00 milliseconds max = 1.00 milliseconds mean = 1.00 milliseconds stddev = 0.00 milliseconds median = 1.00 milliseconds 75% <= 1.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds fsm-start-following count = 5 mean rate = 0.02 calls/second 1-minute rate = 0.03 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 0.00 milliseconds max = 759.00 milliseconds mean = 8.36 milliseconds stddev = 79.11 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 0.00 milliseconds 99% <= 759.00 milliseconds 99.9% <= 759.00 milliseconds fsm-stop-following count = 4 mean rate = 0.02 calls/second 1-minute rate = 0.04 calls/second 5-minute rate = 0.18 calls/second 15-minute rate = 0.31 calls/second min = 0.00 milliseconds max = 219722.00 milliseconds mean = 107417.59 milliseconds stddev = 109825.62 milliseconds median = 0.00 milliseconds 75% <= 219722.00 milliseconds 95% <= 219722.00 milliseconds 98% <= 219722.00 milliseconds 99% <= 219722.00 milliseconds 99.9% <= 219722.00 milliseconds handle-append-entries count = 175 mean rate = 0.63 calls/second 1-minute rate = 1.00 calls/second 5-minute rate = 2.06 calls/second 15-minute rate = 3.37 calls/second min = 0.00 milliseconds max = 1051.00 milliseconds mean = 7.82 milliseconds stddev = 76.24 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 1.00 milliseconds 99% <= 743.00 milliseconds 99.9% <= 743.00 milliseconds handle-read-index count = 170 mean rate = 0.42 calls/second 1-minute rate = 0.01 calls/second 5-minute rate = 0.24 calls/second 15-minute rate = 0.26 calls/second min = 0.00 milliseconds max = 50.00 milliseconds mean = 0.32 milliseconds stddev = 2.14 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 2.00 milliseconds 99% <= 2.00 milliseconds 99.9% <= 50.00 milliseconds pre-vote count = 11 mean rate = 0.03 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.11 calls/second 15-minute rate = 0.26 calls/second min = 4.00 milliseconds max = 8704.00 milliseconds mean = 6386.67 milliseconds stddev = 2707.91 milliseconds median = 7167.00 milliseconds 75% <= 8682.00 milliseconds 95% <= 8704.00 milliseconds 98% <= 8704.00 milliseconds 99% <= 8704.00 milliseconds 99.9% <= 8704.00 milliseconds read-index count = 170 mean rate = 0.42 calls/second 1-minute rate = 0.01 calls/second 5-minute rate = 0.24 calls/second 15-minute rate = 0.26 calls/second min = 0.00 milliseconds max = 88.00 milliseconds mean = 2.87 milliseconds stddev = 6.58 milliseconds median = 1.00 milliseconds 75% <= 2.00 milliseconds 95% <= 14.00 milliseconds 98% <= 22.00 milliseconds 99% <= 34.00 milliseconds 99.9% <= 88.00 milliseconds replicate-entries count = 105162 mean rate = 262.36 calls/second 1-minute rate = 11.42 calls/second 5-minute rate = 127.75 calls/second 15-minute rate = 83.68 calls/second min = 2.00 milliseconds max = 100.00 milliseconds mean = 11.20 milliseconds stddev = 12.05 milliseconds median = 7.00 milliseconds 75% <= 13.00 milliseconds 95% <= 35.00 milliseconds 98% <= 57.00 milliseconds 99% <= 65.00 milliseconds 99.9% <= 93.00 milliseconds request-vote count = 1 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.05 calls/second 15-minute rate = 0.13 calls/second min = 140.00 milliseconds max = 140.00 milliseconds mean = 140.00 milliseconds stddev = 0.00 milliseconds median = 140.00 milliseconds 75% <= 140.00 milliseconds 95% <= 140.00 milliseconds 98% <= 140.00 milliseconds 99% <= 140.00 milliseconds 99.9% <= 140.00 milliseconds save-raft-meta count = 3 mean rate = 0.01 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.06 calls/second 15-minute rate = 0.13 calls/second min = 2.00 milliseconds max = 78.00 milliseconds mean = 7.95 milliseconds stddev = 19.55 milliseconds median = 3.00 milliseconds 75% <= 3.00 milliseconds 95% <= 78.00 milliseconds 98% <= 78.00 milliseconds 99% <= 78.00 milliseconds 99.9% <= 78.00 milliseconds

zhangjun050754 commented 4 years ago

7182 -- rheakv 10/17/19 1:48:18 PM ============================================================

-- rheakv -- Histograms ------------------------------------------------------------------ rhea-st-batch-write_-1 count = 12435 min = 1 max = 97 mean = 5.71 stddev = 9.91 median = 2.00 75% <= 6.00 95% <= 22.00 98% <= 40.00 99% <= 56.00 99.9% <= 97.00 send_batching_get_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_put_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_put_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00

-- rheakv -- Meters ---------------------------------------------------------------------- rhea-st-apply-qps-1 count = 69059 mean rate = 264.77 events/second 1-minute rate = 78.28 events/second 5-minute rate = 134.05 events/second 15-minute rate = 64.33 events/second rhea-st-apply-qps-1PUT count = 63767 mean rate = 246.50 events/second 1-minute rate = 77.01 events/second 5-minute rate = 125.34 events/second 15-minute rate = 59.54 events/second rhea-st-apply-qps-1_PUTLIST count = 5291 mean rate = 26.64 events/second 1-minute rate = 12.45 events/second 5-minute rate = 105.95 events/second 15-minute rate = 152.86 events/second rhea-st-apply-qps-1_SCAN count = 1 mean rate = 0.00 events/second 1-minute rate = 0.00 events/second 5-minute rate = 0.09 events/second 15-minute rate = 0.15 events/second

-- rheakv -- Timers ---------------------------------------------------------------------- rhea-db-timer_BATCH_PUT count = 6972 mean rate = 35.11 calls/second 1-minute rate = 10.16 calls/second 5-minute rate = 38.10 calls/second 15-minute rate = 44.17 calls/second min = 0.00 milliseconds max = 0.25 milliseconds mean = 0.04 milliseconds stddev = 0.02 milliseconds median = 0.03 milliseconds 75% <= 0.04 milliseconds 95% <= 0.09 milliseconds 98% <= 0.11 milliseconds 99% <= 0.13 milliseconds 99.9% <= 0.19 milliseconds rhea-db-timer_PUT count = 1611 mean rate = 6.23 calls/second 1-minute rate = 1.71 calls/second 5-minute rate = 3.16 calls/second 15-minute rate = 1.64 calls/second min = 0.00 milliseconds max = 0.07 milliseconds mean = 0.02 milliseconds stddev = 0.01 milliseconds median = 0.01 milliseconds 75% <= 0.02 milliseconds 95% <= 0.04 milliseconds 98% <= 0.05 milliseconds 99% <= 0.05 milliseconds 99.9% <= 0.07 milliseconds rhea-db-timer_PUT_LIST count = 5291 mean rate = 26.64 calls/second 1-minute rate = 12.45 calls/second 5-minute rate = 105.95 calls/second 15-minute rate = 152.86 calls/second min = 0.00 milliseconds max = 0.20 milliseconds mean = 0.03 milliseconds stddev = 0.03 milliseconds median = 0.01 milliseconds 75% <= 0.03 milliseconds 95% <= 0.08 milliseconds 98% <= 0.11 milliseconds 99% <= 0.12 milliseconds 99.9% <= 0.18 milliseconds rhea-db-timerSCAN count = 1 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 0.19 milliseconds max = 0.19 milliseconds mean = 0.19 milliseconds stddev = 0.00 milliseconds median = 0.19 milliseconds 75% <= 0.19 milliseconds 95% <= 0.19 milliseconds 98% <= 0.19 milliseconds 99% <= 0.19 milliseconds 99.9% <= 0.19 milliseconds rhea-rpc-request-timer-1 count = 0 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.00 calls/second 15-minute rate = 0.00 calls/second min = 0.00 milliseconds max = 0.00 milliseconds mean = 0.00 milliseconds stddev = 0.00 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 0.00 milliseconds 99% <= 0.00 milliseconds 99.9% <= 0.00 milliseconds

nodeId: <mqttRpcGroup/127.0.0.1:7182> state: STATE_LEADER term: 3 conf: ConfigurationEntry [id=LogId [index=2, term=3], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@4b97b9d0, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@e953ec6, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 81] diskId: LogId [index=81, term=3] appliedId: LogId [index=81, term=3] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 81 pendingIndex: 82 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 2 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [Replicator [state=Probe, statInfo=,peerId=127.0.0.1:7181], Replicator [state=Replicate, statInfo=,peerId=127.0.0.1:7183]] failureReplicators: [] nodeId: <rhea_mqtt--1/127.0.0.1:7182> state: STATE_FOLLOWER term: 2 conf: ConfigurationEntry [id=LogId [index=69061, term=2], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@31d01766, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@3fd1d27d, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 69061] diskId: LogId [index=69061, term=2] appliedId: LogId [index=69061, term=2] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 69061 pendingIndex: 0 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 2 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [] failureReplicators: []

-- <rhea_mqtt--1/127.0.0.1:7182> 10/17/19 1:48:18 PM ============================================================

-- <rhea_mqtt--1/127.0.0.1:7182> -- Gauges ---------------------------------------------------------------------- jraft-fsm-caller-disruptor.buffer-size value = 16384 jraft-fsm-caller-disruptor.remaining-capacity value = 16384 jraft-log-manager-disruptor.buffer-size value = 16384 jraft-log-manager-disruptor.remaining-capacity value = 16384 jraft-node-impl-disruptor.buffer-size value = 16384 jraft-node-impl-disruptor.remaining-capacity value = 16384 jraft-read-only-service-disruptor.buffer-size value = 16384 jraft-read-only-service-disruptor.remaining-capacity value = 16384 raft-rpc-client-thread-pool.active value = 0 raft-rpc-client-thread-pool.completed value = 0 raft-rpc-client-thread-pool.pool-size value = 0 raft-rpc-client-thread-pool.queued value = 0 raft-utils-closure-thread-pool.active value = 0 raft-utils-closure-thread-pool.completed value = 3871 raft-utils-closure-thread-pool.pool-size value = 4 raft-utils-closure-thread-pool.queued value = 0

-- <rhea_mqtt--1/127.0.0.1:7182> -- Histograms ------------------------------------------------------------------ append-logs-bytes count = 6532 min = 126 max = 72884 mean = 2184.97 stddev = 4062.90 median = 931.00 75% <= 1764.00 95% <= 8645.00 98% <= 11946.00 99% <= 14977.00 99.9% <= 72884.00 append-logs-count count = 6532 min = 1 max = 179 mean = 10.79 stddev = 15.65 median = 6.00 75% <= 9.00 95% <= 42.00 98% <= 65.00 99% <= 81.00 99.9% <= 179.00 fsm-apply-tasks-count count = 6633 min = 1 max = 137 mean = 9.41 stddev = 13.30 median = 6.00 75% <= 9.00 95% <= 34.00 98% <= 55.00 99% <= 70.00 99.9% <= 137.00 handle-append-entries-count count = 55024 min = 0 max = 26 mean = 0.67 stddev = 0.93 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 2.00 99% <= 3.00 99.9% <= 10.00

-- <rhea_mqtt--1/127.0.0.1:7182> -- Timers ---------------------------------------------------------------------- append-logs count = 6532 mean rate = 25.22 calls/second 1-minute rate = 7.81 calls/second 5-minute rate = 12.97 calls/second 15-minute rate = 6.38 calls/second min = 1.00 milliseconds max = 95.00 milliseconds mean = 4.87 milliseconds stddev = 5.74 milliseconds median = 3.00 milliseconds 75% <= 4.00 milliseconds 95% <= 15.00 milliseconds 98% <= 21.00 milliseconds 99% <= 26.00 milliseconds 99.9% <= 95.00 milliseconds fsm-apply-tasks count = 6633 mean rate = 25.63 calls/second 1-minute rate = 7.99 calls/second 5-minute rate = 13.19 calls/second 15-minute rate = 6.48 calls/second min = 0.00 milliseconds max = 2.00 milliseconds mean = 0.08 milliseconds stddev = 0.27 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 2.00 milliseconds fsm-commit count = 6634 mean rate = 25.64 calls/second 1-minute rate = 7.99 calls/second 5-minute rate = 13.19 calls/second 15-minute rate = 6.48 calls/second min = 0.00 milliseconds max = 2.00 milliseconds mean = 0.08 milliseconds stddev = 0.28 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds fsm-start-following count = 2 mean rate = 0.01 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 11.00 milliseconds max = 14.00 milliseconds mean = 11.40 milliseconds stddev = 1.02 milliseconds median = 11.00 milliseconds 75% <= 11.00 milliseconds 95% <= 14.00 milliseconds 98% <= 14.00 milliseconds 99% <= 14.00 milliseconds 99.9% <= 14.00 milliseconds fsm-stop-following count = 1 mean rate = 0.01 calls/second 1-minute rate = 0.02 calls/second 5-minute rate = 0.13 calls/second 15-minute rate = 0.17 calls/second min = 1.00 milliseconds max = 1.00 milliseconds mean = 1.00 milliseconds stddev = 0.00 milliseconds median = 1.00 milliseconds 75% <= 1.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds handle-append-entries count = 55024 mean rate = 212.25 calls/second 1-minute rate = 73.47 calls/second 5-minute rate = 111.58 calls/second 15-minute rate = 56.01 calls/second min = 0.00 milliseconds max = 1.00 milliseconds mean = 0.02 milliseconds stddev = 0.15 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds save-raft-meta count = 4 mean rate = 0.02 calls/second 1-minute rate = 0.01 calls/second 5-minute rate = 0.18 calls/second 15-minute rate = 0.30 calls/second min = 3.00 milliseconds max = 38.00 milliseconds mean = 8.07 milliseconds stddev = 9.51 milliseconds median = 6.00 milliseconds 75% <= 6.00 milliseconds 95% <= 38.00 milliseconds 98% <= 38.00 milliseconds 99% <= 38.00 milliseconds 99.9% <= 38.00 milliseconds

zhangjun050754 commented 4 years ago

7183 -- rheakv 10/17/19 1:48:23 PM ============================================================

-- rheakv -- Histograms ------------------------------------------------------------------ rhea-st-batch-write_-1 count = 12552 min = 1 max = 85 mean = 5.84 stddev = 8.81 median = 3.00 75% <= 6.00 95% <= 21.00 98% <= 37.00 99% <= 52.00 99.9% <= 85.00 send_batching_get_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_bytes count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_get_only_safe_keys count = 0 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 send_batching_put_bytes count = 69056 min = 114 max = 12000 mean = 188.98 stddev = 618.45 median = 125.00 75% <= 127.00 95% <= 127.00 98% <= 480.00 99% <= 1270.00 99.9% <= 12000.00 send_batching_put_keys count = 69056 min = 1 max = 78 mean = 1.37 stddev = 3.44 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 3.00 99% <= 9.00 99.9% <= 49.00

-- rheakv -- Meters ---------------------------------------------------------------------- rhea-st-apply-qps-1 count = 69059 mean rate = 269.91 events/second 1-minute rate = 71.19 events/second 5-minute rate = 131.53 events/second 15-minute rate = 63.93 events/second rhea-st-apply-qps-1PUT count = 63767 mean rate = 249.60 events/second 1-minute rate = 66.27 events/second 5-minute rate = 121.63 events/second 15-minute rate = 58.95 events/second rhea-st-apply-qps-1_PUTLIST count = 5291 mean rate = 25.88 events/second 1-minute rate = 11.44 events/second 5-minute rate = 103.89 events/second 15-minute rate = 151.53 events/second rhea-st-apply-qps-1_SCAN count = 1 mean rate = 0.00 events/second 1-minute rate = 0.00 events/second 5-minute rate = 0.09 events/second 15-minute rate = 0.15 events/second

-- rheakv -- Timers ---------------------------------------------------------------------- rhea-db-timer_BATCH_PUT count = 7013 mean rate = 34.31 calls/second 1-minute rate = 9.41 calls/second 5-minute rate = 37.86 calls/second 15-minute rate = 44.45 calls/second min = 0.00 milliseconds max = 1.71 milliseconds mean = 0.04 milliseconds stddev = 0.05 milliseconds median = 0.03 milliseconds 75% <= 0.04 milliseconds 95% <= 0.08 milliseconds 98% <= 0.11 milliseconds 99% <= 0.13 milliseconds 99.9% <= 0.20 milliseconds rhea-db-timer_PUT count = 1693 mean rate = 6.63 calls/second 1-minute rate = 1.56 calls/second 5-minute rate = 3.23 calls/second 15-minute rate = 1.70 calls/second min = 0.00 milliseconds max = 0.18 milliseconds mean = 0.02 milliseconds stddev = 0.01 milliseconds median = 0.01 milliseconds 75% <= 0.02 milliseconds 95% <= 0.04 milliseconds 98% <= 0.05 milliseconds 99% <= 0.05 milliseconds 99.9% <= 0.16 milliseconds rhea-db-timer_PUT_LIST count = 5291 mean rate = 25.88 calls/second 1-minute rate = 11.44 calls/second 5-minute rate = 103.89 calls/second 15-minute rate = 151.53 calls/second min = 0.00 milliseconds max = 0.40 milliseconds mean = 0.03 milliseconds stddev = 0.03 milliseconds median = 0.02 milliseconds 75% <= 0.03 milliseconds 95% <= 0.09 milliseconds 98% <= 0.11 milliseconds 99% <= 0.14 milliseconds 99.9% <= 0.18 milliseconds rhea-db-timerSCAN count = 1 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 0.15 milliseconds max = 0.15 milliseconds mean = 0.15 milliseconds stddev = 0.00 milliseconds median = 0.15 milliseconds 75% <= 0.15 milliseconds 95% <= 0.15 milliseconds 98% <= 0.15 milliseconds 99% <= 0.15 milliseconds 99.9% <= 0.15 milliseconds rhea-rpc-request-timer-1 count = 0 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.00 calls/second 15-minute rate = 0.00 calls/second min = 0.00 milliseconds max = 0.00 milliseconds mean = 0.00 milliseconds stddev = 0.00 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 0.00 milliseconds 99% <= 0.00 milliseconds 99.9% <= 0.00 milliseconds

nodeId: <mqttRpcGroup/127.0.0.1:7183> state: STATE_FOLLOWER term: 3 conf: ConfigurationEntry [id=LogId [index=2, term=3], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@31422ebe, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@7440a6f2, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 81] diskId: LogId [index=81, term=3] appliedId: LogId [index=81, term=3] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 81 pendingIndex: 0 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 3 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [] failureReplicators: [] nodeId: <rhea_mqtt--1/127.0.0.1:7183> state: STATE_LEADER term: 2 conf: ConfigurationEntry [id=LogId [index=69061, term=2], conf=127.0.0.1:7181,127.0.0.1:7182,127.0.0.1:7183, oldConf=] electionTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] voteTimer: RepeatedTimer [timerTask=null, stopped=true, running=false, destroyed=false, invoking=false, timeoutMs=1000] stepDownTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@1dbd0981, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=500] snapshotTimer: RepeatedTimer [timerTask=com.alipay.sofa.jraft.util.RepeatedTimer$1@457ef51b, stopped=false, running=true, destroyed=false, invoking=false, timeoutMs=3600000] logManager: storage: [1, 69061] diskId: LogId [index=69061, term=2] appliedId: LogId [index=69061, term=2] lastSnapshotId: LogId [index=0, term=0] fsmCaller: StateMachine [Idle] ballotBox: lastCommittedIndex: 69061 pendingIndex: 69062 pendingMetaQueueSize: 0 snapshotExecutor: lastSnapshotTerm: 0 lastSnapshotIndex: 0 term: 1 savingSnapshot: false loadingSnapshot: false stopped: false replicatorGroup: replicators: [Replicator [state=Probe, statInfo=,peerId=127.0.0.1:7181], Replicator [state=Replicate, statInfo=,peerId=127.0.0.1:7182]] failureReplicators: []

-- <rhea_mqtt--1/127.0.0.1:7183> 10/17/19 1:48:23 PM ============================================================

-- <rhea_mqtt--1/127.0.0.1:7183> -- Gauges ---------------------------------------------------------------------- jraft-fsm-caller-disruptor.buffer-size value = 16384 jraft-fsm-caller-disruptor.remaining-capacity value = 16384 jraft-log-manager-disruptor.buffer-size value = 16384 jraft-log-manager-disruptor.remaining-capacity value = 16384 jraft-node-impl-disruptor.buffer-size value = 16384 jraft-node-impl-disruptor.remaining-capacity value = 16384 jraft-read-only-service-disruptor.buffer-size value = 16384 jraft-read-only-service-disruptor.remaining-capacity value = 16384 raft-rpc-client-thread-pool.active value = 0 raft-rpc-client-thread-pool.completed value = 4 raft-rpc-client-thread-pool.pool-size value = 4 raft-rpc-client-thread-pool.queued value = 0 raft-utils-closure-thread-pool.active value = 0 raft-utils-closure-thread-pool.completed value = 1546 raft-utils-closure-thread-pool.pool-size value = 4 raft-utils-closure-thread-pool.queued value = 0 replicator-rhea_mqtt--1/127.0.0.1:7181.append-entries-times value = 1 replicator-rhea_mqtt--1/127.0.0.1:7181.heartbeat-times value = 260 replicator-rhea_mqtt--1/127.0.0.1:7181.install-snapshot-times value = 0 replicator-rhea_mqtt--1/127.0.0.1:7181.log-lags value = 0 replicator-rhea_mqtt--1/127.0.0.1:7181.next-index value = 69062 replicator-rhea_mqtt--1/127.0.0.1:7182.append-entries-times value = 1 replicator-rhea_mqtt--1/127.0.0.1:7182.heartbeat-times value = 1306 replicator-rhea_mqtt--1/127.0.0.1:7182.install-snapshot-times value = 0 replicator-rhea_mqtt--1/127.0.0.1:7182.log-lags value = 0 replicator-rhea_mqtt--1/127.0.0.1:7182.next-index value = 69062

-- <rhea_mqtt--1/127.0.0.1:7183> -- Histograms ------------------------------------------------------------------ append-logs-bytes count = 6568 min = 120 max = 71208 mean = 2348.71 stddev = 4579.91 median = 992.00 75% <= 1728.00 95% <= 8316.00 98% <= 15528.00 99% <= 19833.00 99.9% <= 71208.00 append-logs-count count = 6568 min = 1 max = 139 mean = 10.62 stddev = 14.78 median = 6.00 75% <= 9.00 95% <= 42.00 98% <= 64.00 99% <= 78.00 99.9% <= 139.00 fsm-apply-tasks-count count = 6763 min = 1 max = 250 mean = 9.51 stddev = 14.98 median = 6.00 75% <= 9.00 95% <= 35.00 98% <= 50.00 99% <= 64.00 99.9% <= 250.00 handle-append-entries-count count = 53801 min = 0 max = 42 mean = 1.32 stddev = 2.41 median = 1.00 75% <= 1.00 95% <= 2.00 98% <= 4.00 99% <= 8.00 99.9% <= 42.00 handle-read-index-entries count = 10 min = 1 max = 1 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00 replicate-entries-bytes count = 2 min = 0 max = 0 mean = 0.00 stddev = 0.00 median = 0.00 75% <= 0.00 95% <= 0.00 98% <= 0.00 99% <= 0.00 99.9% <= 0.00 replicate-entries-count count = 2 min = 1 max = 1 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00 replicate-inflights-count count = 4 min = 1 max = 1 mean = 1.00 stddev = 0.00 median = 1.00 75% <= 1.00 95% <= 1.00 98% <= 1.00 99% <= 1.00 99.9% <= 1.00

-- <rhea_mqtt--1/127.0.0.1:7183> -- Timers ---------------------------------------------------------------------- append-logs count = 6568 mean rate = 25.70 calls/second 1-minute rate = 6.73 calls/second 5-minute rate = 12.56 calls/second 15-minute rate = 6.20 calls/second min = 1.00 milliseconds max = 142.00 milliseconds mean = 4.65 milliseconds stddev = 6.47 milliseconds median = 3.00 milliseconds 75% <= 4.00 milliseconds 95% <= 14.00 milliseconds 98% <= 23.00 milliseconds 99% <= 31.00 milliseconds 99.9% <= 50.00 milliseconds fsm-apply-tasks count = 6763 mean rate = 26.47 calls/second 1-minute rate = 7.01 calls/second 5-minute rate = 12.97 calls/second 15-minute rate = 6.38 calls/second min = 0.00 milliseconds max = 5.00 milliseconds mean = 0.09 milliseconds stddev = 0.31 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 2.00 milliseconds fsm-commit count = 6764 mean rate = 26.47 calls/second 1-minute rate = 7.01 calls/second 5-minute rate = 12.97 calls/second 15-minute rate = 6.39 calls/second min = 0.00 milliseconds max = 3.00 milliseconds mean = 0.11 milliseconds stddev = 0.32 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 2.00 milliseconds fsm-start-following count = 1 mean rate = 0.00 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 6.00 milliseconds max = 6.00 milliseconds mean = 6.00 milliseconds stddev = 0.00 milliseconds median = 6.00 milliseconds 75% <= 6.00 milliseconds 95% <= 6.00 milliseconds 98% <= 6.00 milliseconds 99% <= 6.00 milliseconds 99.9% <= 6.00 milliseconds fsm-stop-following count = 1 mean rate = 0.01 calls/second 1-minute rate = 0.02 calls/second 5-minute rate = 0.13 calls/second 15-minute rate = 0.17 calls/second min = 1.00 milliseconds max = 1.00 milliseconds mean = 1.00 milliseconds stddev = 0.00 milliseconds median = 1.00 milliseconds 75% <= 1.00 milliseconds 95% <= 1.00 milliseconds 98% <= 1.00 milliseconds 99% <= 1.00 milliseconds 99.9% <= 1.00 milliseconds handle-append-entries count = 53801 mean rate = 210.50 calls/second 1-minute rate = 55.87 calls/second 5-minute rate = 106.39 calls/second 15-minute rate = 56.52 calls/second min = 0.00 milliseconds max = 5.00 milliseconds mean = 0.01 milliseconds stddev = 0.10 milliseconds median = 0.00 milliseconds 75% <= 0.00 milliseconds 95% <= 0.00 milliseconds 98% <= 0.00 milliseconds 99% <= 0.00 milliseconds 99.9% <= 1.00 milliseconds handle-read-index count = 10 mean rate = 0.07 calls/second 1-minute rate = 0.12 calls/second 5-minute rate = 0.65 calls/second 15-minute rate = 0.87 calls/second min = 0.00 milliseconds max = 3.00 milliseconds mean = 0.56 milliseconds stddev = 0.88 milliseconds median = 0.00 milliseconds 75% <= 1.00 milliseconds 95% <= 3.00 milliseconds 98% <= 3.00 milliseconds 99% <= 3.00 milliseconds 99.9% <= 3.00 milliseconds pre-vote count = 2 mean rate = 0.01 calls/second 1-minute rate = 0.04 calls/second 5-minute rate = 0.26 calls/second 15-minute rate = 0.34 calls/second min = 5.00 milliseconds max = 10.00 milliseconds mean = 7.50 milliseconds stddev = 2.50 milliseconds median = 10.00 milliseconds 75% <= 10.00 milliseconds 95% <= 10.00 milliseconds 98% <= 10.00 milliseconds 99% <= 10.00 milliseconds 99.9% <= 10.00 milliseconds replicate-entries count = 2 mean rate = 0.01 calls/second 1-minute rate = 0.04 calls/second 5-minute rate = 0.26 calls/second 15-minute rate = 0.34 calls/second min = 6.00 milliseconds max = 26.00 milliseconds mean = 16.00 milliseconds stddev = 10.00 milliseconds median = 26.00 milliseconds 75% <= 26.00 milliseconds 95% <= 26.00 milliseconds 98% <= 26.00 milliseconds 99% <= 26.00 milliseconds 99.9% <= 26.00 milliseconds request-vote count = 2 mean rate = 0.01 calls/second 1-minute rate = 0.04 calls/second 5-minute rate = 0.26 calls/second 15-minute rate = 0.34 calls/second min = 27.00 milliseconds max = 51.00 milliseconds mean = 39.00 milliseconds stddev = 12.00 milliseconds median = 51.00 milliseconds 75% <= 51.00 milliseconds 95% <= 51.00 milliseconds 98% <= 51.00 milliseconds 99% <= 51.00 milliseconds 99.9% <= 51.00 milliseconds save-raft-meta count = 2 mean rate = 0.01 calls/second 1-minute rate = 0.00 calls/second 5-minute rate = 0.09 calls/second 15-minute rate = 0.15 calls/second min = 7.00 milliseconds max = 17.00 milliseconds mean = 8.53 milliseconds stddev = 3.60 milliseconds median = 7.00 milliseconds 75% <= 7.00 milliseconds 95% <= 17.00 milliseconds 98% <= 17.00 milliseconds 99% <= 17.00 milliseconds 99.9% <= 17.00 milliseconds

zhangjun050754 commented 4 years ago

deadNodes=127.0.0.1:7182,127.0.0.1:7183

这两个节点进程是否还在? 在的话 'kill -s SIGUSR2 pid' 看一下并把 jraft 生成的状态信息文件发一下看看 (文件名以 node_describe.log 和 node_metrics.log 以及 rheakv_metrics.log 为前缀,每个进程三个文件)

端口进程还在 不过端口服务应该挂了

fengjiachun commented 4 years ago

nodeId: <mqttRpcGroup/127.0.0.1:7181> nodeId: <rhea_mqtt--1/127.0.0.1:7181>

为什么有两个 raft group? 你每个节点起了两个 jraft 服务?

zhangjun050754 commented 4 years ago

nodeId: <mqttRpcGroup/127.0.0.1:7181> nodeId: <rhea_mqtt--1/127.0.0.1:7181>

为什么有两个 raft group? 你每个节点起了两个 jraft 服务?

是的 起了一个我自己业务用的 感觉不相干 之前也以为是这个问题 不起也还是报这个错 我去掉再测一下 看下

fengjiachun commented 4 years ago

@zhangjun050754 端口冲突了吧?

zhangjun050754 commented 4 years ago

@zhangjun050754 端口冲突了吧?

去掉了 没用 依然挂掉 数据量小没问题 数据量一大查询几次就挂掉了

public CompletableFuture<List<KVEntry>>  scan(String startKey , String endKey , boolean readOnlySafe, boolean returnValue) {

    CompletableFuture<List<KVEntry>> scan = rheaKVStore.scan(startKey, endKey, readOnlySafe, returnValue);

    return scan;
}
killme2008 commented 4 years ago

start 和 end 范围很大? 那肯定是内存爆了, Full gc 导致失去响应,心跳超时当然 step down。

建议你观察下 gc, jstat 看看。

killme2008 commented 4 years ago

大范围的数据 scan,建议使用 iterator

RheaIterator<KVEntry> iterator(final byte[] startKey, final byte[] endKey, final int bufSize);

它是流式查询的方式,而 scan 是会将所有 key/value 都查询回来才返回。

fengjiachun commented 4 years ago
rhea-db-timer_SCAN
count = 155
mean rate = 0.39 calls/second
1-minute rate = 0.03 calls/second
5-minute rate = 0.23 calls/second
15-minute rate = 0.25 calls/second
min = 0.15 milliseconds
max = 245185.71 milliseconds
mean = 105981.36 milliseconds
stddev = 121408.78 milliseconds
median = 92.55 milliseconds
75% <= 245185.71 milliseconds
95% <= 245185.71 milliseconds
98% <= 245185.71 milliseconds
99% <= 245185.71 milliseconds
99.9% <= 245185.71 milliseconds

scan 了很大范围吧? 看下 gc 吧,大范围查询可以改为 iterator

zhangjun050754 commented 4 years ago
rhea-db-timer_SCAN
count = 155
mean rate = 0.39 calls/second
1-minute rate = 0.03 calls/second
5-minute rate = 0.23 calls/second
15-minute rate = 0.25 calls/second
min = 0.15 milliseconds
max = 245185.71 milliseconds
mean = 105981.36 milliseconds
stddev = 121408.78 milliseconds
median = 92.55 milliseconds
75% <= 245185.71 milliseconds
95% <= 245185.71 milliseconds
98% <= 245185.71 milliseconds
99% <= 245185.71 milliseconds
99.9% <= 245185.71 milliseconds

scan 了很大范围吧? 看下 gc 吧,大范围查询可以改为 iterator

看不出来有问题啊 我先试一下iterator

image

fengjiachun commented 4 years ago

95% <= 245185.71 milliseconds

你的 scan p95 都 245 秒了,看下 gc 啊,另外 scan 多大范围也说一下

zhangjun050754 commented 4 years ago

95% <= 245185.71 milliseconds

你的 scan p95 都 245 秒了,看下 gc 啊,另外 scan 多大范围也说一下

每次查询5万的数据量 scan 写成hashkey的模式匹配的 public String getKey(String key) { String keyFormat = cachePrefix+recordPrefixSep+"%s"; return String.format(keyFormat, key);
}

scan查询所有cachePrefix 开头的数据,测试的时候插入了5万的数据量

image

fengjiachun commented 4 years ago

不到700m的heap你不能这么玩,你得看下gc log

fengjiachun commented 4 years ago

没其他问题先关闭了