sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0

Log Manage Busy #1023

Closed googlefan closed 1 year ago

googlefan commented 1 year ago

Your question

#1021

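While stress testing a JRaft cluster, I resolved the OOM problem by reducing the number of core threads. Disk write performance should still have headroom, but the stress test still produces API errors, and once an error occurs, a Leader step down follows. I would like to avoid this. The logs are as follows: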

2023-08-25 09:58:05.428 [JRaft-FSMCaller-Disruptor-0] [] INFO  c.y.h.p.fs.storage.jraft.StorageStateMachine - On apply with term: 3 and index: 1478.
2023-08-25 09:58:05.546 [group1/PeerPair[192.168.165.32:8261 -> 192.168.165.30:8261]-AppendEntriesThread0] [] WARN  com.alipay.sofa.jraft.core.NodeImpl - Node <group1/192.168.165.32:8261> received AppendEntriesRequest but log manager is busy.
2023-08-25 09:58:05.552 [JRaft-FSMCaller-Disruptor-0] [] INFO  c.y.h.p.fs.storage.jraft.StorageStateMachine - On apply with term: 3 and index: 1479.
2023-08-25 09:58:05.622 [group1/PeerPair[192.168.165.32:8261 -> 192.168.165.30:8261]-AppendEntriesThread0] [] WARN  com.alipay.sofa.jraft.core.NodeImpl - Node <group1/192.168.165.32:8261> reject term_unmatched AppendEntriesRequest from 192.168.165.30:8261, term=3, prevLogIndex=1493, prevLogTerm=3, localPrevLogTerm=0, lastLogIndex=1492, entriesSize=1.
2023-08-25 09:58:11.309 [http-nio-8062-exec-8] [] DEBUG com.yss.henghe.platform.fs.client.FsClient - invoke err: Leader stepped down
2023-08-25 09:58:11.309 [http-nio-8062-exec-9] [] DEBUG com.yss.henghe.platform.fs.client.FsClient - invoke err: Leader stepped down
2023-08-25 09:58:11.309 [http-nio-8062-exec-7] [] DEBUG com.yss.henghe.platform.fs.client.FsClient - invoke err: Leader stepped down
2023-08-25 09:58:11.327 [group1/PeerPair[192.168.165.32:8261 -> 192.168.165.30:8261]-AppendEntriesThread0] [] WARN  com.alipay.sofa.jraft.core.NodeImpl - Node <group1/192.168.165.32:8261> reject term_unmatched AppendEntriesRequest from 192.168.165.30:8261, term=3, prevLogIndex=1534, prevLogTerm=3, localPrevLogTerm=0, lastLogIndex=1492, entriesSize=1.
2023-08-25 09:58:11.527 [http-nio-8062-exec-10] [] ERROR c.y.h.platform.fs.exceptions.ApiExceptionHandler - 上传文件失败 (file upload failed)
com.yss.henghe.platform.fs.FsException: Leader stepped down
        at com.yss.henghe.platform.fs.client.FsClient.invoke(FsClient.java:377)
        at com.yss.henghe.platform.fs.client.FsClient.putObject(FsClient.java:240)
        at com.yss.henghe.platform.fs.service.impl.ObjectServiceImpl.putObject(ObjectServiceImpl.java:46)
        at com.yss.henghe.platform.fs.s3.ObjectController.upLoadByFile(ObjectController.java:54)
        at com.yss.henghe.platform.fs.s3.ObjectController$$FastClassBySpringCGLIB$$66c46a2a.invoke(<generated>)

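My question: how can I increase the log manager's processing capacity? Can it be improved by increasing the number of log manager disruptor threads, or by changing the size of the buffer used for processing logs?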

Your scenes

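Observing the JVM threads shows that the log manager disruptor thread pool has only one thread, and that thread accounts for a relatively high share of run time.

[Screenshot 2023-08-25 10 13 15]

Disk write data is as follows:

[Screenshot 2023-08-25 10 08 23]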

Environment

googlefan commented 1 year ago

Setting the disruptor-buffer-size parameter mitigates this problem.
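
For readers wondering why the buffer size matters, here is a minimal sketch, not JRaft code. It assumes the "log manager is busy" warning is raised when the bounded buffer in front of the single log manager thread has no free slot. The class name, method name, burst size, and capacities are all hypothetical, and `ArrayBlockingQueue` only stands in for the Disruptor ring buffer. In JRaft itself the setting is, to my knowledge, `RaftOptions.setDisruptorBufferSize(int)`; treat that name as an assumption and check it against your version.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Toy model of a bounded buffer in front of a single consumer thread; not JRaft code. */
final class BoundedBufferSketch {

    /**
     * Offers `burst` entries to a buffer with `capacity` slots while the consumer
     * is stalled (for example, waiting on a disk flush), and returns how many
     * entries were rejected because the buffer was full.
     */
    static int rejectedDuringStall(int capacity, int burst) {
        ArrayBlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int i = 0; i < burst; i++) {
            // offer() returns false instead of blocking when the buffer is full;
            // in this model that plays the role of the "busy" rejection.
            if (!buffer.offer(i)) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        int burst = 1000; // hypothetical number of entries arriving during one stall
        for (int capacity : new int[] {256, 1024, 4096}) {
            System.out.println("capacity=" + capacity
                + " rejected=" + rejectedDuringStall(capacity, burst));
        }
    }
}
```

In this model a larger buffer only absorbs bursts while the consumer is stalled. It does not raise the sustained throughput of the single consumer thread, and it holds more entries in memory, which may matter given the earlier OOM problem.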