sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0

Node restart occasionally fails with "No locks available". Is this normal? #1089

Closed: zxuanhong closed this issue 3 months ago

zxuanhong commented 3 months ago

Your question

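  1. When all jraft nodes are restarted, the last few to start occasionally fail with "No locks available". What causes this, and is it normal? Does the user have to handle it, for example by repairing the RocksDB data, or will jraft detect and repair it on its own?
  2. The full log is as follows: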
    
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [           main] c.j.k.cluster.clusterworker.KpWorker     : [KpWorker]log uri: data/jraft-cluster/0/log, raft meta uri: data/jraft-cluster/0/meta, snapshot uri: data/jraft-cluster/0/snapshot.
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [ler-Disruptor-0] c.j.k.cluster.clusterworker.KpWorker     : [KpWorker]log uri: data/jraft-cluster/0/log, raft meta uri: data/jraft-cluster/0/meta, snapshot uri: data/jraft-cluster/0/snapshot.
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [ler-Disruptor-0] com.alipay.sofa.jraft.core.NodeImpl      : The number of active nodes increment to 4.
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [           main] com.alipay.sofa.jraft.core.NodeImpl      : The number of active nodes increment to 3.
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [dEntriesThread0] com.alipay.sofa.jraft.util.Recyclers     : -Djraft.recyclers.maxCapacityPerThread: 4096.
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [-worker-ELG-3-4] c.j.k.cluster.common.NamedThreadFactory  : Creates new Thread[#156,rheakv-raft-rpc-executor #14,5,main].
    2024-03-13 10:00:53  INFO 42859 --- [flow-demo] [-worker-ELG-3-5] c.j.k.cluster.common.NamedThreadFactory  : Creates new Thread[#157,rheakv-raft-rpc-executor #15,5,main].
    2024-03-13 10:00:53 ERROR 42859 --- [flow-demo] [ler-Disruptor-0] c.a.s.j.storage.impl.RocksDBLogStorage   : Fail to init RocksDBLogStorage, path=data/jraft-cluster/0/log.

    org.rocksdb.RocksDBException: lock hold by current process, acquire time 1710295253 acquiring thread 123145571643392: data/jraft-cluster/0/log/LOCK: No locks available
        at org.rocksdb.RocksDB.open(Native Method) ~[rocksdbjni-8.8.1.jar:na]
        at org.rocksdb.RocksDB.open(RocksDB.java:312) ~[rocksdbjni-8.8.1.jar:na]
        at com.alipay.sofa.jraft.storage.impl.RocksDBLogStorage.openDB(RocksDBLogStorage.java:313) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.storage.impl.RocksDBLogStorage.initAndLoad(RocksDBLogStorage.java:231) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.storage.impl.RocksDBLogStorage.init(RocksDBLogStorage.java:210) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.storage.impl.RocksDBLogStorage.init(RocksDBLogStorage.java:64) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.init(LogManagerImpl.java:184) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.storage.impl.LogManagerImpl.init(LogManagerImpl.java:76) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.NodeImpl.initLogStorage(NodeImpl.java:590) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.NodeImpl.init(NodeImpl.java:1004) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.NodeImpl.init(NodeImpl.java:141) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.RaftServiceFactory.createAndInitRaftNode(RaftServiceFactory.java:47) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.RaftGroupService.start(RaftGroupService.java:129) ~[jraft-core-1.3.14.jar:na]
        at com.jraft.kpcluster.cluster.clusterworker.KpWorker.init(KpWorker.java:71) ~[main/:na]
        at com.jraft.kpcluster.cluster.clusterworker.KpWorkerGroup.init(KpWorkerGroup.java:67) ~[main/:na]
        at com.jraft.kpcluster.cluster.clusterworker.HandleKpWorker.leaderChange(HandleKpWorker.java:121) ~[main/:na]
        at com.jraft.kpcluster.cluster.clusterworker.HandleKpWorker.followChange(HandleKpWorker.java:177) ~[main/:na]
        at com.jraft.kpcluster.cluster.clustermanage.KpManageStateMachine.initKpWorkerFollow(KpManageStateMachine.java:177) ~[main/:na]
        at com.jraft.kpcluster.cluster.clustermanage.KpManageStateMachine.onStartFollowing(KpManageStateMachine.java:285) ~[main/:na]
        at com.alipay.sofa.jraft.core.FSMCallerImpl.doStartFollowing(FSMCallerImpl.java:746) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:436) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.FSMCallerImpl.access$100(FSMCallerImpl.java:73) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:150) ~[jraft-core-1.3.14.jar:na]
        at com.alipay.sofa.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:142) ~[jraft-core-1.3.14.jar:na]
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:137) ~[disruptor-3.3.7.jar:na]
        at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]



### Environment

- SOFAJRaft version:
- JVM version (e.g. `java -version`):
- OS version (e.g. `uname -a`):
- Maven version:
- IDE version:

fengjiachun commented 3 months ago

org.rocksdb.RocksDBException: lock hold by current process, acquire time 1710295253 acquiring thread 123145571643392: data/jraft-cluster/0/log/LOCK: No locks available

Is RocksDB being opened by more than one process?
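RocksDB guards each database directory with a LOCK file, and the wording "lock hold by current process" in the exception says the lock is already held inside the same process, so a second open of the same path can fail with no other process involved (the trailing "No locks available" is the usual system message for the ENOLCK error code). As an analogy only, the sketch below uses plain java.nio rather than RocksDB's own locking code: within one JVM, a second lock attempt on a file that is already locked throws OverlappingFileLockException. The class name, method name and temporary LOCK path are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SameProcessLockDemo {

    /**
     * Locks lockFile through one channel, then tries again through a second
     * channel in the same JVM. Returns true if the second attempt is rejected.
     */
    static boolean sameProcessRelockFails(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock(); // first "open" of the directory
            if (held == null) {
                // tryLock returns null when another process holds the lock.
                throw new IOException("LOCK is held by another process");
            }
            try {
                FileLock again = second.tryLock(); // second "open" from the same JVM
                if (again != null) {
                    again.release(); // not expected to happen
                }
                return false;
            } catch (OverlappingFileLockException e) {
                // Same-process conflict, analogous to RocksDB's
                // "lock hold by current process ... No locks available".
                return true;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for data/jraft-cluster/0/log/LOCK.
        Path lockFile = Files.createTempDirectory("jraft-log").resolve("LOCK");
        System.out.println("second lock rejected: " + sameProcessRelockFails(lockFile));
    }
}
```

Running main should print "second lock rejected: true". The point is only that the second opener is rejected by its own process, which is what the exception message describes.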

zxuanhong commented 3 months ago

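@fengjiachun Possibly, I'll check again. Otherwise I'll provide steps that reproduce it without the database being opened by multiple threads. If I can't reproduce it, I'll close the issue.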

zxuanhong commented 3 months ago

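@fengjiachun It turned out to be caused by calls from multiple threads: when the management node dispatched the create-region task, onApply was executed more than once. Thanks a lot!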
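The thread does not show the actual fix. One way to make a repeated create-region command harmless is to start the worker for a given data path at most once, for example with ConcurrentHashMap.computeIfAbsent, whose mapping function runs at most once per key even under concurrent calls. In the minimal sketch below, RegionWorkerRegistry, startOnce and simulateRepeatedApply are hypothetical names, and the Runnable stands in for whatever builds and starts the RaftGroupService for that path.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RegionWorkerRegistry {

    // One entry per raft data path; the value stands in for the started worker.
    private final ConcurrentMap<String, Object> started = new ConcurrentHashMap<>();

    /**
     * Runs starter at most once per dataPath, even if the create-region
     * command is applied repeatedly or from several threads.
     * Returns true only for the call that actually ran the starter.
     */
    public boolean startOnce(String dataPath, Runnable starter) {
        boolean[] ranHere = {false};
        started.computeIfAbsent(dataPath, path -> {
            starter.run(); // hypothetical: build and start the RaftGroupService here
            ranHere[0] = true;
            return new Object();
        });
        return ranHere[0];
    }

    /** Simulates the same create-region command applied by several threads at once. */
    static int simulateRepeatedApply(int threads) throws InterruptedException {
        RegionWorkerRegistry registry = new RegionWorkerRegistry();
        AtomicInteger starts = new AtomicInteger();
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                go.await(); // release all threads together
                return registry.startOnce("data/jraft-cluster/0/log", starts::incrementAndGet);
            });
        }
        go.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return starts.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("starter ran " + simulateRepeatedApply(8) + " time(s)");
    }
}
```

Running main should print "starter ran 1 time(s)". If the starter throws, no mapping is recorded, so a later command can retry. Because computeIfAbsent holds a lock on the map entry while the starter runs, a slow node start blocks other callers for the same path; a putIfAbsent of a FutureTask is a common alternative when that matters.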