sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0

Got STATE_ERROR while starting native image with sofa-jraft #1138

Closed. DioxideCN closed this issue 1 month ago.

DioxideCN commented 1 month ago

Your question

So far, sofa-jraft does not support GraalVM Native Image, so I had to collect a large amount of metadata (mainly reflection registrations) in the project before it could run as a native image. After the metadata was collected, a problem appeared during native image startup: starting RaftGroupService fails with Illegal state: STATE_ERROR. The complete stack trace is as follows:

Caused by: java.lang.IllegalArgumentException: Illegal state: STATE_ERROR
    at com.alipay.sofa.jraft.util.Requires.requireTrue(Requires.java:85)
    at com.alipay.sofa.jraft.core.NodeImpl.becomeLeader(NodeImpl.java:1230)
    at com.alipay.sofa.jraft.core.NodeImpl.electSelf(NodeImpl.java:1189)
    at com.alipay.sofa.jraft.core.NodeImpl.init(NodeImpl.java:1115)
    at com.alipay.sofa.jraft.core.NodeImpl.init(NodeImpl.java:141)
    at com.alipay.sofa.jraft.RaftServiceFactory.createAndInitRaftNode(RaftServiceFactory.java:47)
    at com.alipay.sofa.jraft.RaftGroupService.start(RaftGroupService.java:129)
    at com.alibaba.nacos.core.distributed.raft.JRaftServer.createMultiRaftGroup(JRaftServer.java:260)
    at com.alibaba.nacos.core.distributed.raft.JRaftProtocol.addRequestProcessors(JRaftProtocol.java:163)
    at com.alibaba.nacos.naming.core.v2.service.impl.PersistentClientOperationServiceImpl.<init>(PersistentClientOperationServiceImpl.java:101)
    at com.alibaba.nacos.naming.core.v2.service.impl.PersistentClientOperationServiceImpl__BeanDefinitions.lambda$getPersistentClientOperationServiceImplInstanceSupplier$0(PersistentClientOperationServiceImpl__BeanDefinitions.java:20)
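Because the error only shows up in the native image, one quick sanity check is whether the classes covered by the collected metadata can be looked up reflectively at all inside the native executable. In a native image, dynamic Class.forName lookups generally succeed only for classes registered at build time, so a name that fails here hints at a gap in the metadata. This is a minimal sketch: ReflectionProbe is a hypothetical helper, not part of sofa-jraft or Nacos, and it only checks class lookup, not access to individual methods or fields. Nested protobuf classes must be written as binary names with `$`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic helper (not part of sofa-jraft or Nacos):
 * reports which class names cannot be loaded reflectively in the current runtime.
 */
public final class ReflectionProbe {

    private ReflectionProbe() {
    }

    /** Returns the subset of binary class names that Class.forName cannot load. */
    public static List<String> findMissing(List<String> binaryNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ReflectionProbe.class.getClassLoader();
        for (String name : binaryNames) {
            try {
                // initialize = false: only check that the class resolves,
                // without running its static initializer.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example names; nested protobuf classes use '$' in their binary names.
        List<String> names = List.of(
                "com.alipay.sofa.jraft.entity.LocalStorageOutter$StablePBMeta",
                "com.alipay.sofa.jraft.rpc.RpcRequests$RequestVoteRequest");
        for (String name : findMissing(names)) {
            System.out.println("Not loadable reflectively: " + name);
        }
    }
}
```

Running this from the native executable's startup path (for example, temporarily from a Nacos bean) would show which names the image cannot resolve.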

I tried many ways to find out where this STATE_ERROR comes from. I focused on the NodeImpl.onError method and looked for anything in NodeImpl.electSelf and NodeImpl.becomeLeader that could trigger it, but without success. The only thing I can confirm is that onError is called from three classes: FSMCallerImpl, Replicator, and LocalRaftMetaStorage. I need more information on why STATE_ERROR occurs in electSelf and becomeLeader.
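Judging from the trace, becomeLeader does not create STATE_ERROR itself: it only checks the node's current state through Requires.requireTrue and throws because an earlier onError call has already moved the node into STATE_ERROR. The sketch below is a simplified, hypothetical model of that sequence, not sofa-jraft's NodeImpl. The class SimplifiedNode and the metaSaveFails switch are invented for illustration, and the assumption that the error comes from a failed meta-storage write during electSelf is only one possibility consistent with LocalRaftMetaStorage being among the onError callers.

```java
/**
 * Simplified, hypothetical model of the state check seen in the stack trace.
 * It is NOT sofa-jraft's NodeImpl; it only mirrors the shape of the failure.
 */
public final class SimplifiedNode {

    public enum State { STATE_LEADER, STATE_CANDIDATE, STATE_FOLLOWER, STATE_ERROR }

    private State state = State.STATE_FOLLOWER;
    private String lastError;
    // Hypothetical switch standing in for an I/O or serialization failure.
    private final boolean metaSaveFails;

    public SimplifiedNode(boolean metaSaveFails) {
        this.metaSaveFails = metaSaveFails;
    }

    public State state() { return state; }

    public String lastError() { return lastError; }

    /** Once an error is reported, the node is parked in STATE_ERROR and the reason is kept. */
    void onError(String reason) {
        lastError = reason;
        state = State.STATE_ERROR;
    }

    /** Stand-in for persisting term/votedFor: reports an error instead of throwing. */
    private void saveTermAndVotedFor() {
        if (metaSaveFails) {
            onError("Fail to save raft meta");
        }
    }

    /** Single-voter election: the node votes for itself and immediately tries to lead. */
    public void electSelf() {
        state = State.STATE_CANDIDATE;
        saveTermAndVotedFor(); // may silently flip the state to STATE_ERROR
        becomeLeader();        // the guard below then fails
    }

    private void becomeLeader() {
        if (state != State.STATE_CANDIDATE) {
            // Same message shape as the trace: "Illegal state: STATE_ERROR"
            throw new IllegalArgumentException("Illegal state: " + state);
        }
        state = State.STATE_LEADER;
    }

    public static void main(String[] args) {
        new SimplifiedNode(false).electSelf(); // succeeds: the node becomes leader
        SimplifiedNode failing = new SimplifiedNode(true);
        try {
            failing.electSelf();
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage() + " caused by: " + failing.lastError());
        }
    }
}
```

The point of the model is that the useful information is the reason passed to the first onError call (in jraft, a RaftException), not the later IllegalArgumentException; the logs around that first call are likely more informative than stepping through becomeLeader.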

Here is the sofa-jraft-related metadata I collected while adding GraalVM support to Nacos:

Stream.of(
                // jraft
                com.alipay.sofa.jraft.entity.LocalFileMetaOutter.LocalFileMeta.class,
                com.alipay.sofa.jraft.entity.LocalStorageOutter.ConfigurationPBMeta.class,
                com.alipay.sofa.jraft.entity.LocalStorageOutter.LogPBMeta.class,
                com.alipay.sofa.jraft.entity.LocalStorageOutter.StablePBMeta.class,
                com.alipay.sofa.jraft.entity.LocalStorageOutter.LocalSnapshotPbMeta.class,
                com.alipay.sofa.jraft.entity.RaftOutter.EntryMeta.class,
                com.alipay.sofa.jraft.entity.RaftOutter.SnapshotMeta.class,
                com.alipay.sofa.jraft.entity.codec.v2.LogOutter.PBLogEntry.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.AppendEntriesRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.AppendEntriesResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.AppendEntriesRequestHeader.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.PingRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.ErrorResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.InstallSnapshotRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.InstallSnapshotResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.TimeoutNowRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.TimeoutNowResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.RequestVoteRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.RequestVoteResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.GetFileRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.GetFileResponse.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.ReadIndexRequest.class,
                com.alipay.sofa.jraft.rpc.RpcRequests.ReadIndexResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.LearnersOpResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.ResetLearnersRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.RemoveLearnersRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.AddLearnersRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.GetPeersResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.GetPeersRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.GetLeaderResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.GetLeaderRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.TransferLeaderRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.ResetPeerRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.SnapshotRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.ChangePeersResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.ChangePeersRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.RemovePeerResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.RemovePeerRequest.class,
                com.alipay.sofa.jraft.rpc.CliRequests.AddPeerResponse.class,
                com.alipay.sofa.jraft.rpc.CliRequests.AddPeerRequest.class,
                com.alipay.sofa.jraft.rpc.ProtobufMsgFactory.class,
                com.alipay.sofa.jraft.rpc.RpcRequestClosure.class,
                com.alipay.sofa.jraft.rpc.impl.AbstractClientService.class,
                com.alipay.sofa.jraft.rpc.impl.BoltRaftRpcFactory.class,
                com.alipay.sofa.jraft.rpc.impl.GrpcRaftRpcFactory.class,
                com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.class,
                com.alipay.sofa.jraft.util.timer.DefaultRaftTimerFactory.class,
                com.alipay.sofa.jraft.core.DefaultJRaftServiceFactory.class)
        .forEach(type -> hints.reflection()
                .registerType(type,
                        MemberCategory.INVOKE_DECLARED_CONSTRUCTORS,
                        MemberCategory.INVOKE_DECLARED_METHODS,
                        MemberCategory.DECLARED_FIELDS));
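The hints.reflection().registerType(...) calls above appear to use Spring's AOT RuntimeHints API. For a build that does not go through Spring AOT, roughly the same metadata can be given to GraalVM as a reflect-config.json file, where allDeclaredConstructors, allDeclaredMethods and allDeclaredFields correspond approximately to the three MemberCategory values. The helper below is a hypothetical sketch (ReflectConfigWriter is not part of jraft, Nacos, or GraalVM) that prints such a file from binary class names; note the `$` separator for nested protobuf classes such as LocalStorageOutter$StablePBMeta. Where the file is placed, for example under META-INF/native-image/<groupId>/<artifactId>/, depends on the build setup.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: renders GraalVM reflect-config.json entries for the given
 * binary class names, requesting all declared constructors, methods and fields.
 */
public final class ReflectConfigWriter {

    private ReflectConfigWriter() {
    }

    public static String toReflectConfig(List<String> binaryNames) {
        StringJoiner entries = new StringJoiner(",\n", "[\n", "\n]\n");
        entries.setEmptyValue("[]\n");
        for (String name : binaryNames) {
            // Java binary class names contain no characters that need JSON escaping.
            entries.add("  {\n"
                    + "    \"name\": \"" + name + "\",\n"
                    + "    \"allDeclaredConstructors\": true,\n"
                    + "    \"allDeclaredMethods\": true,\n"
                    + "    \"allDeclaredFields\": true\n"
                    + "  }");
        }
        return entries.toString();
    }

    public static void main(String[] args) {
        // A few classes from the list above, written as binary names ('$' for nested classes).
        System.out.print(toReflectConfig(List.of(
                "com.alipay.sofa.jraft.entity.LocalStorageOutter$StablePBMeta",
                "com.alipay.sofa.jraft.rpc.RpcRequests$RequestVoteRequest",
                "com.alipay.sofa.jraft.core.DefaultJRaftServiceFactory")));
    }
}
```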

Your scenes

Nacos Console and Nacos Client need to support starting as a GraalVM Native Image.

Environment

DioxideCN commented 1 month ago

Resolved.