sofastack / sofa-jraft

A production-grade java implementation of RAFT consensus algorithm.
https://www.sofastack.tech/projects/sofa-jraft/
Apache License 2.0

RheaKV client initialization fails with "Group %s is not registered in RouteTable, forgot to call updateConfiguration?" #397

Closed. juaby closed this issue 4 years ago.

juaby commented 4 years ago

### Your question

PD mode (`fake=false`), `com.alipay.sofa.jraft.RouteTable#groupConfTable`:

The configuration in this map contains only the PD group. The configurations of the KV region groups are missing, so the following check fails:

```java
final Configuration conf = getConfiguration(groupId);
if (conf == null) {
    return new Status(RaftError.ENOENT, "Group %s is not registered in RouteTable, forgot to call updateConfiguration?", groupId);
}
```
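To make the failure mode concrete, here is a toy model of a route table whose lookup rejects any group that was never registered via `updateConfiguration`. All names (`ToyRouteTable`, `ToyStatus`, `ToyCode`, `checkGroup`) are hypothetical stand-ins written for illustration; they are not jraft's `RouteTable`, `Status`, or `RaftError`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical status codes; jraft's real type is RaftError, this is only a model.
enum ToyCode { OK, ENOENT }

final class ToyStatus {
    final ToyCode code;
    final String msg;

    ToyStatus(ToyCode code, String msg) {
        this.code = code;
        this.msg = msg;
    }
}

// Toy stand-in for the groupConfTable lookup (NOT the jraft RouteTable class).
final class ToyRouteTable {
    // groupId -> comma-separated "ip:port" peer list
    private final Map<String, String> groupConfTable = new ConcurrentHashMap<>();

    void updateConfiguration(String groupId, String serverList) {
        groupConfTable.put(groupId, serverList);
    }

    // Same shape as the quoted check: an unregistered group is rejected up front.
    ToyStatus checkGroup(String groupId) {
        final String conf = groupConfTable.get(groupId);
        if (conf == null) {
            return new ToyStatus(ToyCode.ENOENT, String.format(
                "Group %s is not registered in RouteTable, forgot to call updateConfiguration?", groupId));
        }
        return new ToyStatus(ToyCode.OK, conf);
    }
}
```

In this model, registering only the PD group and then looking up a KV region group (for example the hypothetical id `rhea_example-1`) always lands in the ENOENT branch, which matches the symptom described above.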


### Your scenes

When are the KV region group configurations saved into this map?

### Your advice

Describe the advice or solution you'd like

### Environment

- SOFAJRaft version: 1.3.0
- JVM version (e.g. `java -version`): 1.8.94
- OS version (e.g. `uname -a`): Windows 10
- Maven version:
- IDE version:
juaby commented 4 years ago

My patch to `com.alipay.sofa.jraft.rhea.client.pd.RemotePlacementDriverClient#refreshRouteTable` (the lines between `new code start` and `new code end` are the addition):

```java
@Override
protected void refreshRouteTable() {
    final Cluster cluster = this.metadataRpcClient.getClusterInfo(this.clusterId);
    if (cluster == null) {
        LOG.warn("Cluster info is empty: {}.", this.clusterId);
        return;
    }
    final List<Store> stores = cluster.getStores();
    if (stores == null || stores.isEmpty()) {
        LOG.error("Stores info is empty: {}.", this.clusterId);
        return;
    }
    for (final Store store : stores) {
        final List<Region> regions = store.getRegions();
        if (regions == null || regions.isEmpty()) {
            LOG.error("Regions info is empty: {} - {}.", this.clusterId, store.getId());
            continue;
        }
        for (final Region region : regions) {
            super.regionRouteTable.addOrUpdateRegion(region);

            // new code start
            final String raftGroupId = JRaftHelper.getJRaftGroupId(this.clusterName, region.getId());
            final String serverList = region.getPeers().stream().map(Peer::getEndpoint).map(Endpoint::toString).collect(Collectors.joining(","));
            RouteTable.getInstance().updateConfiguration(raftGroupId, serverList);
            // new code end
        }
    }
}
```
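The two added lines compute a Raft group id per region and a comma-separated peer list. A stdlib-only sketch of that computation, using hypothetical minimal `Endpoint`/`Peer` classes in place of rhea's metadata types. The group id format `<clusterName>-<regionId>` is an assumption about `JRaftHelper.getJRaftGroupId`, not confirmed by this thread; it would be consistent with the PD group id `pd_test--1` used below if the PD region id is `-1`, so check it against your version.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal stand-ins for rhea's Endpoint / Peer metadata.
final class Endpoint {
    final String ip;
    final int port;

    Endpoint(String ip, int port) {
        this.ip = ip;
        this.port = port;
    }

    @Override
    public String toString() {
        return ip + ":" + port;
    }
}

final class Peer {
    private final Endpoint endpoint;

    Peer(Endpoint endpoint) {
        this.endpoint = endpoint;
    }

    Endpoint getEndpoint() {
        return endpoint;
    }
}

final class RouteKeys {
    // ASSUMPTION: group id format "<clusterName>-<regionId>"; verify against JRaftHelper.
    static String groupId(String clusterName, long regionId) {
        return clusterName + "-" + regionId;
    }

    // Same stream pipeline as the patch: peers -> endpoints -> "ip:port", joined by commas.
    static String serverList(List<Peer> peers) {
        return peers.stream()
            .map(Peer::getEndpoint)
            .map(Endpoint::toString)
            .collect(Collectors.joining(","));
    }
}
```

The resulting string is the `ip:port,ip:port` form that the patch passes to `updateConfiguration`.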
fengjiachun commented 4 years ago

I can't follow your description. Please provide the steps to reproduce, or the code and configuration that reproduce it.

juaby commented 4 years ago

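> I can't follow your description. Please provide the steps to reproduce, or the code and configuration that reproduce it.

Roles: a KV cluster, a PD cluster (`fake=false`), and a KV cluster client. When the KV cluster client starts, the route table is refreshed, but the following logic is never triggered: `RouteTable.getInstance().updateConfiguration(raftGroupId, serverList);`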

As a result, the request ultimately fails at this check:

```java
final Configuration conf = getConfiguration(groupId);
if (conf == null) {
    return new Status(RaftError.ENOENT, "Group %s is not registered in RouteTable, forgot to call updateConfiguration?", groupId);
}
```

My first attempt was to modify `com.alipay.sofa.jraft.rhea.client.pd.RemotePlacementDriverClient#refreshRouteTable` and add the `RouteTable.getInstance().updateConfiguration(raftGroupId, serverList);` call.

However, `com.alipay.sofa.jraft.rhea.client.pd.RemotePlacementDriverClient` is used by both the KV cluster and the KV client.

So I have now changed `com.alipay.sofa.jraft.example.rheakv.Client` instead and added a `refreshRouteTable()` method:

```java
public class Client {

    private static final Logger LOG = LoggerFactory.getLogger(Client.class);

    private final RheaKVStore rheaKVStore = new DefaultRheaKVStore();

    private final int clusterId = Configs.CLUSTER_ID;

    private final String clusterName = Configs.CLUSTER_NAME;

    public void init() {
        final PlacementDriverOptions pdOpts = PlacementDriverOptionsConfigured.newConfigured()
            // connect to a real PD cluster (fake = false), not a fake PD
            .withFake(false)
            .withPdGroupId("pd_test--1")
            .withInitialPdServerList("127.0.0.1:9180,127.0.0.1:9181,127.0.0.1:9182")
            .config();
        final RheaKVStoreOptions opts = RheaKVStoreOptionsConfigured.newConfigured()
            .withClusterId(this.clusterId)
            .withClusterName(this.clusterName)
            .withPlacementDriverOptions(pdOpts)
            .config();
        System.out.println(opts);
        this.rheaKVStore.init(opts);

        // added: register every KV region group in RouteTable once the store is initialized
        refreshRouteTable();
    }

    protected void refreshRouteTable() {
        final RemotePlacementDriverClient placementDriverClient = (RemotePlacementDriverClient) this.rheaKVStore.getPlacementDriverClient();
        final Cluster cluster = placementDriverClient.getMetadataRpcClient().getClusterInfo(this.clusterId);
        if (cluster == null) {
            LOG.warn("Cluster info is empty: {}.", this.clusterId);
            return;
        }
        final List<Store> stores = cluster.getStores();
        if (stores == null || stores.isEmpty()) {
            LOG.error("Stores info is empty: {}.", this.clusterId);
            return;
        }
        for (final Store store : stores) {
            final List<Region> regions = store.getRegions();
            if (regions == null || regions.isEmpty()) {
                LOG.error("Regions info is empty: {} - {}.", this.clusterId, store.getId());
                continue;
            }
            for (final Region region : regions) {
                final String raftGroupId = JRaftHelper.getJRaftGroupId(this.clusterName, region.getId());
                final String serverList = region.getPeers().stream().map(Peer::getEndpoint).map(Endpoint::toString).collect(Collectors.joining(","));
                RouteTable.getInstance().updateConfiguration(raftGroupId, serverList);
            }
        }
    }

    public void shutdown() {
        this.rheaKVStore.shutdown();
    }

    public RheaKVStore getRheaKVStore() {
        return this.rheaKVStore;
    }
}
```
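The loop above visits every store and every region on that store. If a region replicated on several stores is reported once per store (which seems likely, but is not confirmed in this thread), the same group gets registered several times. A stdlib-only sketch that collects one configuration per group before registering. `RegionInfo`, `StoreInfo`, and `RouteRegistration` are hypothetical simplified types, and the `<clusterName>-<regionId>` group id format is an assumption about `JRaftHelper.getJRaftGroupId`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified metadata types; rhea's Store/Region/Peer carry more fields.
final class RegionInfo {
    final long id;
    final List<String> peerEndpoints; // "ip:port" strings

    RegionInfo(long id, List<String> peerEndpoints) {
        this.id = id;
        this.peerEndpoints = peerEndpoints;
    }
}

final class StoreInfo {
    final long id;
    final List<RegionInfo> regions;

    StoreInfo(long id, List<RegionInfo> regions) {
        this.id = id;
        this.regions = regions;
    }
}

final class RouteRegistration {
    // Collects one groupId -> serverList entry per region across all stores.
    // ASSUMPTION: group id format "<clusterName>-<regionId>".
    static Map<String, String> collect(String clusterName, List<StoreInfo> stores) {
        final Map<String, String> confs = new LinkedHashMap<>();
        for (StoreInfo store : stores) {
            if (store.regions == null || store.regions.isEmpty()) {
                continue; // same skip as the "Regions info is empty" branch
            }
            for (RegionInfo region : store.regions) {
                // Every store hosting a replica may report the same region; keep the first copy.
                confs.putIfAbsent(clusterName + "-" + region.id, String.join(",", region.peerEndpoints));
            }
        }
        return confs;
    }
}
```

Each entry would then go to `RouteTable.getInstance().updateConfiguration(groupId, serverList)`, as the client code does. Keeping the first copy is a simplification; if stores can report different peer lists for the same region, preferring the most recent configuration would be safer.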

fengjiachun commented 4 years ago

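Didn't I give you the PD test class yesterday?

The example module has no PD test. If you want to adapt it, changing the Client class is not enough: you also need to change the three classes Server1, Server2, and Server3. They must connect to the PD server and report heartbeats; only then can PD hold the routing information for the whole cluster. Changing only the client won't help.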
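The point about heartbeats can be shown with a toy model: a placement driver that only learns about regions from store heartbeats, so a client querying it sees nothing until the servers report in. `ToyPlacementDriver` and its methods are hypothetical names for illustration, not rhea's PD API.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (hypothetical names, not rhea's PD API): the placement driver
// learns about regions exclusively from store heartbeats.
final class ToyPlacementDriver {
    // storeId -> ids of the regions that store reported in its latest heartbeat
    private final Map<Long, Set<Long>> reportedRegions = new HashMap<>();

    // Called on behalf of a KV server (store); a client never calls this.
    void onStoreHeartbeat(long storeId, Collection<Long> regionIds) {
        reportedRegions.put(storeId, new TreeSet<>(regionIds));
    }

    // What a client receives when it asks the PD for cluster info.
    Set<Long> knownRegionIds() {
        final Set<Long> all = new TreeSet<>();
        for (Set<Long> ids : reportedRegions.values()) {
            all.addAll(ids);
        }
        return all;
    }
}
```

In this model, changing only the client changes nothing: until Server1/2/3 send heartbeats, `knownRegionIds()` stays empty, so there is no route information for the client to register.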

fengjiachun commented 4 years ago

Try changing Server1/2/3 as well.

killme2008 commented 4 years ago

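Let's clarify a few things:

  1. Are you trying to modify RheaKV itself?
  2. If not, is the startup order wrong, or is something being misunderstood?

This looks to me like a usage problem.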

fengjiachun commented 4 years ago

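@killme2008 Judging from the description, they want to modify the sample code in the example module. Adapting it as described in my previous reply should work.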

killme2008 commented 4 years ago

@fengjiachun Is the sample code not fault-tolerant enough? Is it misleading?

fengjiachun commented 4 years ago

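@killme2008 The sample code in the example module does not connect to a PD server; all of those examples run in no-PD mode. To use PD mode, they do need to be adapted.

For PD mode, I wrote a test class and already posted it in the previous issue.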

fengjiachun commented 4 years ago

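> @killme2008 The sample code in the example module does not connect to a PD server; all of those examples run in no-PD mode. To use PD mode, they do need to be adapted.
>
> For PD mode, I wrote a test class and already posted it in the previous issue.

The classes are `com.alipay.sofa.jraft.rhea.PdServer` and `com.alipay.sofa.jraft.rhea.pd.RheaHeartbeatTest`.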

juaby commented 4 years ago

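> Didn't I give you the PD test class yesterday?
>
> The example module has no PD test. If you want to adapt it, changing the Client class is not enough: you also need to change the three classes Server1, Server2, and Server3. They must connect to the PD server and report heartbeats; only then can PD hold the routing information for the whole cluster. Changing only the client won't help.

With the real PD configuration enabled everywhere, your test case works fine. You can disregard these issues; maybe I was using it the wrong way. Still, who knows, one day you may run into the same two problems.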

juaby commented 4 years ago

This can be closed.

BurnningHotel commented 4 years ago

Could you provide a best-practice example of using it with Spring Boot?