jurmous / etcd4j

Java / Netty client for etcd, the highly-available key value store for shared configuration and service discovery.
Apache License 2.0

1 node etcd cluster is not working #144

Closed suresh-chaudhari closed 7 years ago

suresh-chaudhari commented 7 years ago

For the etcd client to work, I find that at least 2 etcd nodes must be up before any data can be stored in the etcd server.

Issue: I created a cluster of 3 etcd nodes and took 2 of them down. With 2 nodes down, the client is not able to call the etcd server.

Please resolve this issue ASAP for etcd cluster.

lburgazzoli commented 7 years ago

Can you provide some more details?

suresh-chaudhari commented 7 years ago

Hi.

Find all the details below:

etcd-server version: 3.1.5
org.mousio etcd4j client version: 2.13.0

To reproduce the issue, I created a cluster of 3 nodes on my local machine using the following commands:

etcd0 node1 script:

    /home/vf-root/installed/etcd/etcd --name er-etcd0 \
      --data-dir /var/lib/etcd/er-ms-node0 \
      --listen-client-urls http://0.0.0.0:2379 \
      --advertise-client-urls http://127.0.0.1:2379 \
      --listen-peer-urls http://127.0.0.1:2380 \
      --initial-advertise-peer-urls http://127.0.0.1:2380 \
      --initial-cluster er-etcd0=http://127.0.0.1:2380,er-etcd1=http://127.0.0.1:2381,er-etcd2=http://127.0.0.1:2382 \
      --initial-cluster-token er-etcd-cluster \
      --initial-cluster-state new

etcd1 node2 script:

Before running this script, please run command sudo rm -rf /var/lib/etcd

    /home/vf-root/installed/etcd/etcd --name er-etcd1 \
      --data-dir /var/lib/etcd/er-ms-node1 \
      --listen-client-urls http://0.0.0.0:2378 \
      --advertise-client-urls http://127.0.0.1:2378 \
      --listen-peer-urls http://127.0.0.1:2381 \
      --initial-advertise-peer-urls http://127.0.0.1:2381 \
      --initial-cluster er-etcd0=http://127.0.0.1:2380,er-etcd1=http://127.0.0.1:2381,er-etcd2=http://127.0.0.1:2382 \
      --initial-cluster-token er-etcd-cluster \
      --initial-cluster-state new

etcd2 node3 script:

    /home/vf-root/installed/etcd/etcd --name er-etcd2 \
      --data-dir /var/lib/etcd/er-ms-node2 \
      --listen-client-urls http://0.0.0.0:2377 \
      --advertise-client-urls http://127.0.0.1:2377 \
      --listen-peer-urls http://127.0.0.1:2382 \
      --initial-advertise-peer-urls http://127.0.0.1:2382 \
      --initial-cluster er-etcd0=http://127.0.0.1:2380,er-etcd1=http://127.0.0.1:2381,er-etcd2=http://127.0.0.1:2382 \
      --initial-cluster-token er-etcd-cluster \
      --initial-cluster-state new

Running the 3 scripts above creates a 3-node etcd cluster. It works with 3 nodes and with 2 nodes running.

The issue can be reproduced with these steps:

  1. Take down 2 etcd nodes of the running 3-node etcd cluster.
  2. Use the etcd4j library to persist a key-value pair. It is not able to use the single remaining node of the etcd cluster and gives an error.

Let me know if you need anything more from me.

lburgazzoli commented 7 years ago

A few questions:

suresh-chaudhari commented 7 years ago

"When you stop the two nodes, is etcd already connected, or do you run it after the nodes have been stopped?"

I run the test program again while the two nodes are disconnected.

lburgazzoli commented 7 years ago

To recap:

  1. create a 3 node cluster
  2. start an etcd4j test --> works
  3. stop 2 nodes
  4. start an etcd4j test --> fail

Correct? Can you tell me how you create the etcd4j client?

suresh-chaudhari commented 7 years ago

Yes, it happens exactly as you described.

This is exactly the etcd client sample I am using:

    URI firstUri = URI.create("http://127.0.0.1:2379");
    URI secondUri = URI.create("http://127.0.0.1:2378");
    URI thirdUri = URI.create("http://127.0.0.1:2377");

    try (EtcdClient etcd = new EtcdClient(firstUri, secondUri, thirdUri)) {
        // Logs etcd version
        System.out.println(etcd.getVersion());
        EtcdKeysResponse response = etcd.put("foo", "bar").send().get();
        // Prints out: bar
        System.out.println(response.node.value);
    }

lburgazzoli commented 7 years ago

What I can see is that there is an exception thrown by etcd:

[300]: Raft Internal Error, cause: etcdserver: request timed out, at index: 0

Does that work with curl?

suresh-chaudhari commented 7 years ago

When I make the request with curl directly against the etcd server, it actually works. I guess the issue is in this client.

suresh-chaudhari commented 7 years ago

Sorry, my comment above was wrong. I just tested against this existing cluster and I get this error from the etcd server when running the curl command:

{"errorCode":300,"message":"Raft Internal Error","cause":"etcdserver: request timed out","index":0}
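Since curl and etcd4j report the same code 300, the failure comes from the server rather than from the client. As a rough, dependency-free sketch of how such a response body could be recognized in Java (the `EtcdErrorBody` class and its methods are hypothetical helpers, not part of etcd4j, and a regex is used here instead of a real JSON parser only to keep the example self-contained):

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of etcd4j: inspects an etcd v2 error body.
final class EtcdErrorBody {
    // Matches the numeric "errorCode" field of the JSON error body.
    private static final Pattern ERROR_CODE =
            Pattern.compile("\"errorCode\"\\s*:\\s*(\\d+)");

    // Code reported in this thread together with "Raft Internal Error".
    static final int RAFT_INTERNAL_ERROR = 300;

    // Returns the error code if the body contains one, otherwise empty.
    static OptionalInt errorCode(String body) {
        Matcher m = ERROR_CODE.matcher(body);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    // True when the server itself reported a Raft internal error.
    static boolean isRaftInternalError(String body) {
        OptionalInt code = errorCode(body);
        return code.isPresent() && code.getAsInt() == RAFT_INTERNAL_ERROR;
    }

    public static void main(String[] args) {
        String body = "{\"errorCode\":300,\"message\":\"Raft Internal Error\","
                + "\"cause\":\"etcdserver: request timed out\",\"index\":0}";
        System.out.println(isRaftInternalError(body));
    }
}
```

A check like this only classifies the response. It does not make the write succeed, because the server cannot commit it while too few members are running.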

With 2 nodes running it works, but when I take one down again it gives the above error.

lburgazzoli commented 7 years ago

So the behavior is the same with etcd4j and curl, correct?

suresh-chaudhari commented 7 years ago

It seems so now. Do you think the etcd cluster does not work with a single node?

lburgazzoli commented 7 years ago

Do you kill the nodes or remove them from the cluster (etcdctl member remove id)? If you just kill the nodes, the cluster may enter a misconfigured state. If you instead remove the nodes from the cluster, it should not fail.

suresh-chaudhari commented 7 years ago

I am simply running the cluster and taking the nodes down with the ctrl+d command. I don't know whether that kills the node or removes its id.

But this kind of failure can happen anywhere. A machine can crash at any time, and the etcd cluster still needs to work.

Can you provide the command to remove nodes from the cluster?

suresh-chaudhari commented 7 years ago

Hi, one more question: if the nodes have to be removed from the cluster, then why does it work with a 2-node cluster?

lburgazzoli commented 7 years ago

You should read the clustering guide, in particular: https://coreos.com/etcd/docs/latest/v2/admin_guide.html#optimal-cluster-size

So a cluster of 3 nodes can handle a failure of 1 node but not 2. If you use etcdctl member remove ${id-of-your-node}, the cluster is aware of what's happening, so it can handle the situation.
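The sizing rule comes down to majority arithmetic: a cluster configured with N members needs floor(N/2) + 1 of them running to commit a write. A minimal sketch of that arithmetic in Java (the `QuorumMath` class is a hypothetical illustration, not part of etcd or etcd4j):

```java
// Hypothetical illustration of majority arithmetic; not an etcd or etcd4j API.
final class QuorumMath {
    // Members that must be running for the cluster to commit a write.
    static int quorum(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Members that can fail while a majority is still running.
    static int failureTolerance(int clusterSize) {
        return clusterSize - quorum(clusterSize);
    }

    // True if a cluster configured with clusterSize members can still
    // commit writes when only healthyMembers of them are running.
    static boolean canCommit(int clusterSize, int healthyMembers) {
        return healthyMembers >= quorum(clusterSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("size=%d quorum=%d tolerates=%d%n",
                    n, quorum(n), failureTolerance(n));
        }
        System.out.println("3 configured, 2 up: " + canCommit(3, 2));
        System.out.println("3 configured, 1 up: " + canCommit(3, 1));
    }
}
```

For the cluster in this issue, `canCommit(3, 2)` is true and `canCommit(3, 1)` is false, which matches what was observed: writes succeed with 2 of 3 nodes up and time out with only 1. Removing a member lowers the configured size, but the removal is itself a change the cluster has to commit, so it needs to happen while a majority is still running.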

lburgazzoli commented 7 years ago

Going to close this issue, please open it again if you find anything related to etcd4j