jurmous / etcd4j

Java / Netty client for etcd, the highly-available key value store for shared configuration and service discovery.
Apache License 2.0
267 stars · 83 forks

Thread cannot finish after closing EtcdClient #95

Closed tyhoho closed 8 years ago

tyhoho commented 8 years ago

etcd version: 2.3.1
etcd4j version: 2.10.0


description: There are 3 nodes in the etcd cluster. I created an EtcdClient, set a value, and then shut down the EtcdClient; the thread finishes. But if one of the 3 nodes in the cluster is down, the thread cannot finish and keeps running forever.


Code:

EtcdClient etcd = new EtcdClient(
        URI.create("http://node1:2379"),
        URI.create("http://node2:2379"),
        URI.create("http://node3:2379"));

try {
    EtcdKeysResponse response = etcd.put("java", "hi2").timeout(1, TimeUnit.SECONDS).send().get();
    System.out.println(response.node.value);
} catch (IOException | EtcdException | EtcdAuthenticationException | TimeoutException e) {
    e.printStackTrace();
}

try {
    etcd.close();
} catch (IOException e) {
    e.printStackTrace();
}

tyhoho commented 8 years ago

Can anybody help with this?

Also, if I try to connect to an etcd that has not been started, the program cannot exit either. The code is below, and node1 is not started. The thread never ends:


Code:

EtcdClient etcd = new EtcdClient(URI.create("http://node1:2379"));

try {
    EtcdKeysResponse response = etcd.put("java", "hi2").timeout(1, TimeUnit.SECONDS).send().get();
    System.out.println(response.node.value);
} catch (IOException | EtcdException | EtcdAuthenticationException | TimeoutException e) {
    e.printStackTrace();
}

try {
    etcd.close();
} catch (IOException e) {
    e.printStackTrace();
}

lburgazzoli commented 8 years ago

By default, EtcdClient uses an ExponentialBackOff retry policy which gives up after some time. You can customize it by setting the EtcdClient's retry handler, like:

@Grab(group='org.slf4j', module='slf4j-simple', version='1.7.14')
@Grab(group='org.mousio', module='etcd4j', version='2.10.0')

import mousio.client.retry.*
import mousio.etcd4j.*
import java.util.concurrent.*

def etcd = new EtcdClient()
etcd.retryHandler = new RetryOnce(1000)

def response = etcd.put("java", "hi2").timeout(1, TimeUnit.SECONDS).send().get()

println(response.node.value);

Then if you run it without any etcd server running, it gives up immediately:

[nioEventLoopGroup-2-1] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.bytebuf.checkAccessible: true
[nioEventLoopGroup-2-1] DEBUG mousio.etcd4j.transport.EtcdNettyClient - Connection failed to https://127.0.0.1:4001
[nioEventLoopGroup-2-1] DEBUG mousio.client.retry.RetryPolicy - Retry 1 to send command
[nioEventLoopGroup-2-2] DEBUG mousio.etcd4j.transport.EtcdNettyClient - Connection failed to https://127.0.0.1:4001
Caught: java.net.ConnectException: Connection refused: /127.0.0.1:4001
java.net.ConnectException: Connection refused: /127.0.0.1:4001
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:225)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:527)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:467)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:381)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)

If you instead use a RetryNTimes policy, e.g. saying you want to retry 2 times, it will retry, well, 2 times :-)

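The retry policies discussed above boil down to a loop that either gives up after N attempts or sleeps a growing interval between them. Here is a stand-alone sketch of that idea (a hypothetical class, not etcd4j's actual mousio.client.retry implementation):

```java
import java.util.concurrent.Callable;

// Sketch of "retry N times with exponential back-off, then give up".
// RetrySketch is hypothetical; etcd4j's real policies are RetryOnce,
// RetryNTimes and RetryWithExponentialBackOff.
public class RetrySketch {

    /** Try the action at most maxAttempts times, doubling the pause each time. */
    public static <T> T retry(Callable<T> action, int maxAttempts, long startDelayMs)
            throws Exception {
        long delay = startDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // exponential back-off between attempts
                }
            }
        }
        throw last; // give up, like RetryNTimes after N failures
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        try {
            // Action that always fails, standing in for a refused connection.
            retry(() -> { calls[0]++; throw new java.io.IOException("connection refused"); }, 2, 10);
        } catch (java.io.IOException expected) {
            System.out.println("gave up after " + calls[0] + " attempts");
        }
    }
}
```

With maxAttempts = 2 the action runs twice and the last exception is rethrown, which is the "give up" behaviour the client needs so its threads can eventually stop.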
tyhoho commented 8 years ago

Hi Luca, thanks for your quick response! I still have the problem; could you please take a further look?

I tried the RetryOnce solution. When connecting to a stopped etcd, yes, it stops retrying, but the thread is not stopped. If you are using Eclipse as the IDE, you can see the Terminate button is still red (the blue rectangle), which means the thread has not ended. You can see it in the attached hanging.jpg.

hanging

If it connects to a running etcd, then after calling etcdclient.close() the Terminate button in the Eclipse IDE is gray. You can see it in the attached stoped.jpg.

stopped

My code is:

public class EtcdUtil {

public static void main(String[] args) {
    EtcdClient etcd = new EtcdClient(
            URI.create("http://localhost:4001")
            );
    etcd.setRetryHandler(new RetryOnce(1000));
    try {
        try {
            EtcdKeysResponse response = etcd.put("java", "hi2").timeout(1, TimeUnit.SECONDS).send().get();
            System.out.println(response.node.value);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (EtcdException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (EtcdAuthenticationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (TimeoutException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    } finally {
        try {
            etcd.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    }

}
tyhoho commented 8 years ago

By the way, I'm using JDK 1.7.

tyhoho commented 8 years ago

Deadlocks were detected. Here is the thread dump:


2016-04-14 14:06:02
Full thread dump Java HotSpot(TM) Client VM (24.0-b56 mixed mode, sharing):

"DestroyJavaVM" prio=6 tid=0x002cf400 nid=0x1f2c waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" prio=6 tid=0x04de4000 nid=0x2338 waiting on condition [0x0556f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at io.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:461)
        at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:360)
        at java.lang.Thread.run(Thread.java:724)

"Service Thread" daemon prio=6 tid=0x00b61000 nid=0x2134 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"C1 CompilerThread0" daemon prio=10 tid=0x00b5cc00 nid=0x214c waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x00b50c00 nid=0x13e8 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00b50000 nid=0x194c runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x00ae3400 nid=0x21b4 in Object.wait() [0x0459f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

"Reference Handler" daemon prio=10 tid=0x00ae1c00 nid=0x21ec in Object.wait() [0x0448f000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)

"VM Thread" prio=10 tid=0x00ae0800 nid=0xc64 runnable

"VM Periodic Task Thread" prio=10 tid=0x00b63800 nid=0x1358 waiting on condition

JNI global references: 241

lburgazzoli commented 8 years ago

Thanks, now it is much clearer.

lburgazzoli commented 8 years ago

@tyhoho can you test with the latest code ?

tyhoho commented 8 years ago

@lburgazzoli Hi Luca, I really appreciate your quick response. Yes, it worked when connecting to a single dead etcd node!

But when connecting to a cluster, e.g. where there are 3 nodes and only node1 is stopped while node2 and node3 are running, the thread still cannot stop after calling etcdclient.close().

Code:

EtcdClient etcd = new EtcdClient(
        URI.create("http://node1:2379"),
        URI.create("http://node2:2379"),
        URI.create("http://node3:2379"));


By the way, I looked a little bit into the code: why do you connect to etcd every time etcdclient.**().send() is called, instead of reusing the established connection?

lburgazzoli commented 8 years ago

If I'm not wrong (I'm not the original author), each connection is closed after the request has been processed. I will take a further look later on.

jurmous commented 8 years ago

When I created it, I chose to open a connection with each request and close it when the request is done. At the time that was simpler than implementing a channel pool to recycle channels. In the meantime Netty introduced a built-in ChannelPool (https://github.com/netty/netty/issues/3218), so it is now much easier to implement.
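To illustrate what channel recycling buys over open-close-per-request, here is a conceptual sketch. This is not Netty's API (Netty's real pool lives in io.netty.channel.pool and is future-based); the Channel class here is a hypothetical stand-in:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal object-pool sketch: acquire a channel, use it, release it
// back so the next request reuses it instead of reconnecting.
public class PoolSketch {

    static class Channel {            // hypothetical stand-in for a pooled connection
        boolean open = true;
        void close() { open = false; }
    }

    private final BlockingQueue<Channel> idle;

    PoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) idle.add(new Channel());
    }

    Channel acquire() throws InterruptedException {
        return idle.take();           // blocks if every channel is in use
    }

    void release(Channel ch) {
        if (ch.open) idle.add(ch);    // recycle instead of closing
    }

    public static void main(String[] args) throws InterruptedException {
        PoolSketch pool = new PoolSketch(1);
        Channel first = pool.acquire();
        pool.release(first);
        Channel second = pool.acquire();
        System.out.println(first == second); // the same channel is reused
    }
}
```

The wrinkle mentioned below, that etcd4j adapts the channel pipeline per request, is exactly what this sketch glosses over: a pooled channel must be returned in a reusable state.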

lburgazzoli commented 8 years ago

Thanks @jurmous, I may try to switch to a channel pool next week.

jurmous commented 8 years ago

I took a quick look into it, but it does not seem to be easy. We need to reconsider how the promise works, and we currently adapt the channel pipeline for each request, which makes it difficult to reuse a channel. These steps likely need to be redesigned before a channel can be recycled within a pool.

jurmous commented 8 years ago

Regarding the issue: etcdclient.close() should terminate the EventLoopGroup, so it should shut down all the threads and thus all waiting connections. Can you see what the remaining thread is blocking on?
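One way to answer that from inside the JVM is to list every live thread, its state, and its top stack frame, using only the JDK (running jstack against the process gives the same information). A stand-alone sketch, not part of etcd4j:

```java
import java.util.Map;

// Print each live thread's name, daemon flag, state and top frame.
// Non-daemon threads (like the HashedWheelTimer worker in the dump
// above) are what keep the JVM alive after main() returns.
public class ThreadDumpSketch {
    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] frames = e.getValue();
            System.out.printf("%s daemon=%s state=%s%n",
                    t.getName(), t.isDaemon(), t.getState());
            if (frames.length > 0) {
                System.out.println("    at " + frames[0]);
            }
        }
    }
}
```

Any thread reported with daemon=false and a non-terminated state is a candidate for why the process will not exit.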

lburgazzoli commented 8 years ago

Looks like it is the reconnect timer; it should be fixed by a side PR.

lburgazzoli commented 8 years ago

@tyhoho can you check with the latest code ?

jurmous commented 8 years ago

I published a new SNAPSHOT: 2.10.2-SNAPSHOT. If it is correct, I can publish the final release.

tyhoho commented 8 years ago

@lburgazzoli @jurmous it worked this time, for both standalone and cluster. Thanks.

tyhoho commented 8 years ago

By the way, when do you suggest closing the etcd client? E.g., I have a web app deployed on JBoss that is used only to handle etcd-related operations, so I can reuse the same EtcdClient for every request, right? The EtcdClient would then only be closed when JBoss stops; there's no right time to close it :(

lburgazzoli commented 8 years ago

@tyhoho It is hard to advise when to close the etcd client; it really depends on your usage pattern. But yes, you can re-use the etcd client. @jurmous I think we can release a new version now.
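One common pattern for the JBoss scenario above is to hold a single shared client for the whole application and close it exactly once at shutdown. A hedged sketch, with a hypothetical SharedClient standing in for EtcdClient so the example is self-contained (in a real web app, a ServletContextListener's contextDestroyed would be the natural place to call close()):

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical shared-client holder: close() is idempotent, so both an
// explicit close and a JVM shutdown hook can call it safely.
public class SharedClient implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    public boolean isClosed() { return closed.get(); }

    @Override
    public void close() {
        if (closed.compareAndSet(false, true)) {
            // Real code would close the underlying EtcdClient here.
            System.out.println("client closed once");
        }
    }

    public static void main(String[] args) {
        SharedClient client = new SharedClient();
        // Close the shared client when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(client::close));
        // ... every request handler reuses the same `client` instance ...
    }
}
```

Making close() idempotent sidesteps the "no right time to close it" worry: whichever of the container teardown or the shutdown hook runs first wins, and the second call is a no-op.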

ytcoode commented 8 years ago

Waiting for the new version!

jurmous commented 8 years ago

Unfortunately I cannot release because of a test failure. #107

It seems I need a bit more time to fix it properly than I currently have...

jurmous commented 8 years ago

Released.

ytcoode commented 8 years ago

Thanks!