jurmous / etcd4j

Java / Netty client for etcd, the highly-available key value store for shared configuration and service discovery.
Apache License 2.0

client.get("/").send().get() stuck #135

Open summershrimp opened 7 years ago

summershrimp commented 7 years ago

At first I used etcd4j 2.11.0 and everything was fine, except that EtcdKeysResponse resp = etcdManagerClient.get(key).recursive().send().get(); threw an exception: Invalid field, cause: invalid value for "recursive", at index: 0. Then I updated etcd4j to 2.13.0, and now everything just gets stuck:

"Thread-15@8372" prio=5 tid=0x23 nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait(Object.java:-1)
          at java.lang.Object.wait(Object.java:502)
          at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:254)
          at mousio.client.promises.ResponsePromise.waitForPromiseSuccess(ResponsePromise.java:189)
          at mousio.etcd4j.promises.EtcdResponsePromise.get(EtcdResponsePromise.java:58)
          at net.coding.git.service.DiscoveryService$Discovery.run(DiscoveryService.java:120)

"Thread-14@8373" prio=5 tid=0x22 nid=NA waiting
  java.lang.Thread.State: WAITING
          at java.lang.Object.wait(Object.java:-1)
          at java.lang.Object.wait(Object.java:502)
          at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:254)
          at mousio.client.promises.ResponsePromise.waitForPromiseSuccess(ResponsePromise.java:189)
          at mousio.etcd4j.promises.EtcdResponsePromise.get(EtcdResponsePromise.java:58)
          at net.coding.git.service.DiscoveryService$HeartBeat.run(DiscoveryService.java:179)

{
  "etcdserver": "2.3.7",
  "etcdcluster": "2.3.0"
}

lburgazzoli commented 7 years ago

Which etcd version?

lburgazzoli commented 7 years ago

@summershrimp

I've just added a small test, which works on my side. Are you able to provide a reproducer?

lburgazzoli commented 7 years ago

@summershrimp which version of jackson are you using ? etcd4j requires jackson > 2.8

viacheslav-fomin-main commented 7 years ago

@lburgazzoli I have a similar problem. Calling client.getDir(root).recursive().timeout(TIMEOUT_SECS, TimeUnit.SECONDS).send().get().getNode() just hangs and then throws a timeout exception. Version 2.11 works fine; the problem occurs with 2.12 and 2.13.

I am running it against etcd version 3.1.5.

lburgazzoli commented 7 years ago

@viacheslav-fomin-main @summershrimp are you able to provide a reproducer ?

There is a small test covering recursive usage, and it passes, so I'm unable to reproduce your issue.

Please check that you have jackson 2.8 in your runtime classpath.
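
One way to act on the classpath advice above is to ask the JVM which jar a Jackson class was actually loaded from. The following is a minimal sketch using only the JDK. ClasspathProbe and describe are hypothetical names, not part of etcd4j, and the reported version is taken from the jar manifest's Implementation-Version entry, so it can be null when the manifest does not carry one.

```java
import java.security.CodeSource;

// Hypothetical helper (not part of etcd4j): reports where a class was loaded from.
class ClasspathProbe {

    static String describe(String className) {
        try {
            Class<?> c = Class.forName(className);
            Package p = c.getPackage();
            // Taken from the jar manifest; null if the manifest has no such entry.
            String version = (p == null) ? null : p.getImplementationVersion();
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "(no code source, e.g. a JDK class)"
                    : src.getLocation().toString();
            return className + " version=" + version + " from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names from the Jackson databind and core artifacts.
        System.out.println(describe("com.fasterxml.jackson.databind.ObjectMapper"));
        System.out.println(describe("com.fasterxml.jackson.core.JsonFactory"));
    }
}
```

If the printed location points at an older Jackson jar than expected, another dependency is probably pulling it in ahead of the intended version.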

lujiajing1126 commented 7 years ago

I have the same issue: the call to send().get() gets stuck,

and setting a timeout has no effect in this situation.
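
As a caller-side guard (not a fix for the underlying hang), the blocking call can be run on a worker thread and the wait bounded with the JDK's Future.get(timeout, unit). This is a minimal sketch under stated assumptions: BoundedWait and callWithDeadline are hypothetical names, and the lambdas in main stand in for a call such as client.get(key).send().get(). Because the thread dump earlier in this issue shows the wait happening in awaitUninterruptibly, interrupting the worker may not unblock it, so the worker is a daemon thread that cannot keep the JVM alive.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of etcd4j): bounds how long the caller waits.
class BoundedWait {

    /**
     * Runs blockingCall on a daemon worker thread and waits at most timeoutMillis.
     * Returns Optional.empty() on timeout (or if the call itself returned null).
     */
    static <T> Optional<T> callWithDeadline(Callable<T> blockingCall, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-wait");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(blockingCall);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // only helps if the call honors interrupts
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a call that returns promptly.
        System.out.println(callWithDeadline(() -> "ok", 1000));
        // Stand-in for a call that does not complete in time.
        System.out.println(callWithDeadline(() -> {
            Thread.sleep(60_000);
            return "never";
        }, 200));
    }
}
```

The caller regains control after the deadline either way, which at least turns a silent hang into a visible failure that can be logged.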

lburgazzoli commented 7 years ago

@lujiajing1126 do you have jackson 2.8.x in your classpath ?

wegel commented 7 years ago

Same problem here, with jackson 2.8.8.

lburgazzoli commented 7 years ago

@wegel does this test work for you ?

wegel commented 7 years ago

@lburgazzoli I haven't spent much time testing yet, but that test does work. However, the tree in that test is very small, and when I simulate a tree with lots of directories and bigger values, I get an io.netty.handler.codec.TooLongFrameException. Setting the frame size on EtcdNettyConfig to something bigger seems to fix the issue (something like new EtcdClient(new EtcdNettyClient(new EtcdNettyConfig().setMaxFrameSize(100 * 100 * 1024)))), and this also seems to fix the issue in my actual code. I need to do more testing to confirm, though.

lburgazzoli commented 7 years ago

@wegel there is also a test for a huge dir, but it does not use a recursive get. Do you mind sending a PR with a test case that covers your case, so I can dig into it a little more?

cmdln commented 7 years ago

I was having this same issue, upgrading Jackson from 2.6.6 to 2.8.8 resolved it for me.

artheus commented 7 years ago

I am having exactly the same issue. I use the latest etcd4j with etcd version 2.3.8. The problem seems to be that if you do not specify a RetryPolicy (I used RetryOnce), it will keep retrying forever. It seems to be a problem with "large" responses (mine is 400 KB of JSON).

When I set the retry policy like this: client.getDir(etcdClientBuilder.getEtcdDir()).setRetryPolicy(new RetryOnce(200)).send().get()

I got a stacktrace like this one

2017-08-24 14:55:31.109 INFO  [main] mousio.etcd4j.transport.EtcdNettyClient Setting up Etcd4j Netty client

mousio.client.exceptions.PrematureDisconnectException
    at mousio.etcd4j.transport.EtcdResponseHandler.channelUnregistered(EtcdResponseHandler.java:94)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelUnregistered(CombinedChannelDuplexHandler.java:405)

2017-08-24 14:55:31.780 INFO  [main] mousio.etcd4j.transport.EtcdNettyClient Shutting down Etcd4j Netty client
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.CombinedChannelDuplexHandler.channelUnregistered(CombinedChannelDuplexHandler.java:200)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:160)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1312)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:181)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:167)
    at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:826)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:752)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:445)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
    at java.lang.Thread.run(Thread.java:748)

Is there some kind of maximum response size that makes this fail? This is pretty confusing, as the JSON that etcd responds with is only 400 KB.

I followed the path in debug in my IDE, and every time etcd actually responds with a success. But for some reason that I cannot find, it just dies and runs a retry.
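
The difference between retrying forever and retrying a fixed number of times, as described in this comment, can be illustrated with a plain-Java sketch. BoundedRetry and retry are hypothetical names and do not reproduce etcd4j's RetryPolicy classes; the point is only that a failure which repeats identically on every attempt (such as a response that is always too large) surfaces to the caller once the attempts are bounded.

```java
import java.util.function.Supplier;

// Hypothetical illustration (not etcd4j's RetryPolicy): retry at most maxAttempts times.
class BoundedRetry {

    static <T> T retry(Supplier<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        // Every attempt failed: surface the last failure instead of looping forever.
        throw lastFailure;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        try {
            // Stand-in for a request that fails the same way on every attempt.
            retry(() -> {
                calls[0]++;
                throw new IllegalStateException("response larger than frame limit");
            }, 3);
        } catch (IllegalStateException e) {
            System.out.println("gave up after " + calls[0] + " attempts: " + e.getMessage());
        }
    }
}
```

With an unbounded loop in place of the for loop, the same stand-in request would never return, which matches the "stuck" symptom reported above.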

lburgazzoli commented 7 years ago

You can configure the frame size like this:

EtcdNettyConfig config = new EtcdNettyConfig();
config.setMaxFrameSize(1024 * 1024); // Desired max size
EtcdNettyClient nettyClient = new EtcdNettyClient(config, URI.create("http://localhost:4001"));
EtcdClient etcdClient = new EtcdClient(nettyClient);

artheus commented 7 years ago

It seems that the problem is actually the Netty configuration.

private int maxFrameSize = 1024 * 100;

is one of the lines in the mousio.etcd4j.transport.EtcdNettyConfig class. This means that the maximum response size Netty will accept is limited to 100 KB (which is very small).

There is a way to resolve this. Use -Dmousio.etcd4j.maxFrameSize=1048576 (1 MB) or something like that to increase the limit. You will get a warning that setting the frame size through a system property is deprecated, but it should resolve your problem! This is very confusing, and I suggest that the default be changed to a much larger number, e.g. 100 MB or something like that.

Hope this helps all of you!
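
To make the numbers in this thread concrete, here is a small arithmetic sketch in plain Java. FrameSizeMath is a hypothetical name. The default of 1024 * 100 and the property name mousio.etcd4j.maxFrameSize are quoted from the comments above; how etcd4j itself reads the property is not shown in this thread, so Integer.getInteger is only an assumption about one way to do it, as is the rule that a response fits when its size does not exceed the limit.

```java
// Hypothetical illustration of the frame-size arithmetic discussed in this thread.
class FrameSizeMath {

    // Default quoted from EtcdNettyConfig above: 1024 * 100 = 102,400 bytes.
    static final int DEFAULT_MAX_FRAME_SIZE = 1024 * 100;

    // Assumption: a response is accepted when its size does not exceed the limit.
    static boolean fits(long responseBytes, int maxFrameSize) {
        return responseBytes <= maxFrameSize;
    }

    // Assumption: the system property, if set, overrides the default.
    static int effectiveMaxFrameSize() {
        return Integer.getInteger("mousio.etcd4j.maxFrameSize", DEFAULT_MAX_FRAME_SIZE);
    }

    public static void main(String[] args) {
        long response = 400L * 1024; // the roughly 400 KB response mentioned above
        int[] limits = {
            DEFAULT_MAX_FRAME_SIZE, // 102,400 bytes
            1024 * 1024,            // 1,048,576 bytes
            100 * 100 * 1024,       // 10,240,000 bytes
            1024 * 1024 * 100       // 104,857,600 bytes
        };
        for (int limit : limits) {
            System.out.println(limit + " bytes: fits=" + fits(response, limit));
        }
        System.out.println("effective limit: " + effectiveMaxFrameSize());
    }
}
```

Under these assumptions, a 400 KB response is about four times the default limit, and any of the larger values suggested in this thread would accommodate it.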

lburgazzoli commented 7 years ago

Doesn't config.setMaxFrameSize(1024 * 1024); make any difference?

artheus commented 7 years ago

@lburgazzoli Your way should work fine! But I think the limit should be larger by default, or the default retry policy should retry N times rather than forever. As you can see, I made a pull request that bumps maxFrameSize to 100 MB.

addname commented 7 years ago

I was having this same issue, upgrading Jackson from 2.7.2 to 2.8.8 resolved it for me.

kpbochenek commented 7 years ago

With jackson 2.9.2 the same problem occurs. But doing config.setMaxFrameSize(1024 * 1024 * 100); doesn't help :/

With jackson 2.8.6 everything works out of the box, no need to change maxFrameSize.