openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

zipkin to elasticsearch error #3433

Closed vjvel closed 7 months ago

vjvel commented 2 years ago

We deployed Zipkin with Elasticsearch in k8s and it was working fine. Now we use Zipkin to connect to an Elasticsearch cluster in AWS with credentials, but it is not working and gives the error below. We checked connectivity from Zipkin to Elasticsearch using curl, and that works; only the application fails, with the following error:

2022-03-10 12:51:31.229  WARN [/] 1 --- [orker-epoll-2-2] c.l.a.c.l.LoggingClient                  : [creqId=852846f2, sreqId=5dd28847][http://UNKNOWN/#GET] Request: {startTime=2022-03-10T12:51:21.228Z(1646916681228066), length=0B, duration=10000ms(10000784449ns), cause=com.linecorp.armeria.client.UnprocessedRequestException: com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException, scheme=none+http, name=get-node, headers=[]}
2022-03-10 12:51:31.230  WARN [/] 1 --- [orker-epoll-2-2] c.l.a.c.l.LoggingClient                  : [creqId=852846f2, sreqId=5dd28847][http://UNKNOWN/#GET] Response: {startTime=2022-03-10T12:51:31.229Z(1646916691229031), length=0B, duration=0ns, totalDuration=10000ms(10000966403ns), cause=com.linecorp.armeria.client.UnprocessedRequestException: com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException, headers=[]}

com.linecorp.armeria.client.UnprocessedRequestException: com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException
        at com.linecorp.armeria.client.UnprocessedRequestException.of(UnprocessedRequestException.java:45) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.HttpClientDelegate.execute(HttpClientDelegate.java:73) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.HttpClientDelegate.execute(HttpClientDelegate.java:47) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.metric.AbstractMetricCollectingClient.execute(AbstractMetricCollectingClient.java:61) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.encoding.DecodingClient.executeAndDecodeResponse(DecodingClient.java:160) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.encoding.DecodingClient.execute(DecodingClient.java:119) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.encoding.DecodingClient.execute(DecodingClient.java:49) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.client.logging.AbstractLoggingClient.execute(AbstractLoggingClient.java:125) ~[armeria-1.13.4.jar:?]
        at zipkin2.server.internal.elasticsearch.BasicAuthInterceptor.execute(BasicAuthInterceptor.java:45) ~[classes/:?]
        at zipkin2.server.internal.elasticsearch.BasicAuthInterceptor.execute(BasicAuthInterceptor.java:30) ~[classes/:?]
        at com.linecorp.armeria.internal.client.ClientUtil.pushAndExecute(ClientUtil.java:153) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.internal.client.ClientUtil.initContextAndExecuteWithFallback(ClientUtil.java:107) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.internal.client.ClientUtil.lambda$initContextAndExecuteWithFallback$0(ClientUtil.java:81) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:547) ~[armeria-1.13.4.jar:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-classes-epoll-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException
        at com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException.get(EmptyEndpointGroupException.java:37) ~[armeria-1.13.4.jar:?]
        ... 24 more

2022-03-10 12:51:31.233  WARN [/] 1 --- [orker-epoll-2-2] z.s.i.BodyIsExceptionMessage             : Unexpected error handling request.

java.util.concurrent.RejectedExecutionException: EmptyEndpointGroupException
        at zipkin2.elasticsearch.internal.client.HttpCall.lambda$sendRequest$3(HttpCall.java:227) ~[zipkin-storage-elasticsearch-2.23.16.jar:?]
        at java.util.concurrent.CompletableFuture.uniExceptionally(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.util.UnmodifiableFuture.doCompleteExceptionally(UnmodifiableFuture.java:139) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.common.util.UnmodifiableFuture.lambda$wrap$0(UnmodifiableFuture.java:98) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.stream.DeferredStreamMessage.lambda$delegate$0(DeferredStreamMessage.java:132) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniHandleStage(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.handle(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.stream.DeferredStreamMessage.delegate(DeferredStreamMessage.java:128) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.common.DeferredHttpResponse.delegate(DeferredHttpResponse.java:47) ~[armeria-1.13.4.jar:?]
        at com.linecorp.armeria.common.DeferredHttpResponse.lambda$delegateWhenComplete$0(DeferredHttpResponse.java:58) ~[armeria-1.13.4.jar:?]
        at java.util.concurrent.CompletableFuture.uniHandle(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
        at com.linecorp.armeria.common.RequestContext.lambda$makeContextAware$3(RequestContext.java:547) ~[armeria-1.13.4.jar:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-classes-epoll-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.70.Final.jar:4.1.70.Final]
        at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: com.linecorp.armeria.client.endpoint.EmptyEndpointGroupException

SingKS8 commented 2 years ago

I have had the same issue on k8s for the past two months (first deployed in March 2022, when everything was OK). All LoggingClient logs for the ES API show an UNKNOWN host, like "http://UNKNOWN/#GET". I changed the logger level to debug, and the DNS resolver is OK. I once thought it was an ES problem, but other ES clients are working. While debugging I found something confusing: when the Zipkin server starts and fails to connect to ES, restarting one pod of the ES cluster makes the Zipkin server work again, and LoggingClient then logs the host from ES_HOSTS instead of UNKNOWN. But if I restart the Zipkin server, the issue comes back.
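Checking DNS with a command-line tool exercises a different code path from the JVM. A minimal sketch (hypothetical class `EsDnsCheck`, not part of Zipkin) resolves each host in `ES_HOSTS` with `InetAddress.getAllByName`, which on Linux normally goes through the C library's resolver (musl on Alpine, glibc on Ubuntu-based images). Armeria may use its own DNS resolver, so a clean result here does not fully rule out a resolution problem inside Zipkin's client.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of Zipkin: resolves ES_HOSTS names through the JVM.
final class EsDnsCheck {

    /** Extracts host names from a comma-separated ES_HOSTS-style value, skipping blanks. */
    static List<String> hostNames(String esHosts) {
        List<String> names = new ArrayList<>();
        for (String part : esHosts.split(",")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String host = URI.create(trimmed).getHost();
            if (host != null) { // null when the URL has no parseable host
                names.add(host);
            }
        }
        return names;
    }

    /** Resolves a host name with the JVM's resolver and returns every address found. */
    static List<String> resolve(String host) throws UnknownHostException {
        List<String> addresses = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    public static void main(String[] args) {
        String esHosts = System.getenv().getOrDefault("ES_HOSTS", "http://localhost:9200");
        for (String host : hostNames(esHosts)) {
            try {
                System.out.println(host + " -> " + resolve(host));
            } catch (UnknownHostException e) {
                System.out.println(host + " -> not resolved: " + e.getMessage());
            }
        }
    }
}
```

Running it inside the Zipkin container with the deployment's `ES_HOSTS` value shows what the JVM itself sees for each Elasticsearch host.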

John-Athan commented 2 years ago

We are experiencing the same issue. Does anyone have any idea why this is happening?

SingKS8 commented 2 years ago

I think this might be a network issue with the Alpine base image on Kubernetes. I repackaged the Zipkin image with an eclipse-temurin Ubuntu base image, and it works. Since Kubernetes 1.24-1.25 (1.24 in my last comment), I have seen more and more JVM components and libraries running on Alpine base images hit network issues: components like Spring Config Server and Spring Config Client, and libraries like Spring RestTemplate and JGit. In my team, since Kubernetes was upgraded, almost all projects have had to be repackaged from an Alpine base to a non-Alpine one. So, would the openzipkin team consider providing a non-Alpine base image as an option?

jacklu2016 commented 2 years ago

@SingKS8, could you share your non-Alpine build image? Thanks.

jeff-lemos commented 1 year ago

I've been having the same issue. If I delete the K8s deployment to change, for example, the heap memory and apply it again, Zipkin can no longer search ES. It only works again if I rename the index using the ES_INDEX env var in the Zipkin deployment. It doesn't seem to be a network issue but an application issue, where Zipkin loses some index reference when we redeploy.

But I'd appreciate it if you could share your image with us, @jacklu2016. Thanks.

jeff-lemos commented 1 year ago

To add to this: if my previous index was zipkin-test and for some reason I delete the Zipkin deployment to change something, or change something on ES that affects the index, I have to rename the index, and the new name has to be quite different, like new-test or new-zipkin-test, not zipkin-test2 or zipkin-another-test. I don't know why, but if the name is too similar, it keeps failing.
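To see which indices already exist under a given `ES_INDEX` prefix, such as zipkin-test versus new-zipkin-test above, a minimal sketch can read Elasticsearch `_cat` text output on standard input (for example from `_cat/indices?h=index` or `_cat/shards`, where the index name is the first column) and print the distinct names starting with the prefix. The class name `IndexPrefixCheck` is hypothetical, the default prefix `zipkin` is Zipkin's default `ES_INDEX`, and no particular index-name suffix format is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Zipkin: filters _cat output by index-name prefix.
final class IndexPrefixCheck {

    /** Returns the distinct first-column index names that start with the given prefix. */
    static Set<String> indicesMatching(String catOutput, String prefix) {
        Set<String> names = new TreeSet<>(); // sorted, duplicates (e.g. replica shards) collapsed
        for (String line : catOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String index = trimmed.split("\\s+")[0];
            if (index.equals("index")) {
                continue; // header row, present when the request uses ?v
            }
            if (index.startsWith(prefix)) {
                names.add(index);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String prefix = args.length > 0 ? args[0] : "zipkin";
        String output = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        Set<String> indices = indicesMatching(output, prefix);
        System.out.println(indices.isEmpty()
                ? "no indices starting with " + prefix
                : String.join("\n", indices));
    }
}
```

For example: `curl -s -u user:password "https://<es-host>:9200/_cat/indices?h=index" | java IndexPrefixCheck.java zipkin-test` on JDK 11 or later.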

DaneLyttinen commented 1 year ago

Same issue, although I am not running this on Kubernetes. The instance can successfully call the /_cluster/health?pretty endpoint using curl, but when I launch Zipkin with the specified ES_HOSTS=..... and connect to the Zipkin UI, I just get the error above. Running /_cat/shards/ on the Elasticsearch instance returns no index names related to Zipkin.
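The curl check runs outside the JVM. A minimal sketch for repeating the same `/_cluster/health` request from a JVM, ideally in the same image and environment as Zipkin, is below. It reads the `ES_HOSTS` (comma-separated base URLs), `ES_USERNAME`, and `ES_PASSWORD` environment variables that Zipkin's Elasticsearch storage also reads, and sends HTTP Basic authentication, as the `BasicAuthInterceptor` frame in the stack trace suggests Zipkin does. It uses the JDK's `java.net.http.HttpClient`, not Armeria, so a success here narrows the problem down rather than reproducing Zipkin's client; the class name `EsConnectivityCheck` is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical diagnostic, not part of Zipkin: repeats the curl health check from a JVM.
final class EsConnectivityCheck {

    /** Splits a comma-separated ES_HOSTS value into base URLs, ignoring blanks. */
    static List<URI> parseHosts(String esHosts) {
        return Arrays.stream(esHosts.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(URI::create)
                .collect(Collectors.toList());
    }

    /** Builds an HTTP Basic Authorization header value from a username and password. */
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    /** Resolves the cluster health endpoint against a base URL such as https://host:9200. */
    static URI healthUri(URI base) {
        return base.resolve("/_cluster/health");
    }

    public static void main(String[] args) throws InterruptedException {
        String hosts = System.getenv().getOrDefault("ES_HOSTS", "http://localhost:9200");
        String user = System.getenv().getOrDefault("ES_USERNAME", "");
        String password = System.getenv().getOrDefault("ES_PASSWORD", "");
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();

        for (URI base : parseHosts(hosts)) {
            HttpRequest.Builder request = HttpRequest.newBuilder(healthUri(base))
                    .timeout(Duration.ofSeconds(10))
                    .GET();
            if (!user.isEmpty()) {
                request.header("Authorization", basicAuthHeader(user, password));
            }
            try {
                HttpResponse<String> response =
                        client.send(request.build(), HttpResponse.BodyHandlers.ofString());
                System.out.println(base + " -> HTTP " + response.statusCode() + " " + response.body());
            } catch (IOException e) {
                // Connection, TLS, and DNS failures surface here rather than as an HTTP status.
                System.out.println(base + " -> failed: " + e);
            }
        }
    }
}
```

With JDK 11 or later it can be launched from source, for example `java EsConnectivityCheck.java`, using the same environment variables as the Zipkin deployment.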

tangmingxiang commented 1 year ago

Upgrade the JDK version and then run Zipkin.
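To confirm which Java runtime a Zipkin container actually runs before and after such an upgrade, a minimal sketch (hypothetical class `JvmInfo`, not part of Zipkin) prints the version properties reported by the JVM. `Runtime.version().feature()` requires Java 10 or later.

```java
// Hypothetical helper, not part of Zipkin: reports the Java runtime in use.
final class JvmInfo {

    /** Returns true when the running JVM's feature release (e.g. 17 or 21) is at least minFeature. */
    static boolean atLeast(int minFeature) {
        return Runtime.version().feature() >= minFeature;
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));
        System.out.println("feature      = " + Runtime.version().feature());
        System.out.println("at least 17? " + atLeast(17));
    }
}
```

Running `java -version` in the container gives similar information without compiling anything.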

codefromthecrypt commented 7 months ago

Zipkin was renovated at the end of last year, including updates to everything, JRE and Alpine included. Closing this out, but if you see this with a current version, please ping back. FYI, zipkin-helm has also been much renovated, for those using k8s.