We run zipkin-server 2.23.2 reading from Kafka via zipkin-collector-kafka, with Elasticsearch as the storage.
Under a high volume of spans, zipkin-server stops working: the JVM old generation reaches 100%, CPU usage is very high, and then both the server and UI queries stop responding.
My server start command is:
java -Dzipkin.collector.kafka.bootstrap-servers=192.168.30.72:9092 -Dzipkin.collector.kafka.topic=zipkin -Dzipkin.collector.kafka.groupId=zipkin -Dzipkin.collector.kafka.overrides.max.poll.interval.ms=300000 -Dzipkin.collector.kafka.overrides.max.poll.records=500 -Dzipkin.collector.kafka.overrides.auto.offset.reset=latest -Dzipkin.collector.kafka.streams=16 -Dzipkin.storage.type=elasticsearch -Dzipkin.storage.elasticsearch.hosts=192.168.30.72:19200 -Dzipkin.storage.elasticsearch.username=elastic -Dzipkin.storage.elasticsearch.password=123456 -jar zipkin-server-2.23.2.jar
The logs show the following exception and Kafka consumer messages:
2021-05-26 08:13:46,548 [armeria-common-worker-epoll-2-13] WARN zipkin2.server.internal.BodyIsExceptionMessage (BodyIsExceptionMessage.java:41) - Unexpected error handling request.
com.linecorp.armeria.common.ClosedSessionException: null
at com.linecorp.armeria.common.ClosedSessionException.get(ClosedSessionException.java:36) ~[armeria-1.3.0.jar!/:?]
at com.linecorp.armeria.server.HttpServerHandler.cleanup(HttpServerHandler.java:233) ~[armeria-1.3.0.jar!/:?]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) [netty-transport-native-epoll-4.1.54.Final-linux-x86_64.jar!/:4.1.54.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-common-4.1.54.Final.jar!/:4.1.54.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
2021-05-26 08:21:45,809 [kafka-coordinator-heartbeat-thread | zipkin] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator (AbstractCoordinator.java:904) - [Consumer clientId=consumer-zipkin-12, groupId=zipkin] Group coordinator 192.168.30.72:9092 (id: 2147483647 rack: null) is unavailable or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted.
2021-05-26 08:25:20,004 [kafka-coordinator-heartbeat-thread | zipkin] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator (AbstractCoordinator.java:1029) - [Consumer clientId=consumer-zipkin-5, groupId=zipkin] Member consumer-zipkin-5-d48b4ded-2cd5-40d0-a36b-9f3dc5d3555b sending LeaveGroup request to coordinator 192.168.30.72:9092 (id: 2147483647 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
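The last log message names two settings, max.poll.interval.ms and max.poll.records. Below is a minimal sketch of how they could be changed in the start command above. The values (100 records, 600000 ms) are illustrative assumptions and have not been tested against this workload, so this is not a confirmed fix. The `build_cmd` helper is hypothetical and only prints the command for review before running it.

```shell
#!/bin/sh
# Illustrative values only (assumptions, not tuned for this workload).
MAX_POLL_RECORDS=100          # original command used 500
MAX_POLL_INTERVAL_MS=600000   # original command used 300000

# Hypothetical helper: prints the start command on one line without running it.
build_cmd() {
  printf '%s ' \
    java \
    "-Dzipkin.collector.kafka.bootstrap-servers=192.168.30.72:9092" \
    "-Dzipkin.collector.kafka.topic=zipkin" \
    "-Dzipkin.collector.kafka.groupId=zipkin" \
    "-Dzipkin.collector.kafka.overrides.max.poll.interval.ms=${MAX_POLL_INTERVAL_MS}" \
    "-Dzipkin.collector.kafka.overrides.max.poll.records=${MAX_POLL_RECORDS}" \
    "-Dzipkin.collector.kafka.overrides.auto.offset.reset=latest" \
    "-Dzipkin.collector.kafka.streams=16" \
    "-Dzipkin.storage.type=elasticsearch" \
    "-Dzipkin.storage.elasticsearch.hosts=192.168.30.72:19200" \
    "-Dzipkin.storage.elasticsearch.username=elastic" \
    "-Dzipkin.storage.elasticsearch.password=123456" \
    -jar zipkin-server-2.23.2.jar
  printf '\n'
}

build_cmd
```

Only the two overrides differ from the original command; all other flags are unchanged.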