linkedin / cruise-control

Cruise-control is the first of its kind to fully automate the dynamic workload rebalance and self-healing of a Kafka cluster. It provides great value to Kafka users by simplifying the operation of Kafka clusters.
https://github.com/linkedin/cruise-control/tags
BSD 2-Clause "Simplified" License

Frequent Index out of range exceptions #215

Closed: ghost closed this issue 6 years ago

ghost commented 6 years ago

I seem to be getting frequent index-out-of-range errors when making state requests to Cruise Control (CC). I'm not sure where they are coming from (broker time? something with my metrics topic?).


```
{"version":"1","stackTrace":"java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Index 5083928 is out of range [5083724, 5083724]
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
    at com.linkedin.kafka.cruisecontrol.servlet.KafkaCruiseControlServlet.getAndMaybeReturnProgress(KafkaCruiseControlServlet.java:1269)
    at com.linkedin.kafka.cruisecontrol.servlet.KafkaCruiseControlServlet.getState(KafkaCruiseControlServlet.java:978)
    at com.linkedin.kafka.cruisecontrol.servlet.KafkaCruiseControlServlet.doGet(KafkaCruiseControlServlet.java:353)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:841)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:564)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:317)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
    at org.eclipse.jetty.util.thread.Invocable.invokePreferred(Invocable.java:128)
    at org.eclipse.jetty.util.thread.Invocable$InvocableExecutor.invoke(Invocable.java:222)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:294)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:199)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:673)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:591)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Index 5083928 is out of range [5083724, 5083724]
    at com.linkedin.cruisecontrol.common.WindowIndexedArrays.validateIndex(WindowIndexedArrays.java:45)
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.RawMetricValues.isValidAtWindowIndex(RawMetricValues.java:164)
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.getWindowState(MetricSampleAggregator.java:442)
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.maybeUpdateAggregatorState(MetricSampleAggregator.java:433)
    at com.linkedin.cruisecontrol.monitor.sampling.aggregator.MetricSampleAggregator.completeness(MetricSampleAggregator.java:269)
    at com.linkedin.kafka.cruisecontrol.monitor.sampling.aggregator.KafkaPartitionMetricSampleAggregator.validPartitionRatioByWindows(KafkaPartitionMetricSampleAggregator.java:222)
    at com.linkedin.kafka.cruisecontrol.monitor.LoadMonitor.state(LoadMonitor.java:205)
    at com.linkedin.kafka.cruisecontrol.KafkaCruiseControl.state(KafkaCruiseControl.java:434)
    at com.linkedin.kafka.cruisecontrol.async.GetStateRunnable.getResult(GetStateRunnable.java:23)
    at com.linkedin.kafka.cruisecontrol.async.GetStateRunnable.getResult(GetStateRunnable.java:15)
    at com.linkedin.kafka.cruisecontrol.async.OperationRunnable.run(OperationRunnable.java:45)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    ... 1 more
","error": "Error processing GET request '/state' due to 'java.lang.IllegalArgumentException: Index 5083928 is out of range [5083724, 5083724]'."}
```

These seem to be intermittent, and repeating the request seems to alleviate them. I also see them in the CC log output.

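
Since repeating the request tends to succeed, a client-side retry can serve as a stopgap until the underlying bug is fixed. The following is only a minimal sketch of that idea: `RetrySketch` and `withRetry` are hypothetical names, not part of Cruise Control, and the lambda in `main` is a stand-in for whatever code issues the GET request to `/state`.

```java
import java.util.concurrent.Callable;

// Hypothetical helper (not part of Cruise Control): retries a call that
// fails intermittently, with a linearly growing pause between attempts.
final class RetrySketch {
    private RetrySketch() {}

    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // Pause before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt);
                }
            }
        }
        // Every attempt failed: surface the most recent failure.
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real request: fails twice, then succeeds.
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("transient failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A retry like this only hides the symptom on the client side; the exceptions would still appear in the server log.
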
becketqin commented 6 years ago

@TheyDroppedMe Thanks for reporting the issue. We noticed this as well. It is caused by a race condition in the MetricSampleAggregator. We will fix it shortly.
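
For readers unfamiliar with this class of bug, here is a generic illustration of how a race can produce an error of this shape. It is an assumption-laden sketch, not the actual `MetricSampleAggregator` code and not necessarily the cause addressed by the fix: `WindowRaceSketch` and all of its members are hypothetical. It models a writer that updates the current window index and the tracked window range in two separate steps, so a reader that runs in between sees an index outside the range. The interleaving is simulated on a single thread to keep the example deterministic.

```java
// Hypothetical model of a check-then-act race on windowed state.
// Not Cruise Control code; names and structure are illustrative only.
final class WindowRaceSketch {
    private final int retainedWindows;
    private long currentWindow;
    private long oldest; // inclusive lower bound of tracked windows
    private long newest; // inclusive upper bound of tracked windows

    WindowRaceSketch(int retainedWindows, long firstWindow) {
        this.retainedWindows = retainedWindows;
        this.currentWindow = firstWindow;
        this.oldest = firstWindow;
        this.newest = firstWindow;
    }

    // Unsafe step 1: publish the new current window index.
    void publishCurrentWindow(long window) {
        currentWindow = window;
    }

    // Unsafe step 2: roll the tracked range forward to match.
    void rollValues(long window) {
        newest = window;
        oldest = Math.max(oldest, window - retainedWindows + 1);
    }

    // Safe variant: both steps happen under the same lock the reader uses,
    // so a reader can never observe the state between them.
    synchronized void advance(long window) {
        publishCurrentWindow(window);
        rollValues(window);
    }

    // Reader: validates the current window index against the tracked range.
    synchronized void checkCurrentWindow() {
        if (currentWindow < oldest || currentWindow > newest) {
            throw new IllegalArgumentException("Index " + currentWindow
                + " is out of range [" + oldest + ", " + newest + "]");
        }
    }

    public static void main(String[] args) {
        WindowRaceSketch s = new WindowRaceSketch(5, 5083724L);
        s.publishCurrentWindow(5083928L); // writer "paused" after step 1
        try {
            s.checkCurrentWindow();       // reader runs in the gap
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        s.rollValues(5083928L);           // writer finishes step 2
        s.checkCurrentWindow();           // now consistent
        s.advance(5083929L);              // atomic update
        s.checkCurrentWindow();           // consistent by construction
    }
}
```

In this model, the error disappears once writers only go through `advance` and readers take the same lock, which also matches the observation that simply repeating the request usually works.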

efeg commented 6 years ago

This issue is fixed in https://github.com/linkedin/cruise-control/pull/217.