saritago closed this issue 5 years ago
Looks like you are using a version of Cruise Control from before PR https://github.com/linkedin/cruise-control/pull/409. I assume you also observe that your executor substate is stuck at STARTING_EXECUTION; you should verify this using the state endpoint with the substates=executor parameter.
To pick up the relevant fix, use Cruise Control versions:
2.0.11+ for Kafka 0.11 or 1.0
0.1.16+ for Kafka 1.1+
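For anyone who wants to script the executor check suggested above, here is a minimal sketch in Python. It assumes Cruise Control is reachable on its default port 9090 under the default `/kafkacruisecontrol` prefix, and that the JSON response carries the executor state under `ExecutorState` -> `state`. Those field names are an assumption and may differ between versions, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def state_url(host, port=9090, substates=("executor",)):
    """Build the state endpoint URL (default port and path prefix are assumed)."""
    query = urlencode({"substates": ",".join(substates), "json": "true"}, safe=",")
    return f"http://{host}:{port}/kafkacruisecontrol/state?{query}"


def executor_state(payload):
    """Extract the executor state from a decoded response.

    The 'ExecutorState' / 'state' keys are an assumption about the JSON layout.
    Returns None if they are missing.
    """
    return payload.get("ExecutorState", {}).get("state")


def is_stuck(payload):
    """True if the executor reports the STARTING_EXECUTION substate."""
    return executor_state(payload) == "STARTING_EXECUTION"


def fetch_state(host, port=9090, timeout=10):
    """Call the state endpoint and return the decoded JSON body."""
    with urlopen(state_url(host, port), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `is_stuck(fetch_state("my-cc-host"))` a few times, some minutes apart, should tell you whether the executor is really stuck or just starting an execution (`my-cc-host` is a placeholder).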
But I have the same version running on several other clusters and they work just fine. Does it have anything to do with the volume of data on the Kafka clusters?
From the exception you pasted, it looks like Cruise Control is unable to get the offset corresponding to a certain timestamp for the metric topic __CruiseControlMetrics. Can you first confirm that this topic exists in the cluster and that the reporters are properly producing messages to it?
Another thing: as Efe mentioned, if this is transient, the newer version makes sure that sampling failures do not block operations from executing.
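One possible way to run the topic check described above is sketched below in Python. It assumes the third-party kafka-python client is installed; the function names and the `summarize` helper are hypothetical. It reports whether __CruiseControlMetrics has partitions and how many messages are currently retained.

```python
METRICS_TOPIC = "__CruiseControlMetrics"


def summarize(begin, end):
    """Given partition -> beginning offset and partition -> end offset maps,
    return (topic_exists, retained_message_count)."""
    if not end:
        return False, 0
    return True, sum(end[p] - begin.get(p, 0) for p in end)


def check_metrics_topic(bootstrap_servers, topic=METRICS_TOPIC):
    """Query the cluster for the topic's offsets (requires kafka-python)."""
    # Imported lazily so summarize() can be used without the client installed.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    try:
        # partitions_for_topic returns None when the topic is unknown.
        partitions = consumer.partitions_for_topic(topic) or set()
        tps = [TopicPartition(topic, p) for p in sorted(partitions)]
        if not tps:
            return summarize({}, {})
        return summarize(consumer.beginning_offsets(tps), consumer.end_offsets(tps))
    finally:
        consumer.close()
```

A non-zero count only shows that something was produced within the retention window. To confirm the reporters are still producing, run the check twice a minute or so apart and see whether the count grows.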
Closing the issue, as https://github.com/linkedin/cruise-control/issues/502#issuecomment-457866690 provides the solution to a known issue. As discussed in the Gitter channel, this is a concurrency bug; hence, it is possible that you haven't observed this behavior on other clusters so far.
@efeg @kidkun I have downloaded the latest CC code and am still seeing these warnings:
[2019-02-26 10:18:18,823] WARN Encountered error when loading sample from Kafka. (com.linkedin.kafka.cruisecontrol.monitor.sampling.KafkaSampleStore)
org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 50000 ms
[2019-02-26 10:27:26,226] ERROR Sampling scheduler received Unknown exception when waiting for sampling to finish (com.linkedin.kafka.cruisecontrol.monitor.sampling.MetricFetcherManager)
java.util.concurrent.TimeoutException
[2019-02-26 10:27:26,227] WARN Sampling did not finish in 300000 ms, skipping this sampling interval. (com.linkedin.kafka.cruisecontrol.monitor.task.SamplingTask)
Please ignore the above ping; it was my mistake. I was using an older client.
Hi,
I am seeing multiple warning and error messages related to timeouts in the logs. Also, the GET commands for state are going into a queued state.
I initially had metric.sampling.interval.ms set to 300000; after seeing this warning I set it to 500000, but I still see these messages.
In addition to these warnings, I also see the error messages below here and there: