dmarupov closed this issue 3 years ago.
Hi @dmarupov
As you can see it is not collecting any metrics and I noticed that the timestamps are way off: time range [770800000, 771000000] = time range [June 5, 1994 7:06:40 AM, June 7, 1994 2:40:00 PM]
Could this be the issue? Is there a way to fix this? What else could I look into?
This indeed seems to be the likely root cause. I suspect that the box / VM that the CC instance is running on might have a bad clock. This time range corresponds to the timestamps of the records in __CruiseControlMetrics that CC is trying to read. Hence, if there are no records in this topic from 1994, it won't be able to read any records. This also explains why __KafkaCruiseControlPartitionMetricSamples and __KafkaCruiseControlModelTrainingSamples are empty: CC cannot read metrics, so it cannot generate samples to back up in these sample store topics.
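A rough sketch of the filtering described above, assuming the sampler keeps only records whose timestamp falls inside the sampling window. The `MetricRecord` type, the `records_in_window` helper and the half-open `[start, end)` bound are hypothetical illustrations, not Cruise Control's actual classes or exact boundary semantics:

```python
from dataclasses import dataclass


@dataclass
class MetricRecord:
    # Hypothetical stand-in for a record in __CruiseControlMetrics;
    # only its timestamp (epoch milliseconds) matters here.
    broker_id: int
    timestamp_ms: int


def records_in_window(records, start_ms, end_ms):
    """Keep only records whose timestamp falls in [start_ms, end_ms).

    The half-open bound is an assumption made for this sketch.
    """
    return [r for r in records if start_ms <= r.timestamp_ms < end_ms]


# 2021-era timestamps (epoch ms), i.e. what a healthy reporter writes "now".
recent = [MetricRecord(0, 1_620_000_000_000), MetricRecord(1, 1_620_000_060_000)]

# The window from the report: nothing in the topic is that old.
print(len(records_in_window(recent, 770_800_000, 771_000_000)))  # 0
```

An empty match leaves nothing to aggregate, which is consistent with the "Collected 0 metrics" lines in the logs posted in this thread.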
Can you try running the Unix command date on the box / VM that runs CC? Is the returned time accurate?
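To compare the machine's clock with the window bounds in the CC log, it helps to print the current time in the same unit. Assuming those bounds are epoch milliseconds (the logs in this thread pair a window of [0, 120000] with "duration 120000 ms"), `date -u` shows the clock in UTC and, on GNU coreutils, `date +%s%3N` prints epoch milliseconds. The Python sketch below does the same; `now_epoch_ms` is a hypothetical helper name:

```python
import time
from datetime import datetime, timezone


def now_epoch_ms() -> int:
    """Current wall-clock time of this machine in epoch milliseconds."""
    return time.time_ns() // 1_000_000


if __name__ == "__main__":
    ms = now_epoch_ms()
    # Compare this value with the "time range [start, end]" bounds in the CC log.
    print(ms, datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat())
```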
Hi @efeg
Thank you for the response and sorry about the duplicate issue. I did not realize I did that.
I ran the date command on my Unix box and got the correct date and time, as shown below. I also ran the same command as a non-root user and got the correct date and time as well.
Is there anything in the Cruise Control configs that would make it read records from 1994?
Thank you.
@dmarupov This is a little weird. There should not be a config that would make CC read records from '94. Can you reproduce this locally on a local CC deployment and local brokers (e.g. 2 brokers)? Can you also share your CC version?
I have the exact same issue on brand-new infrastructure (ZooKeeper, Kafka, CC):
cruise-control_1 | 6165644 [MetricFetcher-0] INFO nitor.sampling.SamplingFetcher - Collected 0 partition metric samples for 0 partitions. Total partition assigned: 65.
cruise-control_1 | 6165644 [MetricFetcher-0] INFO nitor.sampling.SamplingFetcher - Collected 0 broker metric samples for 0 brokers.
cruise-control_1 | 6165644 [lingScheduler-1] INFO .sampling.MetricFetcherManager - Finished sampling in 531 ms.
cruise-control_1 | 6165644 [lingScheduler-1] INFO .sampling.MetricFetcherManager - Kicking off metric sampling for time range [986160000, 986280000], duration 120000 ms with timeout 120000 ms.
cruise-control_1 | 6166143 [MetricFetcher-0] INFO clients.consumer.KafkaConsumer - [Consumer clientId=CruiseControlMetricsReporterSampler-consumer--8160475407610004265, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-0
cruise-control_1 | 6166181 [MetricFetcher-0] INFO eControlMetricsReporterSampler - Finished sampling for topic partitions [__CruiseControlMetrics-0] in time range [986160000,986280000]. Collected 0 metrics.
My CC version is 2.5.27.
For info: after a restart (with CC updated to v2.5.28), it happened again when I clicked on "Bootstrap":
cruise-control_1 | 299586 [qtp775741122-62] INFO ler.async.AbstractAsyncRequest - Processing sync request BootstrapRequest.
cruise-control_1 | 299595 [lingScheduler-1] INFO rol.monitor.task.BootstrapTask - Load monitor is bootstrapping since 0
cruise-control_1 | 299603 [lingScheduler-1] INFO .sampling.MetricFetcherManager - Kicking off metric sampling for time range [0, 120000], duration 120000 ms with timeout 120000 ms.
cruise-control_1 | 299610 [MetricFetcher-0] INFO clients.consumer.KafkaConsumer - [Consumer clientId=CruiseControlMetricsReporterSampler-consumer-2810727836355973195, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-0
cruise-control_1 | 299661 [omalyDetector-2] INFO .detector.AnomalyDetectorUtils - Skipping anomaly detection because load monitor is in BOOTSTRAPPING state.
cruise-control_1 | 299968 [MetricFetcher-0] INFO eControlMetricsReporterSampler - Finished sampling for topic partitions [__CruiseControlMetrics-0] in time range [0,120000]. Collected 0 metrics.
cruise-control_1 | 299968 [MetricFetcher-0] INFO nitor.sampling.SamplingFetcher - Collected 0 partition metric samples for 0 partitions. Total partition assigned: 65.
cruise-control_1 | 299968 [MetricFetcher-0] INFO nitor.sampling.SamplingFetcher - Collected 0 broker metric samples for 0 brokers.
cruise-control_1 | 299969 [lingScheduler-1] INFO .sampling.MetricFetcherManager - Finished sampling in 366 ms.
cruise-control_1 | 299969 [lingScheduler-1] INFO .sampling.MetricFetcherManager - Kicking off metric sampling for time range [120000, 240000], duration 120000 ms with timeout 120000 ms.
cruise-control_1 | 299992 [MetricFetcher-0] INFO clients.consumer.KafkaConsumer - [Consumer clientId=CruiseControlMetricsReporterSampler-consumer-2810727836355973195, groupId=null] Seeking to offset 0 for partition __CruiseControlMetrics-0
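The bootstrap log above shows consecutive windows [0, 120000], [120000, 240000], ... stepping forward from epoch 0. Assuming the monitor keeps advancing one 120000 ms window per pass on that fixed grid, a little arithmetic shows how far such a sweep is from the present. The earlier range [986160000, 986280000] also lines up with that grid (window 8218), although the thread does not confirm that it came from a bootstrap. The helper names below are hypothetical:

```python
WINDOW_MS = 120_000  # width taken from the log line "duration 120000 ms"


def window_index(start_ms: int, origin_ms: int = 0, window_ms: int = WINDOW_MS) -> int:
    """0-based index of the window starting at start_ms on a grid anchored at origin_ms."""
    offset = start_ms - origin_ms
    if offset < 0 or offset % window_ms != 0:
        raise ValueError("start_ms is not on the window grid")
    return offset // window_ms


def windows_to_reach(target_ms: int, origin_ms: int = 0, window_ms: int = WINDOW_MS) -> int:
    """Windows needed to step from origin_ms up to target_ms (integer ceiling division)."""
    return -(-(target_ms - origin_ms) // window_ms)


print(window_index(120_000))                # 1: the "[120000, 240000]" window
print(window_index(986_160_000))            # 8218
print(windows_to_reach(1_620_000_000_000))  # 13500000 windows to reach 2021-era timestamps
```

At a few hundred milliseconds per window, as in the "Finished sampling in 366 ms" line, millions of windows would take on the order of weeks to sweep. That is one plausible reading of why restarting, rather than bootstrapping from 0, got metrics flowing again.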
Again, after a restart, everything works fine.
@jrevillard What does it mean to click on Bootstrap? Are you using the bootstrap endpoint of Cruise Control (CC)? If so, this endpoint is meant only for development purposes, not for bootstrapping a CC instance. When CC starts, it bootstraps automatically without the need for any extra call.
@efeg OK, I was using the "Bootstrap" metric button in the CC UI. I wasn't aware that this is the normal behavior.
Thanks!
Hello, I am trying to install Cruise Control to monitor our Kafka environment. I have a distributed Kafka environment with 3 VMs (Linux OS), and each VM runs one Kafka broker and one ZooKeeper node. So in the cruisecontrol.properties file I have:
bootstrap.servers=my-domain-dev01.com:9093,my-domain-dev02.com:9094,my-domain-dev03.com:9095
where each my-domain-dev# is a separate VM. I also have the following for ZooKeeper:
zookeeper.connect=my-domain-dev01.com:2183,my-domain-dev02.com:2184,my-domain-dev03.com:2185
At this point I am able to see Kafka Cluster State just fine, but when it comes to Metrics I am having the following issue: as you can see, it is not collecting any metrics, and I noticed that the timestamps are way off:
time range [770800000, 771000000] = time range [June 5, 1994 7:06:40 AM, June 7, 1994 2:40:00 PM]
Could this be the issue? Is there a way to fix this? What else could I look into?
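The conversion to 1994 treats the two bounds as epoch seconds. Other CC log lines in this thread pair a window of [0, 120000] with "duration 120000 ms", which suggests the bounds are epoch milliseconds; read that way, the same range falls on 9 January 1970. Either reading is far older than any record currently in __CruiseControlMetrics. A small stdlib sketch of both readings (the helper names are hypothetical):

```python
from datetime import datetime, timezone


def as_epoch_seconds(value: int) -> datetime:
    """Interpret value as seconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


def as_epoch_millis(value: int) -> datetime:
    """Interpret value as milliseconds since 1970-01-01T00:00:00Z."""
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc)


for bound in (770_800_000, 771_000_000):
    print(bound, as_epoch_seconds(bound).isoformat(), as_epoch_millis(bound).isoformat())
# 770800000 1994-06-05T07:06:40+00:00 1970-01-09T22:06:40+00:00
# 771000000 1994-06-07T14:40:00+00:00 1970-01-09T22:10:00+00:00
```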
I can see records being populated in __CruiseControlMetrics continuously, but not in __KafkaCruiseControlPartitionMetricSamples or __KafkaCruiseControlModelTrainingSamples. I would appreciate any guidance on this.
Thank you.