Closed: pedrojflores closed this issue 4 years ago
Hi @pedrojflores
To see the underlying producer exception received by the Cruise Control metrics reporter, would you be able to enable debug-level logs for the underlying producer and share the stack trace?
Will do. Let me get that set up and I'll post the stack trace as soon as I get one.
Here's what I'm seeing after setting the log level to DEBUG:
[2020-08-11 14:51:20,699] DEBUG [Producer clientId=CruiseControlMetricsReporter] Node -2 disconnected. (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,699] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,749] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,799] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,839] DEBUG [ReplicaFetcher replicaId=1002, leaderId=1001, fetcherId=0] Node 1001 sent an incremental fetch response for session 454667497 with 0 response partition(s), 38 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler)
[2020-08-11 14:51:20,839] DEBUG [ReplicaFetcher replicaId=1002, leaderId=1001, fetcherId=0] Built incremental fetch (sessionId=454667497, epoch=163) for node 1001. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s) out of 38 partition(s) (org.apache.kafka.clients.FetchSessionHandler)
[2020-08-11 14:51:20,850] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,900] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:20,950] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,000] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,051] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,101] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,139] DEBUG [ReplicaFetcher replicaId=1002, leaderId=1003, fetcherId=0] Node 1003 sent an incremental fetch response for session 2110777851 with 0 response partition(s), 39 implied partition(s) (org.apache.kafka.clients.FetchSessionHandler)
[2020-08-11 14:51:21,346] DEBUG [ReplicaFetcher replicaId=1002, leaderId=1001, fetcherId=0] Built incremental fetch (sessionId=454667497, epoch=164) for node 1001. Added 0 partition(s), altered 0 partition(s), removed 0 partition(s) out of 38 partition(s) (org.apache.kafka.clients.FetchSessionHandler)
[2020-08-11 14:51:21,359] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,409] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,459] DEBUG [Producer clientId=CruiseControlMetricsReporter] Give up sending metadata request since no node is available (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,509] DEBUG [Producer clientId=CruiseControlMetricsReporter] Initialize connection to node <some node>:9093 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,510] DEBUG [Producer clientId=CruiseControlMetricsReporter] Initiating connection to node <some node>:9093 (id: -1 rack: null) using address <someip> (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,510] DEBUG [Producer clientId=CruiseControlMetricsReporter] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node -1 (org.apache.kafka.common.network.Selector)
[2020-08-11 14:51:21,510] DEBUG [Producer clientId=CruiseControlMetricsReporter] Completed connection to node -1. Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,510] DEBUG [Producer clientId=CruiseControlMetricsReporter] Initiating API versions fetch from node -1. (org.apache.kafka.clients.NetworkClient)
[2020-08-11 14:51:21,566] DEBUG [Producer clientId=CruiseControlMetricsReporter] Connection with <some ip> disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:119)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:436)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:397)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:653)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:574)
at org.apache.kafka.common.network.Selector.poll(Selector.java:485)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:539)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:335)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
at java.base/java.lang.Thread.run(Thread.java:834)
@pedrojflores This is a configuration issue. You are missing at least the following config:
cruise.control.metrics.reporter.security.protocol=SSL
Hope it helps!
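For reference, since the metrics reporter creates its own Kafka producer, it typically only needs the producer-side (client) settings under the `cruise.control.metrics.reporter.` prefix. A sketch of such a config — hostnames, file paths, and passwords below are placeholders, not values from this cluster:

```properties
# Sketch: client-side SSL settings for the metrics reporter's producer.
cruise.control.metrics.reporter.bootstrap.servers=broker1:9093,broker2:9093
cruise.control.metrics.reporter.security.protocol=SSL
cruise.control.metrics.reporter.ssl.truststore.location=/etc/ssl/kafka.client.truststore.jks
cruise.control.metrics.reporter.ssl.truststore.password=<password>
# Only needed when brokers require client (mutual TLS) authentication:
cruise.control.metrics.reporter.ssl.keystore.location=/etc/ssl/kafka.client.keystore.jks
cruise.control.metrics.reporter.ssl.keystore.password=<password>
```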
Thanks @efeg
These are the Cruise Control options I'm using now, and I'm still having issues:
cruise.control.metrics.reporter.ssl.client.auth=requested
cruise.control.metrics.reporter.advertised.listeners=TRUSTED://myhostname:9093,PLAINTEXT://myhostname:9094
cruise.control.metrics.reporter.inter.broker.listener.name=TRUSTED
cruise.control.metrics.reporter.listeners=TRUSTED://:9093,PLAINTEXT://:9094
cruise.control.metrics.reporter.listener.security.protocol.map=PLAINTEXT:PLAINTEXT,TRUSTED:SSL
cruise.control.metrics.reporter.bootstrap.servers=list of servers listening on 9093
cruise.control.metrics.reporter.security.inter.broker.protocol=SSL
cruise.control.metrics.reporter.security.protocol=SSL
cruise.control.metrics.reporter.ssl.keystore.location=/etc/ssl/kafka.server.keystore.jks
cruise.control.metrics.reporter.ssl.keystore.password=mypassword
cruise.control.metrics.reporter.ssl.truststore.location=/etc/ssl/kafka.server.truststore.jks
cruise.control.metrics.reporter.ssl.truststore.password=mypassword
I'm getting these errors now, which seem to be SSL-related:
[2020-08-13 10:27:39,603] DEBUG [SocketServer brokerId=1002] Connection with /<some_broker> disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:96)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:436)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:397)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:653)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:574)
at org.apache.kafka.common.network.Selector.poll(Selector.java:485)
at kafka.network.Processor.poll(SocketServer.scala:884)
at kafka.network.Processor.run(SocketServer.scala:783)
at java.base/java.lang.Thread.run(Thread.java:834)
I'm not sure what other SSL-related options I need to provide to Cruise Control for the metrics reporter to work properly.
Seeing this as well:
[2020-08-13 11:12:46,696] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
at com.linkedin.kafka.cruisecontrol.metricsreporter.metric.MetricsUtils.yammerMetricScopeToTags(MetricsUtils.java:208)
at com.linkedin.kafka.cruisecontrol.metricsreporter.metric.MetricsUtils.isInterested(MetricsUtils.java:196)
at com.linkedin.kafka.cruisecontrol.metricsreporter.metric.YammerMetricProcessor.processGauge(YammerMetricProcessor.java:139)
at com.linkedin.kafka.cruisecontrol.metricsreporter.metric.YammerMetricProcessor.processGauge(YammerMetricProcessor.java:24)
at com.yammer.metrics.core.Gauge.processWith(Gauge.java:28)
at com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter.reportYammerMetrics(CruiseControlMetricsReporter.java:336)
at com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter.run(CruiseControlMetricsReporter.java:268)
at java.base/java.lang.Thread.run(Thread.java:834)
So, to take mTLS out of the picture as a possible culprit, I went ahead and configured the Cruise Control metrics reporter to connect to the PLAINTEXT ports and updated the ACLs on the Cruise Control topics to allow the ANONYMOUS user to access them.
cruise.control.metrics.reporter.bootstrap.servers=broker1.ec2.internal:9094,broker2.ec2.internal:9094,broker3.ec2.internal:9094
cruise.control.metrics.reporter.security.protocol=PLAINTEXT
However, I'm still seeing the following error in the broker logs:
[2020-09-15 21:55:07,274] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException
[2020-09-15 21:56:07,275] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException
[2020-09-15 21:57:07,276] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException
[2020-09-15 21:58:07,277] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException
[2020-09-15 21:59:07,278] ERROR Got exception in Cruise Control metrics reporter (com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter)
java.lang.ArrayIndexOutOfBoundsException
Any ideas? Anyone?
@pedrojflores
According to the stack trace shared in https://github.com/linkedin/cruise-control/issues/1296#issuecomment-673570001, this is a failure to parse the scope of a Yammer metric — i.e. the scope is expected to contain dots separating key-value pairs, but in this case it seems to have none. This might be due to differences between Yammer metrics in Confluent Kafka and Apache Kafka. Created a PR to address this.
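A simplified sketch of the failure mode — this is an illustrative reconstruction, not the actual `MetricsUtils` code: if the scope string is split on dots and consumed as key/value pairs, a scope with an odd number of tokens (e.g. one with no dots at all) indexes past the end of the array.

```java
import java.util.HashMap;
import java.util.Map;

public class ScopeParser {
    // Illustrative reconstruction of how a Yammer metric scope such as
    // "clientId.producer-1.topic.myTopic" might be turned into tags.
    // Tokens are consumed in key/value pairs, so an odd token count
    // (e.g. a scope with no dots) makes tokens[i + 1] go out of bounds.
    static Map<String, String> scopeToTags(String scope) {
        Map<String, String> tags = new HashMap<>();
        if (scope != null) {
            String[] tokens = scope.split("\\.");
            for (int i = 0; i < tokens.length; i += 2) {
                tags.put(tokens[i], tokens[i + 1]); // AIOOBE when length is odd
            }
        }
        return tags;
    }
}
```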
This is my first time trying to get Cruise Control up and running on a three-node Confluent Platform 5.3 Kafka cluster using mutual TLS auth. I followed the instructions at https://github.com/linkedin/cruise-control/blob/master/README.md and I'm currently running into an issue where, according to my Kafka logs, I'm not able to send Cruise Control metrics. Here's a sample of the log messages I'm seeing.
My server.properties looks like this (some values are redacted)
Cruise Control jar file location
Cruise Control Topic
Any help in figuring out what's preventing me from publishing metrics will be greatly appreciated.