Open dtakis opened 2 years ago
Hello! Has anyone seen this before? I believe the issue affects the creation of the __CruiseControlMetrics topic.
I found a similar issue, possibly a duplicate of mine: https://github.com/linkedin/cruise-control/issues/1904
Hi @dtakis. I'm facing the same issue in AWS MSK. Have you solved it? If so, could you please share the solution with us?
Not sure if this helps, but I recently faced a similar issue on a Docker-based install of CC. It turned out that the metrics reporter was not bootstrapping to the cluster: we had forgotten to enter the proper authentication configs in server.properties (we use Kerberos with SASL_SSL).
These are the server.properties configs that we fixed / provided:
- cruise.control.metrics.reporter.ssl.truststore.location
- cruise.control.metrics.reporter.ssl.truststore.password
- cruise.control.metrics.reporter.bootstrap.servers
- cruise.control.metrics.reporter.sasl.mechanism
- cruise.control.metrics.reporter.sasl.kerberos.service.name
- cruise.control.metrics.reporter.sasl.jaas.config
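A quick way to catch this kind of omission before restarting the brokers is to scan server.properties for the reporter keys listed above. Here is a minimal Python sketch, assuming a plain key=value file. The helper names (`parse_properties`, `missing_reporter_keys`) and the sample values are made up for illustration and are not part of Cruise Control:

```python
# Keys taken from the list above; whether all of them are needed depends on
# your security setup (ours was Kerberos with SASL_SSL).
REQUIRED_REPORTER_KEYS = [
    "cruise.control.metrics.reporter.ssl.truststore.location",
    "cruise.control.metrics.reporter.ssl.truststore.password",
    "cruise.control.metrics.reporter.bootstrap.servers",
    "cruise.control.metrics.reporter.sasl.mechanism",
    "cruise.control.metrics.reporter.sasl.kerberos.service.name",
    "cruise.control.metrics.reporter.sasl.jaas.config",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, since values (e.g. JAAS config) may contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_reporter_keys(text):
    """Return the required reporter keys that are absent or set to an empty value."""
    props = parse_properties(text)
    return [key for key in REQUIRED_REPORTER_KEYS if not props.get(key)]


# Hypothetical excerpt of a server.properties file.
sample = """
# metrics reporter settings
cruise.control.metrics.reporter.bootstrap.servers=broker1:9093
cruise.control.metrics.reporter.sasl.mechanism=GSSAPI
cruise.control.metrics.reporter.sasl.kerberos.service.name=
"""

for key in missing_reporter_keys(sample):
    print("missing or empty:", key)
```

On a real broker you would read the file contents with `open(path).read()` instead of using the inline sample.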
After setting the above to proper values and restarting the Kafka cluster nodes (I also restarted the CC server, though that may not be required), we noticed that the "User-Task-ID missing ... NotEnoughValidWindowsException" error disappeared after a while and data started showing up on that tab. So IMHO this error is misleading: the real cause is that there is not enough data in the CC topics on the Kafka cluster, and that data is produced by the metrics reporter client (cruise-control-metrics-reporter.jar). In our case the metrics reporter was not able to bootstrap properly and hence was not producing anything to the CC topics.
In addition to the above, note the following:
- I'm using a RHEL 7 machine as the Docker host
- I'm using a more recent version of CC: tag: 2.5.101
- upgraded systemd to version 234
- upgraded kernel to 5.x which has better cgroup capabilities
- upgraded Docker service on host
I hope this helps.
Should these configs be placed in the MSK configuration?
No @felipeavilis , I paused the debugging and never came back to continue :(
Following the AWS guide for installing and configuring Cruise Control and Cruise Control UI against Kafka MSK, I ended up in a situation where Cruise Control (2.5.42) is connected to MSK (Kafka 2.7.1), but some API calls throw NotEnoughValidWindowsException. When using the Cruise Control UI (0.4.0), I also see a CORS message, even though I configured Cruise Control following the suggested configuration.
CORS
/load and /partition_load API call failures
I noticed that the Monitor State is continuously running and training is stuck at 0.00%, while the __CruiseControlMetrics topic is not created. I guess that Cruise Control never reaches the point where it can create and write to this topic. The other topics were successfully created, though.
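One way to confirm which topics exist is to list the cluster's topics (for example with kafka-topics.sh --list, which prints one topic name per line) and look for __CruiseControlMetrics. Here is a small Python sketch of that check, assuming the listing has already been captured as text. The helper name `has_metrics_topic` and the sample listing are hypothetical:

```python
METRICS_TOPIC = "__CruiseControlMetrics"


def has_metrics_topic(listing, topic=METRICS_TOPIC):
    """Return True if `topic` appears as its own line in a topic listing."""
    names = {line.strip() for line in listing.splitlines() if line.strip()}
    return topic in names


# Hypothetical listing output; replace with the real output from your cluster.
sample_listing = """__consumer_offsets
my-app-events
"""

if has_metrics_topic(sample_listing):
    print("metrics topic exists")
else:
    print("metrics topic is missing; check the metrics reporter configuration on the brokers")
```

An exact line match is used rather than a substring search so that a topic with a similar name does not produce a false positive.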
Thank you in advance for your insights. I see that these errors are very commonly reported here, but I could not make any of the suggestions work.