Closed claudio-benfatto closed 3 years ago
@claudio-benfatto Thank you for reporting this issue.
Is this issue specific to the proposals endpoint, or is it relevant to other endpoints as well? Also, topics.excluded.from.partition.movement is expected to apply to the response of the proposals endpoint, too. Could you clarify whether you are using this config to exclude topics? That would eliminate the need for providing an excluded_topics parameter to override this config dynamically.
RackAwareGoal to support a "best-effort" distribution of replicas over the racks for partitions with replication factor > number of racks.
Hi @efeg, thank you for your reply,
I'm experiencing this issue when calling the /rebalance endpoint too, i.e. the call:
https://cruisecontrol.internal/kafkacruisecontrol/rebalance?dryrun=true
produces the error:
Error processing POST request '/rebalance' due to: 'com.linkedin.kafka.cruisecontrol.exception.OptimizationFailureException: [RackAwareGoal] Insufficient number of racks to distribute each replica (Current: 3, Needed: 6).'.
in a cluster with 3 racks and 6 brokers, where the only topic with replication factor > 3 is __broker-replication-check (6), and with the following configuration parameter set:
topics.excluded.from.partition.movement="__broker-\d+-health-check|__broker-replication-check"
while executing the same call but passing excluded_topics explicitly to the POST method as a parameter:
https://cruisecontrol.internal/kafkacruisecontrol/rebalance?excluded_topics=__broker-\d+-health-check|__broker-replication-check&dryrun=true
I get the correct behaviour:
Optimization has 21 inter-broker replica(66861 MB) moves, 0 intra-broker replica(0 MB) moves and 38 leadership moves with a cluster model of 1 recent windows and 93.805% of the partitions covered.
Excluded Topics: [__broker-11-health-check, __broker-28-health-check, __broker-10-health-check, __broker-replication-check, __broker-26-health-check, __broker-12-health-check, __broker-27-health-check].
Incidentally, I noticed that URL encoding can produce unexpected results, e.g. \d+ is URL-encoded and loses its meaning when it is passed as part of a regex.
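One possible explanation for the behaviour described above, sketched in Python: if the server applies form-style decoding to the query string (an assumption, not confirmed in this thread), a raw "+" is read as a space, so \d+ no longer means "one or more digits". The topic name below is taken from the Excluded Topics output above.

```python
import re
from urllib.parse import unquote_plus

# Regex as typed into the URL, without any percent-encoding.
raw_pattern = r"__broker-\d+-health-check|__broker-replication-check"

# Assumption: the receiving side decodes the query string form-style,
# which turns a literal "+" into a space.
decoded_pattern = unquote_plus(raw_pattern)
print(decoded_pattern)

topic = "__broker-11-health-check"
# The intended regex matches the topic name ...
print(bool(re.fullmatch(raw_pattern, topic)))
# ... but the decoded one now expects "one digit followed by a space".
print(bool(re.fullmatch(decoded_pattern, topic)))
```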
2. We haven't hit this problem so far, because we enforce replication factor = 3 for all of our topics (aside from the replication one) and we also have 3 racks.
As mentioned before, __broker-replication-check is a very special topic, and because of the way it is managed (replicas are assigned manually) we want to prevent CC from interfering with it.
However I think that this feature is a really good addition!
@claudio-benfatto Regarding the topics.excluded.from.partition.movement config: I suspect the quotes should be dropped, i.e.
topics.excluded.from.partition.movement=__broker-\d+-health-check|__broker-replication-check
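A rough illustration of why the quotes could matter, written in Python rather than Java. It assumes the quotes end up as literal characters of the property value and that a topic is excluded only when its whole name matches the pattern; neither assumption is confirmed in this thread. The topic "payments" and the helper excluded_topics are hypothetical.

```python
import re

# Value as written in the report (with quotes) and as suggested (without).
quoted = r'"__broker-\d+-health-check|__broker-replication-check"'
unquoted = r"__broker-\d+-health-check|__broker-replication-check"

# "payments" is a made-up regular topic used for contrast.
topics = ["__broker-11-health-check", "__broker-replication-check", "payments"]


def excluded_topics(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# With the quotes, each alternative requires a literal '"' at one end,
# so no real topic name can match.
print(excluded_topics(quoted, topics))
# Without the quotes, both internal topics match.
print(excluded_topics(unquoted, topics))
```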
For manually passing the requests via curl with proper encoding, you may need to encode the URL (e.g. via https://www.urlencoder.org/) before sending it to CC. For example, to send __broker-\d+-health-check|__broker-replication-check as part of a curl request, I use the URL-encoded version of it (i.e. __broker-%5Cd%2B-health-check%7C__broker-replication-check).
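The same encoding can be produced locally instead of through a website. This is a small sketch using Python's standard library; the host name is the one from the earlier comment, and the variable names are only illustrative.

```python
from urllib.parse import quote, urlencode

pattern = r"__broker-\d+-health-check|__broker-replication-check"

# Percent-encode every reserved character in the regex.
encoded = quote(pattern, safe="")
print(encoded)

# urlencode applies an equivalent encoding to each query parameter value.
base = "https://cruisecontrol.internal/kafkacruisecontrol/rebalance"
query = urlencode({"excluded_topics": pattern, "dryrun": "true"})
url = f"{base}?{query}"
print(url)
```

The resulting URL can then be passed to curl as a single quoted argument.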
com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal. This goal would help if you have replication factor > number of racks for topics that are not excluded.
Closing this issue. Please feel free to reopen it if the issue persists.
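To make the difference between a strict rack check and a "best-effort" distribution concrete, here is a minimal sketch. It is not Cruise Control's implementation; the function names, rack names, and the "counts differ by at most one" criterion are assumptions chosen for illustration, using the 3 racks and replication factor 6 from this thread.

```python
from collections import Counter


def strictly_rack_aware(replica_racks):
    """Strict check: every replica of a partition sits in a different rack."""
    return len(set(replica_racks)) == len(replica_racks)


def evenly_distributed(replica_racks, all_racks):
    """Best-effort check: per-rack replica counts differ by at most one."""
    counts = Counter(replica_racks)
    per_rack = [counts.get(rack, 0) for rack in all_racks]
    return max(per_rack) - min(per_rack) <= 1


racks = ["r1", "r2", "r3"]
# Replication factor 6 on 3 racks: two replicas per rack.
balanced = ["r1", "r1", "r2", "r2", "r3", "r3"]
skewed = ["r1", "r1", "r1", "r1", "r2", "r3"]

print(strictly_rack_aware(balanced))        # cannot hold: 6 replicas, 3 racks
print(evenly_distributed(balanced, racks))  # holds for the balanced placement
print(evenly_distributed(skewed, racks))    # fails for the skewed placement
```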
Hi all,
Issue
We'd like to have the possibility to exclude a few internal topics permanently from the goal state validations and executions via a global configuration property.
Context
We are using a custom health check for Kafka that checks replication issues by setting the replication factor equal to the number of brokers and making sure that each broker is in the ISR list for the topic. As a result, this topic will always have replication factor = number of brokers, regardless of the actual number of racks across which those brokers are distributed.
I.e. in a cluster of 6 nodes (distributed across 3 racks) we always hit the following error:
because of:
Proposal
At the moment we can successfully use the endpoints by adding the excludedTopics parameter to all the requests, like in:
However, this is error-prone and quite brittle, i.e. it breaks the UI endpoints and makes the validation chore difficult to manage with several clients accessing the API.
It would be great if we could configure a global property in the cruisecontrol.properties file, similar to topics.excluded.from.partition.movement, in order to set the excluded topics in a single place valid for the entire application.
What do you think?
Thanks!