Closed (closed by cstampfel 4 months ago)
Why doesn't Strimzi support the RackAwareDistributionGoal goal?
Also, as a side note, you will not get an HA cluster across 2 zones. With RF=4 and min.insync.replicas=2, both in-sync replicas can be in a single zone. With RF=4 and min.insync.replicas=3, you will not be available after losing one AZ. So you really should use 3 AZs if you have them.
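To make the trade-off concrete, here is a small Python sketch that brute-forces the layout discussed here: RF=4 with two replicas per zone across two zones. The broker and zone names are hypothetical, and it assumes producers use acks=all, since min.insync.replicas is only enforced for such writes.

```python
from itertools import combinations

# Hypothetical layout: RF=4, two replicas in each of two zones.
REPLICA_ZONES = {"b0": "zone-a", "b1": "zone-a", "b2": "zone-b", "b3": "zone-b"}

def isr_can_sit_in_one_zone(min_isr: int) -> bool:
    """True if some ISR of exactly min_isr replicas lives entirely in one zone."""
    for isr in combinations(REPLICA_ZONES, min_isr):
        if len({REPLICA_ZONES[b] for b in isr}) == 1:
            return True
    return False

def writable_after_zone_loss(min_isr: int, lost_zone: str) -> bool:
    """True if the replicas outside lost_zone can still satisfy min.insync.replicas."""
    survivors = [b for b, zone in REPLICA_ZONES.items() if zone != lost_zone]
    return len(survivors) >= min_isr

for m in (2, 3):
    print(m, isr_can_sit_in_one_zone(m), writable_after_zone_loss(m, "zone-a"))
# 2 True True
# 3 False False
```

With min.insync.replicas=2 the cluster stays writable after a zone outage, but an acknowledged write may have lived only in the lost zone; with 3 no write is confined to one zone, but the topic stops accepting writes.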
Thank you for your response. I thought that if the racks are configured on the brokers, then with minISR=2 Kafka would also try to distribute the data across the racks/availability zones?
Or am I mistaken?
My problem is that the third zone is minimally equipped, without a storage system and only with local storage. There are some restrictions in our infrastructure at the moment.
Thanks in advance
I thought that if the racks are configured on the broker, then in case of minISR=2 kafka would also try to distribute the data across the racks/availability zone?
min.insync.replicas is not something Kafka distributes. Which replicas are in sync at any moment depends on various other factors (broker restarts, client configuration, slow networking between AZs, etc.). The number is just the minimum number of replicas that have to be in sync for producers (using acks=all) to be allowed to produce new messages. It does not take racks into account. So it can easily happen that both in-sync replicas are in the same rack/zone, and you lose not only availability but possibly also data. The only real protection is to have 3 as the minimum. That gives you certainty that at least one in-sync replica will be in each zone, so you don't lose any data. But if you lose a whole zone, you will need to, for example, reconfigure the topic to allow producers to work again. So availability will suffer.
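The pigeonhole argument behind "3 as the minimum" can be written down directly. A hypothetical Python helper, assuming every zone holds the same number of replicas of the partition and producers use acks=all:

```python
import math

def min_zones_spanned(min_isr: int, replicas_per_zone: int) -> int:
    """Fewest zones an ISR of exactly min_isr members can occupy.

    Worst case: the ISR fills one zone completely before spilling into the
    next, so it spans ceil(min_isr / replicas_per_zone) zones.
    """
    return math.ceil(min_isr / replicas_per_zone)

def write_survives_one_zone_loss(min_isr: int, replicas_per_zone: int) -> bool:
    # An acknowledged acks=all write is on every ISR member. It survives the
    # loss of any single zone only if the ISR is forced to span 2+ zones.
    return min_zones_spanned(min_isr, replicas_per_zone) >= 2

# RF=4 over 2 zones -> 2 replicas per zone
for m in (2, 3):
    print(m, min_zones_spanned(m, 2), write_survives_one_zone_loss(m, 2))
# 2 1 False
# 3 2 True
```

For comparison, RF=3 over three zones (one replica per zone) with min.insync.replicas=2 gives `write_survives_one_zone_loss(2, 1) == True`, which is the 3-AZ setup recommended earlier in the thread.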
Back to the original question: com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal seems to work fine; you just need to add it to the goals in the CC configuration.
@kyguy @ppatierno Is there any reason why RackAwareDistributionGoal isn't added to the goals by default?
FYI, from my archives, I see that we encountered this problem in mid-2023: at the time, the RackAwareDistributionGoal seemed not to be recognized by the Cruise Control backend. I can reproduce this today (from the CC UI) with the following error:
ERROR: Error processing POST request '/rebalance' due to: 'com.linkedin.kafka.cruisecontrol.exception.KafkaCruiseControlException:
java.lang.IllegalArgumentException:
Goals [RackAwareDistributionGoal] are not supported.
Supported: [CpuCapacityGoal, CpuUsageDistributionGoal, DiskCapacityGoal,
DiskUsageDistributionGoal, IntraBrokerDiskCapacityGoal,
IntraBrokerDiskUsageDistributionGoal, LeaderBytesInDistributionGoal,
LeaderReplicaDistributionGoal, MinTopicLeadersPerBrokerGoal,
NetworkInboundCapacityGoal, NetworkInboundUsageDistributionGoal,
NetworkOutboundCapacityGoal, NetworkOutboundUsageDistributionGoal,
PotentialNwOutGoal, PreferredLeaderElectionGoal, RackAwareGoal,
ReplicaCapacityGoal, ReplicaDistributionGoal, TopicReplicaDistributionGoal]'.
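The "Supported" list in this error appears to be the server's configured goals, so a `/rebalance` request cannot introduce a goal on its own; it first has to be added to the Cruise Control configuration. A rough Python model of that lookup, assuming goals are matched by simple class name (as the error prints them). This is an illustration, not Cruise Control's actual validation code.

```python
# Goal names copied from the "Supported: [...]" list in the error above.
SUPPORTED_GOALS = [
    "CpuCapacityGoal", "CpuUsageDistributionGoal", "DiskCapacityGoal",
    "DiskUsageDistributionGoal", "IntraBrokerDiskCapacityGoal",
    "IntraBrokerDiskUsageDistributionGoal", "LeaderBytesInDistributionGoal",
    "LeaderReplicaDistributionGoal", "MinTopicLeadersPerBrokerGoal",
    "NetworkInboundCapacityGoal", "NetworkInboundUsageDistributionGoal",
    "NetworkOutboundCapacityGoal", "NetworkOutboundUsageDistributionGoal",
    "PotentialNwOutGoal", "PreferredLeaderElectionGoal", "RackAwareGoal",
    "ReplicaCapacityGoal", "ReplicaDistributionGoal", "TopicReplicaDistributionGoal",
]

def unsupported_goals(requested, supported=SUPPORTED_GOALS):
    """Return the requested goals whose simple class name is not configured."""
    known = {name.rsplit(".", 1)[-1] for name in supported}
    return [g for g in requested if g.rsplit(".", 1)[-1] not in known]

print(unsupported_goals(["RackAwareDistributionGoal"]))  # ['RackAwareDistributionGoal']
print(unsupported_goals(["com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal"]))  # []
```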
@Pinimo And did you add it to the goals option?
@scholzj Thank you very much for the detailed explanation about the Kafka replication mechanism. I've tried to activate the goal with the following configuration:
cruiseControl:
  config:
    goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
    hard.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
But I get the following error in the Cruise Control pod:
2024-07-04 13:24:44 ERROR KafkaCruiseControlMain:33 - Uncaught exception on thread Thread[main,5,main]
org.apache.kafka.common.config.ConfigException: Attempt to configure default goals with unsupported goals (default.goals:[com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal] and goals:[com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal]).
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.sanityCheckGoalNames(KafkaCruiseControlConfig.java:202) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:53) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:47) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlUtils.readConfig(KafkaCruiseControlUtils.java:873) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain.main(KafkaCruiseControlMain.java:34) ~[cruise-control-2.5.137.jar:?]
Stream closed EOF for kafka-test/at-wrwks-kafka-test-cruise-control-98658f4f7-sgx4l (cruise-control)
I'm using Strimzi Kafka Operator 0.40.0 at the moment. Am I missing something in the configuration?
As the error suggests, you need to adjust the default.goals configuration as well. I'm also not sure you can configure just a single goal there. The goals configuration in Cruise Control seems a bit chaotic, to be honest.
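Judging from the ConfigException text, Cruise Control's startup sanity check rejects a configuration when default.goals names a goal that is missing from goals. That explains both failures in this thread: first goals was overridden while default.goals kept Strimzi's defaults, later default.goals gained RackAwareDistributionGoal while goals kept the defaults. A minimal Python re-creation of just that one rule (hypothetical helper names; other rules, for example involving hard.goals, may also apply):

```python
PREFIX = "com.linkedin.kafka.cruisecontrol.analyzer.goals."

def parse_goals(value: str) -> list:
    """Split a comma-separated goal list, stripping spaces/newlines from a YAML `>` block."""
    return [g.strip() for g in value.split(",") if g.strip()]

def check_default_goals(goals: str, default_goals: str) -> None:
    """Raise if default.goals names a goal that goals does not."""
    configured = set(parse_goals(goals))
    missing = [g for g in parse_goals(default_goals) if g not in configured]
    if missing:
        raise ValueError(f"default.goals not contained in goals: {missing}")

# First failing attempt (abbreviated): only goals was overridden.
try:
    check_default_goals(PREFIX + "RackAwareDistributionGoal", PREFIX + "RackAwareGoal")
except ValueError as err:
    print(err)

# Passes: both lists name the same goal.
check_default_goals(PREFIX + "RackAwareDistributionGoal\n", PREFIX + "RackAwareDistributionGoal\n")
```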
I'm not sure if I understand the configuration options of Cruise Control correctly. If I configure the following, everything works:
cruiseControl:
  config:
    default.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal
But the following doesn't work:
cruiseControl:
  config:
    default.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal,
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
Preparing truststore for Cruise Control
Adding /etc/cruise-control/cluster-ca-certs/ca.crt to truststore /tmp/cruise-control/replication.truststore.p12 with alias ca
Certificate was added to keystore
Preparing truststore for Cruise Control is complete
Preparing keystore for Cruise Control
Preparing keystore for Cruise Control is complete
Starting Cruise Control with configuration:
bootstrap.servers=at-wrwks-kafka-test-kafka-bootstrap:9091
webserver.accesslog.path=/tmp/access.log
webserver.http.address=0.0.0.0
webserver.http.cors.allowmethods=OPTIONS,GET
webserver.ssl.keystore.location=/tmp/cruise-control/cruise-control.keystore.p12
webserver.ssl.keystore.password=[hidden]
webserver.ssl.keystore.type=PKCS12
webserver.ssl.key.password=[hidden]
security.protocol=SSL
ssl.keystore.type=PKCS12
ssl.keystore.location=/tmp/cruise-control/cruise-control.keystore.p12
ssl.keystore.password=[hidden]
ssl.truststore.type=PKCS12
ssl.truststore.location=/tmp/cruise-control/replication.truststore.p12
ssl.truststore.password=[hidden]
kafka.broker.failure.detection.enable=true
capacity.config.file=/opt/cruise-control/custom-config/capacity.json
completed.user.task.retention.time.ms=86400000
metric.reporter.topic=strimzi.cruisecontrol.metrics
hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
num.broker.metrics.windows=20
broker.metrics.window.ms=300000
sample.store.topic.replication.factor=4
num.partition.metrics.windows=1
broker.metric.sample.store.topic=strimzi.cruisecontrol.modeltrainingsamples
webserver.ssl.enable=true
partition.metric.sample.store.topic=strimzi.cruisecontrol.partitionmetricsamples
webserver.auth.credentials.file=/opt/cruise-control/api-auth-config/cruise-control.apiAuthFile
default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal\n
partition.metrics.window.ms=300000
goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal
webserver.security.enable=true
+ exec /usr/bin/tini -w -e 143 -- java -Xms128M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Dlog4j2.configurationFile=file:/opt/cruise-control/custom-config/log4j2.properties -classpath ':/opt/cruise-control/libs/*' com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain /tmp/cruisecontrol.properties
2024-07-04 15:30:14 ERROR KafkaCruiseControlMain:33 - Uncaught exception on thread Thread[main,5,main]
org.apache.kafka.common.config.ConfigException: Attempt to configure default goals with unsupported goals (default.goals:[com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal] and goals:[com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal, 
com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal, com.linkedin.kafka.cruisecontrol.analyzer.goals.PreferredLeaderElectionGoal]).
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.sanityCheckGoalNames(KafkaCruiseControlConfig.java:202) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:53) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.config.KafkaCruiseControlConfig.<init>(KafkaCruiseControlConfig.java:47) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlUtils.readConfig(KafkaCruiseControlUtils.java:873) ~[cruise-control-2.5.137.jar:?]
at com.linkedin.kafka.cruisecontrol.KafkaCruiseControlMain.main(KafkaCruiseControlMain.java:34) ~[cruise-control-2.5.137.jar:?]
Stream closed EOF for kafka-test/at-wrwks-kafka-test-cruise-control-6cd9fd6dd6-qvlbm (cruise-control)
You basically need to keep goals, default.goals and hard.goals in sync. But TBH, I do not properly understand it either... I reconfigured it about 10 times yesterday before I got it to work.
Thank you very much for your help. I have now found the correct configuration:
cruiseControl:
  config:
    default.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
    goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
    hard.goals: >
      com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareDistributionGoal
Let's keep this open to figure out why the RackAwareDistributionGoal goal isn't part of goals by default.
@scholzj The Strimzi documentation is a bit misleading https://strimzi.io/docs/operators/latest/deploying#default-goals
Unless you specify default.goals in the Cruise Control deployment configuration, the main optimization goals are used as the default optimization goals. In this case, the cached optimization proposal is generated using the main optimization goals.
It can be read as "the value is taken from the main goals", but from testing, I think it means "the predefined value for the main goals is used" (https://strimzi.io/docs/operators/latest/deploying#main-goals).
The other misunderstanding comes from https://strimzi.io/docs/operators/latest/deploying#goals_order_of_priority:
"Strimzi supports most of the optimization goals developed in the Cruise Control project. The supported goals, in the default descending order of priority, are as follows:"
This list does not include RackAwareDistributionGoal, apparently because it focuses on the default order, but it can be misread as "only the listed goals are supported".
To save some time, the list of supported goals in Cruise Control is here https://github.com/linkedin/cruise-control/blob/main/cruise-control/src/main/java/com/linkedin/kafka/cruisecontrol/config/constants/AnalyzerConfig.java#L260
@pkleindl I'm not sure I follow your comments, possibly because I do not know much about Cruise Control and did not write the docs. If you think you can describe better how it is used by Strimzi, feel free to open a PR.
Discussed on the community call on 10.7.2024: this should wait for the next call where we will hopefully have more Cruise Control SMEs. But it seems like this goal should be enabled by default.
But it seems like this goal should be enabled by default.
We can safely add RackAwareDistributionGoal to the Strimzi goals by default. It should be there anyway. I believe the only reason it is not there is an oversight: the RackAwareDistributionGoal goal was created after the initial Strimzi Cruise Control integration was introduced, and we never noticed that it should be added.
That being said, even though both RackAwareGoal and RackAwareDistributionGoal can be added to the Strimzi goals by default, a user can only have one of these goals listed in their default.goals and hard.goals configurations, because the goals contradict each other.
RackAwareGoal - Ensures that all replicas of each partition are assigned in a rack aware manner -- i.e. no more than one replica of each partition resides in the same rack.
RackAwareDistributionGoal - A relaxed version of RackAwareGoal. Unlike RackAwareGoal, as long as the replicas of each partition can be distributed perfectly evenly across the racks, this goal allows multiple replicas of a partition to be placed in a single rack.
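The two definitions can be contrasted with a small Python predicate over a single partition's replica placement. RackAwareGoal's rule is taken as quoted; for RackAwareDistributionGoal, "perfectly even" is modelled here as per-rack replica counts (including empty racks) differing by at most one. That modelling is an assumption, not Cruise Control's implementation.

```python
from collections import Counter

def satisfies_rack_aware(replica_racks):
    """RackAwareGoal: no two replicas of the partition share a rack."""
    return all(n == 1 for n in Counter(replica_racks).values())

def satisfies_rack_aware_distribution(replica_racks, all_racks):
    """Assumed model of RackAwareDistributionGoal: per-rack counts differ by <= 1."""
    counts = Counter(replica_racks)
    per_rack = [counts.get(rack, 0) for rack in all_racks]
    return max(per_rack) - min(per_rack) <= 1

racks = ["zone-a", "zone-b"]
even = ["zone-a", "zone-a", "zone-b", "zone-b"]   # RF=4 over 2 zones
skewed = ["zone-a", "zone-a", "zone-a", "zone-b"]
print(satisfies_rack_aware(even), satisfies_rack_aware_distribution(even, racks))
print(satisfies_rack_aware_distribution(skewed, racks))
```

The even RF=4 layout fails RackAwareGoal but passes the relaxed goal, which is exactly the two-zone situation described in this issue.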
Since they are both hard goals, only one of them can be specified in the default.goals and hard.goals configurations at a time. Therefore, in this particular scenario, where we could have multiple replicas of a partition in a single rack, we would need to update our default.goals and hard.goals configurations by:
removing RackAwareGoal
adding RackAwareDistributionGoal
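The swap described above (RackAwareDistributionGoal in place of RackAwareGoal in default.goals and hard.goals, while goals may list both) can be sketched in Python. The helper names are hypothetical; keeping the new goal at RackAwareGoal's old position assumes list order expresses priority, and appending it to goals is an arbitrary choice. The resulting strings would go under `cruiseControl.config` in the Kafka resource.

```python
PREFIX = "com.linkedin.kafka.cruisecontrol.analyzer.goals."
RACK_AWARE = PREFIX + "RackAwareGoal"
RACK_AWARE_DIST = PREFIX + "RackAwareDistributionGoal"

def swap_rack_goal(goal_list):
    """Replace RackAwareGoal with RackAwareDistributionGoal, keeping its position."""
    out = []
    for goal in goal_list:
        goal = RACK_AWARE_DIST if goal == RACK_AWARE else goal
        if goal not in out:  # avoid listing the distribution goal twice
            out.append(goal)
    return out

def build_config(goals, default_goals, hard_goals):
    # Both rack goals may stay in `goals`; default/hard goals must pick one.
    all_goals = list(goals)
    if RACK_AWARE_DIST not in all_goals:
        all_goals.append(RACK_AWARE_DIST)
    return {
        "goals": ",".join(all_goals),
        "default.goals": ",".join(swap_rack_goal(default_goals)),
        "hard.goals": ",".join(swap_rack_goal(hard_goals)),
    }

cfg = build_config([RACK_AWARE, PREFIX + "ReplicaCapacityGoal"],
                   [RACK_AWARE, PREFIX + "ReplicaCapacityGoal"],
                   [RACK_AWARE])
for key, value in cfg.items():
    print(f"{key}={value}")
```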
Related problem
Our organization operates a Kafka cluster within Kubernetes using the Strimzi Operator. To ensure high availability and fault tolerance, we aim to distribute our Kafka brokers and data across multiple Availability Zones. Currently, we are planning to split the brokers across 2 Availability Zones, with 2 brokers running in each zone. A third Zookeeper will additionally run in the third Availability Zone.
Every availability zone is defined as a rack in Strimzi by using the following topology key:
Cruise Control is a crucial tool for balancing and managing Kafka cluster workloads, but its current implementation in Strimzi lacks support for the RackAwareDistributionGoal. Unfortunately, we cannot use the supported RackAwareGoal because it requires having at least as many racks as the replication factor of the topics. In our setup, we will set the replication factor to 4 to tolerate the failure of an entire Availability Zone. The RackAwareDistributionGoal is essential for:
Suggested solution
Integrate the RackAwareDistributionGoal into the Cruise Control configuration within Strimzi. This could involve:
RackAwareDistributionGoal.
Alternatives
No response
Additional context
Impact
This feature will significantly benefit organizations that deploy Kafka clusters across multiple Availability Zones, providing a more robust and fault-tolerant setup. It aligns with best practices for high availability and disaster recovery in cloud environments.