strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0
4.8k stars 1.28k forks source link

[Enhancement] Investigate/update default topic configuration configurations for Cruise Control #4268

Closed kyguy closed 1 year ago

kyguy commented 3 years ago

Is your feature request related to a problem? Please describe. The default topic configurations for the following Cruise Control topics are based on the defaults configurations of the upstream Cruise Control project [1]

These configurations may not be optimal for a default or minimal Kafka cluster and should be investigated and/or updated.

Describe the solution you'd like A justification of existing defaults, new defaults, or heuristic of how these numbers are chosen based on Kafka cluster size.

[1] Refer to partition.sample.store.topic.partition.count and broker.sample.store.topic.partition.count settings here: https://github.com/linkedin/cruise-control/blob/447ba1aed506b1383f725bfad0aca56370a612f3/docs/wiki/User%20Guide/Configurations.md#kafkasamplestore-configurations

scholzj commented 2 years ago

Triaged on 12.4.2022: Sounds like a good idea. But it is not clear whether this is already configurable or whether some changes are needed. @katheris will look into whether some work is needed or not.

katheris commented 2 years ago

From an initial investigation, if I set the following in the Kafka CR under CruiseControl.config:

sample.store.topic.replication.factor: 1
partition.sample.store.topic.partition.count: 5
broker.sample.store.topic.partition.count: 5

then both the broker sample store and partition sample store get the expected replication factor of 1 and partitions of 5 so that works fine.

I'm still trying to understand the connection between the metric reporter topic and the settings like cruise.control.metrics.topic.replication.factor: 1

ppatierno commented 2 years ago

I'm still trying to understand the connection between the metric reporter topic and the settings like cruise.control.metrics.topic.replication.factor: 1

What do you mean? This is the replication factor of the topic used by the metric reporter to send metrics.

https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_4/cruise-control-metrics-reporter/src/main/java/com/linkedin/kafka/cruisecontrol/metricsreporter/CruiseControlMetricsReporter.java#L205

Maybe you are asking for something different I didn't get?

katheris commented 2 years ago

@ppatierno yes, but if I set cruise.control.metrics.topic.replication.factor in my CR the topic is still created with replication factor 3. Plus I noticed that there seems to be two configs relating to a metrics topic: metric.reporter.topic and metric.reporter.topic

kyguy commented 2 years ago

@katheris IIRC the cruise.control.metrics.topic config and all other Cruise Control Metric Reporter configs must be set in the spec.kafka.config instead of the spec.cruiseControl.config, since the Cruise Control Metric Reporters are running as agents of the Kafka brokers. Is the cruise.control.metrics.topic.replication.factor config being set in spec.kafka.config?

katheris commented 2 years ago

@kyguy ah yeah if I add it to spec.kafka.config then it does change.

So a question I have is, for the name of the metrics reporter topic, should we let users specify it in the cruise control config as metric.reporter.topic and then we will update cruise.control.metrics.topic to match. Or should they specify configuration (name, partitions, replication factor) for the two sample stores (partition and broker) in the cruise control config and specify all metric reporter topic config in the Kafka config?

I feel like given where we are today, perhaps the latter @ppatierno @scholzj what do you think?

scholzj commented 2 years ago

I think changing the configuration of these is very niche use-case. So I prefer cleaner code and less user convenience. So probably having the user configure this on two places 🤷

katheris commented 2 years ago

Current behaviour is that metric.reporter.topic is the overriding setting, so if that is set that value will get used, if it is not set the default is used. I think this is better than users having to set it in two places and code is relatively clean

kyguy commented 2 years ago

should we let users specify it in the cruise control config as metric.reporter.topic and then we will update cruise.control.metrics.topic to match.

Sounds like a nice feature to me, from what I understand the code changes required wouldn't be too invasive in this specific case. We would just need to set cruise.control.metrics.topic to the value of metric.reporter.topic in the KafkaConfiguration [1] if metric.reporter.topic was set by the user and cruise.control.metrics.topic was not.

[1] https://github.com/strimzi/strimzi-kafka-operator/blob/8e83a2c3eec06589e8c872119e9cde83b05445d8/cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaConfiguration.java#L31

ppatierno commented 2 years ago

I think we are in a little bit of mess here from a UX point of view. As a user, I would expect that everything related to Cruise Control would be configured under spec.cruiseControl.config. I don't care if there is some configuration related to CC itself and other related to the metrics reporter that, even if part of CC, is actually running on the Kafka side. I just think "I configured everything related to CC in spec.cruiseControl.config and the Strimzi operator will take care to spread the configuration in the right places for me". Tbh I am for having a single place and not two places with the override concept. Said that, I think we are screwed up alredy due to backward compatibility. That is because in the spec.kafka.config we are allowing the following parameters related to the metrics reporter configuration.

https://github.com/strimzi/strimzi-kafka-operator/blob/main/api/src/main/java/io/strimzi/api/kafka/model/KafkaClusterSpec.java#L49

and we cannot make them forbidden now I guess (which is what I would do, leaving ALL CC related config params in the spec.cruiseControl.config). The metric.reporter.topic is the only param that works the way I would like: configuring in the CC section of the Kafka CR, because we forbid the set in the Kafka config and default to a specific one.

Regarding this issue itself I am more confused re-reading it now. We are searching for new defaults (as it seems in the description) or making the various replication factor and partition count configurable? I guess the latter one. And this seems already working.

katheris commented 2 years ago

@ppatierno I agree with your point about the UX experience. As a new user who didn't know how to configure cruise control in Strimzi I did the following:

  1. Read the CC docs to find the config option I needed
  2. Added the config to the "config" section of CC
  3. Was confused when it didn't work

In the Strimzi documentation where it talks about CC config it doesn't mention that you have to set the metrics reporter config in the Kafka section rather than the CC section as far as I can see: https://strimzi.io/docs/operators/latest/configuring.html#ref-cruise-control-configuration-str

In terms of the aim of this issue, during the issue triage we decided to first determine what the current behaviour was and then decide first, should we have different defaults, and second should they be configurable.

scholzj commented 2 years ago

TBH, I'm not sure I would call this bad UX. This is just misunderstanding how Cruise Control works and how it collects the metrics. I do not think you can create a better UX by mixing the values in .spec.cruiseControl.config. The only way to improve the config would be to have either a separate set of fields for it or separate map for metrics provider config. But you cannot mix them, that would just confuse other users who understand how this works and how it has to be configured.

ppatierno commented 2 years ago

Tbh I see the user configuring CC related stuff only under the spec.cruiseControl field which is demonstrated by how @katheris started trying this way. A user is configuring CC, he doesn't care how it's going to be configured by the operator: part on CC itself, part on the metrics reporter in Kafka. Maybe to make it clearer we could have spec.cruiseControl.config.reporter but still under the CC section and locking down the metrics reporter fields under spec.kafka.config. But anyway I guess we cannot do that for backward compatibility.

scholzj commented 2 years ago

TBH, I do not remember anyone else trying to configure it in wrong place. Likely because nobody really configures it. If you wanna have it in spec.cruiseControl, fine, that makes some sense. But you cannot do it in spec.cruiseControl.config and spec.cruiseControl.config.reporter does not make sense either because it creates a very weird resource which is nesting config maps. So you would need something like spec.cruiseControl.reporter. You would also need to validate it for allowed / not allowed values from the metrics reporter configuration and you would need to validate it for not-allowed values of Kafka configuration (so that it cannot be used to inject blocked values). And also handle the backwards compatibility and deprecation of the old fields configured in .spec.kafka.config. Is it really worth it?

katheris commented 2 years ago

I guess we never really got to an agreement on this issue about the way to set specify the configurations, but I wanted to at least circle back on the default values.

So as a reminder the defaults currently are: strimzi.cruisecontrol.metrics (partitions: 1, replication factor: 1) strimzi.cruisecontrol.modeltrainingsamples (partitions: 32, replication factor: 2) strimzi.cruisecontrol.partitionmetricsamples (partitions: 32, replication factor: 2)

Based on the research I've done I think the partition values are fine. Both samples have had 32 from the initial commit of cruise control onwards and I haven't been able to find a compelling reason to choose an alternative value. My understanding is that the metrics topic is consumed by a single consumer that doesn't use a group. So increasing the partitions from 1 for that topic won't benefit the consumer. I guess the only reason we would consider increasing this number is just to spread the data out a bit more across the cluster?

For the replication factor, Cruise Control uses a value of either 2, or the number of brokers (whichever is lower) for the sample topics. I guess this is so there is always a replica online if a broker fails. I do wonder whether it would be beneficial to increase this to 3 for a three broker Kafka? And for a single broker Kafka it has to be 1, but I don't expect many people would deploy Cruise Control for a single broker Kafka since replicas can't be moved between brokers at that point.

For the metrics topic, as far as I can tell Cruise Control does not provide a default value for replication factor, as it expects the user to create the topic. So I guess the default of 1 is set by Strimzi? I am curious about why we have chosen this default, is it because the data in this topic is immediately read and aggregated into the other two topics for storing so we don't mind if we lose this data?

ppatierno commented 2 years ago

For the metrics topic, as far as I can tell Cruise Control does not provide a default value for replication factor, as it expects the user to create the topic.

I see CC metrics reporter creating the metrics topic here https://github.com/linkedin/cruise-control/blob/migrate_to_kafka_2_4/cruise-control-metrics-reporter/src/main/java/com/linkedin/kafka/cruisecontrol/metricsreporter/CruiseControlMetricsReporter.java#L261 I don't think the user should take care of creating that topics.

The number of partitions and replication factor are set by Strimzi coming from the default in the Kafka configuration or using 1 for both. https://github.com/strimzi/strimzi-kafka-operator/blob/main/cluster-operator/src/main/java/io/strimzi/operator/cluster/model/KafkaCluster.java#L517

katheris commented 2 years ago

I don't think the user should take care of creating that topics.

I agree, what I meant was that it seems the Cruise Control project doesn't have a specific opinion on sensible default values for the replication factor of that topic, either the user (or in our case Strimzi) creates the topic, or it gets created with the broker defaults as far as I can see. So those defaults have been decided by the Strimzi team, not by the cruise control team

kyguy commented 2 years ago

So increasing the partitions from 1 for that topic won't benefit the consumer.

That makes sense but then makes me curious about the strimzi.cruisecontrol.modeltrainingsamples and strimzi.cruisecontrol.partitionmetricsamples topics and why they are split into 32 partitions since those topics should only have a single consumer as well.

I guess the only reason we would consider increasing this number is just to spread the data out a bit more across the cluster?

What would be the benefit of sharding this data when there is only a single consumer?

I do wonder whether it would be beneficial to increase this to 3 for a three broker Kafka?

I guess it becomes a question of how quickly Cruise Control can recover and generate the minimal number of samples needed to preform a rebalance in the event where we have lost all our sample replicas. The recovery time is dependent on the number of brokers in the cluster and the following configurations: metric.sampling.interval.ms, broker.metrics.window.ms, min.samples.per.broker.metrics.window, min.samples.per.partition.metrics.window. In Strimzi, these configurations are preset to generate a proposal in around ~5 minutes, so the recovery time is not that bad so the extra sample replicas are not worth the overhead cost (for Strimzi's default configurations) IMO. That being said, a user could configure the cluster in a way that would make the generation/recovery time take much longer. If we could devise a method of accurately estimated the recovery time based on cluster size and these window configuration values we dynamically configure the replication factor of the sample topics.

I am curious about why we have chosen this default, is it because the data in this topic is immediately read and aggregated into the other two topics for storing so we don't mind if we lose this data?

That is my understanding as well

katheris commented 2 years ago

What would be the benefit of sharding this data when there is only a single consumer?

To spread out disk usage across the brokers, if you have a single partition with one replica and lots of messages you can potentially end up with very skewed disk usage that isn't resolvable without changing the partitioning strategy

larvinloy commented 2 years ago

We are experiencing this skew on our staging environment where the traffic across topics is not as consistent or predictable as production. Running a CruiseControl balance on this cluster results in broker-1 having only 1 leader, which belongs to strimzi.cruisecontrol.metrics.

image

I can't find any documentation to determine if it's safe to increase the partition count. If it needs to be 1, I presume it's to preserve ordering? Do you have any advice around this, @katheris ?

Btw, for increasing the replication of CruiseControl topics, we did it by modifying the respective KafkaTopic CR.

katheris commented 2 years ago

Hi @larvinloy I don't believe the partition count needs to be 1, otherwise I would expect Cruise Control to hard code it to 1. So I don't see any reason that it would be unsafe to increase the partition count.

larvinloy commented 2 years ago

Thanks, @katheris !

katheris commented 1 year ago

Hey, circling back on this issue. It sounds to me like we are all happy with the current default values of:

strimzi.cruisecontrol.metrics (partitions: 1, replication factor: 1) strimzi.cruisecontrol.modeltrainingsamples (partitions: 32, replication factor: 2) strimzi.cruisecontrol.partitionmetricsamples (partitions: 32, replication factor: 2)

@ppatierno @kyguy are you happy with the above given the discussion? Can I go ahead and close this issue as done since we don't want to change the defaults?

ppatierno commented 1 year ago

That's fine with me.

kyguy commented 1 year ago

Sounds good to me @katheris!