strimzi / strimzi-kafka-operator

Apache Kafka® running on Kubernetes
https://strimzi.io/
Apache License 2.0
4.63k stars 1.26k forks

An option for disabling Pod disruption budget #6996

Open m-yosefpor opened 2 years ago

m-yosefpor commented 2 years ago

Is your feature request related to a problem? Please describe.

Many infrastructures deny PDB creation for users, so the operator runs into issues. There should be an option to disable podDisruptionBudget creation in Strimzi, so users can opt out of creating PDBs in such cases.

Describe the solution you'd like

There should be an option in the CRD (e.g. here: https://github.com/strimzi/strimzi-kafka-operator/blob/main/install/cluster-operator/040-Crd-kafka.yaml#L1450) to disable podDisruptionBudget creation (podDisruptionBudget.disabled: true/false) in Strimzi, so users can opt out of creating PDBs in such cases. The default value should be false to preserve the current behavior.

Describe alternatives you've considered

It could be disabled globally with a flag such as --disable-pdb-creation in Strimzi (not a good solution).

Additional context

scholzj commented 2 years ago

Why do they deny it? A PDB is quite essential for running an application like Kafka. I don't think you can have any availability guarantees without it, and it might impact reliability as well.

In any case, if you wanted to disable it, a global flag similar to the one for Network Policies is probably a better idea than something in the custom resources, as you would otherwise need to add it to every single custom resource.

scholzj commented 1 year ago

Triaged on 18.8.2022: Having an environment variable to disable the PDBs globally would make sense (I don't think a flag in the CRD makes much sense here). But in any case, this should have a proposal similar to the Network Policies one.
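
For illustration, a global switch of this kind could be read once when the operator starts. The sketch below is only an assumption about how such a flag might look: the variable name (modeled on the Network Policies switch) and the `pdbGenerationEnabled` helper are hypothetical and not taken from this thread or the proposal. An unset or empty value keeps PDB creation on, which matches the current behavior.

```java
import java.util.Map;

class PdbGenerationFlag {
    // Hypothetical variable name, used for illustration only.
    static final String ENV_VAR = "STRIMZI_POD_DISRUPTION_BUDGET_GENERATION";

    // Returns true unless the variable is explicitly set to something other than "true".
    static boolean pdbGenerationEnabled(Map<String, String> env) {
        String raw = env.get(ENV_VAR);
        if (raw == null || raw.trim().isEmpty()) {
            return true; // default: keep creating PDBs
        }
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // Reads the real process environment; prints whether PDBs would be created.
        System.out.println(pdbGenerationEnabled(System.getenv()));
    }
}
```

With a switch like this, the operator would simply skip building the PDB when the helper returns false, and every custom resource would pick up the setting without any per-resource field.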

m-yosefpor commented 5 months ago

> Triaged on 18.8.2022: Having an environment variable to disable the PDBs globally would make sense (I don't think a flag in the CRD makes much sense here). But in any case, this should have a proposal similar to the Network Policies one.

@scholzj proposal added

scholzj commented 3 months ago

@m-yosefpor Do you plan to implement the proposal you wrote? Or should we try to get someone else to implement it?

m-yosefpor commented 3 months ago

@scholzj Yes I will submit a PR for the implementation.

Alansyf commented 1 month ago

Hi, I have six pods: three are brokers and the other three are controllers.

NAME                      DESIRED REPLICAS   ROLES
oneapi-kafka-broker       3                  ["broker"]
oneapi-kafka-controller   3                  ["controller"]

The operator sets min available to five for me:

NAME                         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
oneapi-kafka-cluster-kafka   5               N/A               1                     153m

This value is too high for me: when we upgrade our Kubernetes nodes, this PDB blocks the nodes from restarting. Has the global environment variable been implemented?

Alansyf commented 1 month ago

Never mind, I found the piece of code below; setting maxUnavailable in the template can help.

return new PodDisruptionBudgetBuilder()
        .withNewMetadata()
            .withName(name)
            .withLabels(labels.withAdditionalLabels(TemplateUtils.labels(template)).toMap())
            .withNamespace(namespace)
            .withAnnotations(TemplateUtils.annotations(template))
            .withOwnerReferences(ownerReference)
        .endMetadata()
        .withNewSpec()
            .withMinAvailable(new IntOrString(Math.max(0, replicas - (template != null ? template.getMaxUnavailable() : DEFAULT_MAX_UNAVAILABLE))))
            .withSelector(new LabelSelectorBuilder().withMatchLabels(labels.strimziSelectorLabels().toMap()).build())
        .endSpec()
        .build();
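
To make the arithmetic in `withMinAvailable` concrete, here is a minimal standalone sketch of the same formula. It assumes `DEFAULT_MAX_UNAVAILABLE` is 1, which is inferred from the output above (6 replicas gave MIN AVAILABLE 5) rather than read from the source. A nullable `Integer` stands in for the template object, and `minAvailable` is a hypothetical helper name.

```java
class MinAvailableExample {
    // Assumed default, inferred from the PDB output above (6 replicas -> min available 5).
    static final int DEFAULT_MAX_UNAVAILABLE = 1;

    // Same formula as the builder call: max(0, replicas - maxUnavailable),
    // where a null argument plays the role of "no template configured".
    static int minAvailable(int replicas, Integer templateMaxUnavailable) {
        int maxUnavailable = templateMaxUnavailable != null
                ? templateMaxUnavailable
                : DEFAULT_MAX_UNAVAILABLE;
        return Math.max(0, replicas - maxUnavailable);
    }

    public static void main(String[] args) {
        System.out.println(minAvailable(6, null)); // no template: 6 - 1 = 5
        System.out.println(minAvailable(6, 2));    // maxUnavailable 2: 6 - 2 = 4
        System.out.println(minAvailable(3, 5));    // clamped at 0
    }
}
```

So with three brokers and three controllers sharing one PDB, raising maxUnavailable from 1 to 2 lowers min available from 5 to 4.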

scholzj commented 1 month ago

Keep in mind that increasing the number of nodes that can be down at the same time can cause your Kafka cluster to be unavailable.

weisdd commented 1 month ago

Disabling PDBs would be helpful for local development environments, where we care more about staying within resource constraints (e.g. 16 GB for everything, from the OS to the dev environment and test workloads) than about availability.