Open a7i opened 1 week ago
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yuchaoran2011 for approval. For more information see the Kubernetes Code Review Process.
@a7i, I'm wondering whether we could use the new pod template feature to specify the PDB; see PR https://github.com/kubeflow/spark-operator/pull/2141
@missedone looks like a useful PR! How would pod template control PDB definition? Are you suggesting to implement a single PDB that prevents a common pod label from being evicted?
Ah right, a PDB needs its own specific configuration item. My brain was stuck :(
Happy to take a look at the PR as well because this may be a useful feature, but if this is specifically for node draining, you could add annotations to prevent eviction: karpenter.sh/do-not-disrupt: "true" for Karpenter and "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" for cluster-autoscaler.
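For illustration, a minimal sketch of where those two annotations could go on a SparkApplication. It assumes the `annotations` field under `spec.driver` and `spec.executor`; the application name `spark-pi` is a placeholder, and the rest of the spec is omitted.

```yaml
# Fragment only; "spark-pi" is a hypothetical application name.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  driver:
    annotations:
      # Asks Karpenter not to voluntarily disrupt the node running this pod.
      karpenter.sh/do-not-disrupt: "true"
      # Asks cluster-autoscaler not to evict this pod during scale-down.
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
  executor:
    annotations:
      karpenter.sh/do-not-disrupt: "true"
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

These annotations are only read by the respective autoscalers, so they would not affect a manual node drain.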
Thanks! The context here is nodegroup upgrades: we drain the nodes from the old nodegroup ourselves, so Karpenter and cluster-autoscaler don't come into play.
This would be a great addition - thanks @a7i
Ah, so this is a user-initiated drain and not one done automatically by a node provisioner (e.g. Karpenter drift detection or node consolidation). In that case I definitely understand your need to provision PDBs.
Will review this tonight.
Purpose of this PR
Provide the ability to create a PodDisruptionBudget per Spark application
Proposed changes:
- PodDisruptionBudgetSpec for driver definition
- PodDisruptionBudgetSpec for executor definition
Change Category
Rationale
Our Spark pipelines cannot be interrupted, so during a node drain we want to prevent eviction of executor and driver pods. Once the pipeline is complete, the node can be drained. This is natively supported via PodDisruptionBudget with maxUnavailable: 0
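As a rough sketch of the rationale above, these are the kind of standalone PodDisruptionBudget objects one might write by hand for a single application today. This is not the API proposed in this PR. The application name `spark-pi` and the PDB names are placeholders, and the selector assumes the pods carry the `sparkoperator.k8s.io/app-name` and `spark-role` labels.

```yaml
# Hypothetical example; names are placeholders, labels are assumed.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spark-pi-driver-pdb
spec:
  # No voluntary disruption allowed: eviction requests are refused
  # while a matching pod exists.
  maxUnavailable: 0
  selector:
    matchLabels:
      sparkoperator.k8s.io/app-name: spark-pi
      spark-role: driver
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: spark-pi-executor-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      sparkoperator.k8s.io/app-name: spark-pi
      spark-role: executor
```

With budgets like these in place, a drain that goes through the Eviction API (as `kubectl drain` does) should keep retrying on these pods until they finish and go away. How the disruption controller handles `maxUnavailable` for pods that are not owned by a built-in workload controller is worth checking against the Kubernetes documentation.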
Checklist
Additional Notes