knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

PodDisruptionBudget for Knative Service pods #13768

Open ashrafguitoni opened 1 year ago

ashrafguitoni commented 1 year ago

In what area(s)?

/area autoscale

Describe the feature

Although Knative autoscaling can maintain a minimum number of replicas per revision, I think this only covers actions that Knative itself controls. If other actors evict Knative service pods, the service may end up with fewer available pods than the configured minimum. One example of an actor that can disturb the Knative minimum state is the cluster autoscaler Karpenter, whose consolidation feature evicts pods in order to repack and remove nodes.

The way I'm trying to mitigate this is by manually creating a PodDisruptionBudget that targets my Knative service's pods, with the PDB's minAvailable set to the KSVC's autoscaling.knative.dev/min-scale value.
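For illustration, a minimal sketch of such a manually managed PDB, assuming a Knative Service named hello in the default namespace with autoscaling.knative.dev/min-scale set to 2 (the service name, namespace, and min-scale value are placeholders, not part of this issue):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hello-pdb            # placeholder name
  namespace: default         # placeholder namespace
spec:
  # Mirror autoscaling.knative.dev/min-scale (assumed here to be 2).
  minAvailable: 2
  selector:
    matchLabels:
      # Selects all pods belonging to the Knative Service; to scope the PDB
      # to a single revision, one could match on serving.knative.dev/revision instead.
      serving.knative.dev/service: hello
```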

I was asked by @dprotaso to mention my case in a GitHub issue here, so please let me know what you think.

ReToCode commented 1 year ago

I think it's a valid point to discuss; we already have PDBs for the important data-path components. But it might need some design work on when to create/update/delete the PDBs (min-scale is not always set, and scaling itself is dynamic, including to zero). There is also a certain overhead to adjusting the PDBs on every scaling change, so this is up for discussion.

/triage accepted

BobyMCbobs commented 1 year ago

I would find having PDBs created based on min-scale rather valuable for my workloads, which run in a semi-disruptive environment with regular cluster upgrades.

whatnick commented 1 month ago

I have had rolling node replacements during cluster upgrades get stuck because of Knative's pre-set PDBs when the protected component is running as a single pod (a SPOF). It would be good to have default values for these that allow at least single-node disruptions during cluster node rolls and upgrades.
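As an illustration only (not a decided default), a PDB that tolerates one evicted pod at a time could look like the sketch below. The name activator-pdb, the knative-serving namespace, and the app: activator pod label are assumptions about the shipped data-path PDB and may differ from the actual manifests:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: activator-pdb        # assumed name of the shipped data-path PDB
  namespace: knative-serving
spec:
  # Allow one pod to be disrupted at a time, so a single-node drain
  # during an upgrade is not blocked outright.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: activator         # assumed pod label on the activator deployment
```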