mikouaj opened this issue 4 months ago
In the case where we don't want to impact the scheduler, isn't boosting only the limit sufficient? Conversely, for a pod that requires a boost, updating the request will impact the scheduler, but that is what is expected.
To summarize, I'm not sure I understand the use case where you want to impact the scheduler but not scale nodes, or disable a boost if the pod cannot be scheduled.
Perhaps it would be great to add an option to increase only the limit by a percentage, and possibly another option to remove the limit during the boost.
@yyvess the use case we try to solve is as follows:

1) The POD resource requests are increased per the StartupCPUBoost config
2) The scheduler is not able to find a suitable node (no capacity) and the POD is unschedulable
3) (autoscaler path) The Cluster Autoscaler kicks in and provisions new nodes to accommodate the boosted PODs
4) (autoscaler path) The PODs are scheduled on the new nodes
5) (autoscaler path) The PODs' CPU requests are reverted back to their original values
6) (autoscaler path) After some time the Cluster Autoscaler considers the nodes underutilized (as the bigger CPU requests were reverted) and triggers a scale-in action
7) (autoscaler path) The PODs are evicted from the nodes and rescheduled somewhere else

We are then back at point 1), so this may even repeat in a loop.
With this feature we aim to address point 2): to give the user the possibility to decide whether CPU boosting can lead to unschedulable PODs.
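For illustration, this is what the request boost in point 1) looks like on a standard container resources stanza; the values and the 100% boost percentage are hypothetical:

```yaml
# Before the boost (hypothetical values)
resources:
  requests:
    cpu: 500m

# After a hypothetical 100% boost: the scheduler now has to find
# a node with 1000m of free allocatable CPU, which may not exist
resources:
  requests:
    cpu: 1000m
```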
@mikouaj I understand that point 2 can be an issue. As you explain, solving it isn't easy. In the meantime, to avoid this case, you could allow boosting only the limit value (without touching the request); that should not impact the scheduler and avoids point 2. But currently it is not possible to boost only the limit.
PS: It could also be interesting to allow removing the limit value during the boost, so the pod can use all of the node's CPU.
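For illustration, the two suggestions above on a standard container resources stanza could look like this; the values and the 100% boost percentage are hypothetical:

```yaml
# Original container resources (hypothetical values)
resources:
  requests:
    cpu: 500m
  limits:
    cpu: "1"

# Option A: boost only the limit; the request is untouched,
# so scheduling decisions are unaffected
resources:
  requests:
    cpu: 500m
  limits:
    cpu: "2"

# Option B: remove the limit during the boost; the pod may
# burst up to the node's available CPU
resources:
  requests:
    cpu: 500m
```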
@yyvess I like the idea of removing the limit value during the boost. It sounds obvious now but I have never thought about it before, many thanks! I will create a feature to introduce that possibility in a config-driven way.
For the resource requests, boosting them is needed to actually guarantee the resources - although it comes with all of the described challenges. Addressing this can be tough but I believe it is still doable.
Description
The capacity-aware boosting will make the CPU resource boost conditional: the mutating webhook would try to verify if the given POD, with boosted resources, would be schedulable on the cluster.
This feature requires simulating the scheduling algorithm, including node selection and resource checks. There is no API for this, and the real scheduling algorithm is complex, so some simplification that produces "good enough" results is needed.
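A minimal sketch of such a simplified check, assuming a hypothetical `fits_somewhere` helper: it ignores taints, affinity, and scoring entirely, and only verifies that at least one node has enough free allocatable CPU for the boosted request. All names and values are illustrative, not the project's actual implementation:

```python
# Simplified "good enough" schedulability check: pure resource-fit,
# no taints, affinity, or scoring simulation.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int  # allocatable CPU in millicores
    requested_cpu_m: int    # sum of CPU requests of pods already on the node


def fits_somewhere(boosted_request_m: int, nodes: list[Node]) -> bool:
    """Return True if the boosted pod fits on at least one node."""
    return any(
        node.allocatable_cpu_m - node.requested_cpu_m >= boosted_request_m
        for node in nodes
    )


nodes = [
    Node("node-a", allocatable_cpu_m=2000, requested_cpu_m=1800),  # 200m free
    Node("node-b", allocatable_cpu_m=4000, requested_cpu_m=3200),  # 800m free
]

print(fits_somewhere(500, nodes))   # True  - fits on node-b
print(fits_somewhere(1000, nodes))  # False - boost would make the pod unschedulable
```

In the second case, the webhook could then decide (per user config) to skip or cap the boost instead of producing an unschedulable pod.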
References