koordinator-sh / koordinator

A QoS-based scheduling system brings optimal layout and status to workloads such as microservices, web services, big data jobs, AI jobs, etc.
https://koordinator.sh
Apache License 2.0
1.31k stars 326 forks

[question] resource reservation does not consider scale up via cluster-autoscaler #2113

Open lukasmrtvy opened 3 months ago

lukasmrtvy commented 3 months ago

Hi, I don't fully understand the reservation concept. The docs ( https://koordinator.sh/docs/designs/resource-reservation ) mention balloon/pause pods as an alternative, but those pods trigger scale-up events, so they can be used to overprovision the cluster. Reservation does not work like this. I would like to keep an additional reservation for ad-hoc pods (let's say two "spare" nodes to cover some peaks). Is that possible? Thanks
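For readers unfamiliar with the balloon/pause pod pattern referred to above: it is commonly set up as a low-priority Deployment of pause containers. The placeholders hold capacity, get preempted when real workloads arrive, and then sit pending, which is what prompts the cluster-autoscaler to add nodes. A minimal sketch in Python that builds such manifests as plain dicts; the `overprovisioning` name, the replica count, the resource sizes, and the pause image tag are illustrative assumptions, not values from this thread.

```python
import json


def balloon_manifests(replicas=2, cpu="2", memory="4Gi"):
    """Build a PriorityClass and a Deployment of pause pods (illustrative values)."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},  # hypothetical name
        # Negative priority so that any regular pod can preempt the balloons.
        "value": -1,
        "globalDefault": False,
        "description": "Placeholder pods that hold spare capacity.",
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioning", "namespace": "default"},
        "spec": {
            "replicas": replicas,  # roughly: how much headroom to keep
            "selector": {"matchLabels": {"app": "overprovisioning"}},
            "template": {
                "metadata": {"labels": {"app": "overprovisioning"}},
                "spec": {
                    "priorityClassName": "overprovisioning",
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",  # assumed tag
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return [priority_class, deployment]


if __name__ == "__main__":
    # Wrap both objects in a v1 List so the output is a single JSON document.
    bundle = {"apiVersion": "v1", "kind": "List", "items": balloon_manifests()}
    print(json.dumps(bundle, indent=2))
```

Sizing each placeholder close to a full node's allocatable resources would approximate "N spare nodes", though the exact numbers depend on the node type in use.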

saintube commented 3 months ago

@lukasmrtvy Hi, a reservation does not trigger scale-up because the vanilla cluster autoscaler (CA) is not aware of the reservation's scheduling result. To support this, we would need to extend the CA to cooperate with the current version of the reservation. Supporting cluster overprovisioning with the vanilla CA is a good point for us to consider. That said, a reservation does hold allocatable resources so that unmatched pods cannot preempt them, which avoids having to allocate the reserved resources through preemption. /cc @ZiMengSheng
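The difference described in this reply can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the autoscaler is reduced to the single rule "scale up when a pod is unschedulable", CPU is the only resource, and `Node`, `place`, `scale_up_needed`, and `simulate` are hypothetical names rather than Koordinator or cluster-autoscaler APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu: int           # allocatable CPU in millicores
    used: int = 0      # CPU requested by pods bound to this node
    reserved: int = 0  # CPU held by reservations


def place(nodes, cpu, as_reservation=False):
    """Bind a request to the first node with room; return False if nothing fits."""
    for node in nodes:
        if node.cpu - node.used - node.reserved >= cpu:
            if as_reservation:
                node.reserved += cpu
            else:
                node.used += cpu
            return True
    return False


def scale_up_needed(pending_pods):
    """Toy autoscaler rule: only unschedulable pods trigger a scale-up."""
    return len(pending_pods) > 0


def simulate(spare_kind, spare_cpu=2000):
    """Ask for spare capacity on a nearly full node and report whether the toy CA reacts."""
    nodes = [Node("node-1", cpu=4000, used=3000)]
    pending_pods = []
    if spare_kind == "balloon":
        # A balloon is a real pod, so a failed placement leaves a pending pod.
        if not place(nodes, spare_cpu):
            pending_pods.append(spare_cpu)
    elif spare_kind == "reservation":
        # A reservation is not a pod; a failed placement leaves nothing the toy CA can see.
        place(nodes, spare_cpu, as_reservation=True)
    else:
        raise ValueError(f"unknown spare_kind: {spare_kind}")
    return scale_up_needed(pending_pods)


if __name__ == "__main__":
    print("balloon triggers scale-up:", simulate("balloon"))
    print("reservation triggers scale-up:", simulate("reservation"))
```

In this model `simulate("balloon")` returns `True` and `simulate("reservation")` returns `False`, which mirrors the gap raised in this issue: the request for spare capacity never becomes something the autoscaler watches.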

lukasmrtvy commented 3 months ago

Kueue has implemented https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/provisioning-request.md. Could something similar work for Koordinator?

saintube commented 3 months ago

> Kueue has implemented https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/provisioning-request.md. Could something similar work for Koordinator?

@lukasmrtvy Yes, they are similar in that both reserve resources for owner pods. Thanks for the pointer. We are interested in this protocol, and it may let us resolve the problem above.

lukasmrtvy commented 3 months ago

@saintube Thanks. The lack of support for CA is the main problem preventing me from adopting Koordinator, so if it is going to be implemented anytime soon, then it would be superb.

JBinin commented 1 week ago

May I ask whether anyone is working on this issue? If not, I'd be glad to take it on. @saintube

saintube commented 1 week ago

> May I ask whether anyone is working on this issue? If not, I'd be glad to take it on. @saintube

@JBinin Yes, you are welcome to contribute!