kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Is it possible to support queueing deployment? #867

Open lizzzcai opened 1 year ago

lizzzcai commented 1 year ago

Hi experts,

I found this project and am keen to explore it for queueing jobs (for example, Argo Workflows). I am wondering whether Kubernetes Deployments are supported as well, or whether Kueue can be extended to support queueing Deployments (I saw custom workloads mentioned in the docs). If so, is there any documentation, and how easy is it to support a custom workload?

The reason I ask about Deployments is that I like the features this project supports (resource quotas and good local queue isolation), and I am wondering whether it could serve as a generic quota management service for all workloads (not just Jobs, but Pods as well).

Thanks.

alculquicondor commented 1 year ago

It is currently not in our roadmap, as Deployments don't have the concept of completion.

But yes, you could implement an integration controller to support Deployments. Here is the framework for implementing these controllers: https://github.com/kubernetes-sigs/kueue/tree/main/pkg/controller/jobframework. We haven't had the chance to write a tutorial about this yet. Also, the framework is still evolving as we integrate more CRDs.

The key concept you have to implement is the idea of "suspend": when a job is suspended, it shouldn't have any pods. I think you could implement this for a Deployment by setting the replicas to zero and storing the target number of replicas in an annotation.
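As a rough illustration of the replicas-to-zero idea, here is a minimal sketch that operates on a Deployment manifest represented as a plain Python dict. It is not Kueue code and does not use the jobframework: the annotation key `example.com/target-replicas` and the function names `suspend` and `resume` are assumptions made up for this example. A real integration would apply the same changes through the Kubernetes API.

```python
import copy

# Hypothetical annotation key, chosen for illustration only.
TARGET_REPLICAS_ANNOTATION = "example.com/target-replicas"


def suspend(deployment: dict) -> dict:
    """Return a copy scaled to zero, remembering the desired replica count."""
    dep = copy.deepcopy(deployment)
    spec = dep.setdefault("spec", {})
    annotations = dep.setdefault("metadata", {}).setdefault("annotations", {})
    # Record the target only once, so suspending twice does not overwrite it with 0.
    if TARGET_REPLICAS_ANNOTATION not in annotations:
        # Kubernetes defaults spec.replicas to 1 when the field is omitted.
        # Annotation values must be strings.
        annotations[TARGET_REPLICAS_ANNOTATION] = str(spec.get("replicas", 1))
    spec["replicas"] = 0
    return dep


def resume(deployment: dict) -> dict:
    """Return a copy with the stored replica count restored, if one was stored."""
    dep = copy.deepcopy(deployment)
    annotations = dep.get("metadata", {}).get("annotations", {})
    target = annotations.pop(TARGET_REPLICAS_ANNOTATION, None)
    if target is not None:
        dep.setdefault("spec", {})["replicas"] = int(target)
    return dep
```

With this sketch, a Deployment with 3 replicas is suspended by scaling it to 0 and recording "3" in the annotation; resuming reads the annotation back and removes it.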

You can also integrate Pods. The suspend concept can be implemented using scheduling gates. But Pods cannot be restarted, so once a Pod is preempted it is marked as failed and you have to resubmit it.
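To make the scheduling-gate idea concrete, here is a minimal sketch on a Pod manifest held as a Python dict. The `spec.schedulingGates` field is part of the Kubernetes Pod API. The gate name `example.com/admission` and the helper names are assumptions for illustration, not what an actual integration necessarily uses. Note that Kubernetes only allows gates to be set when the Pod is created and removed afterwards, so `gate` models something like a mutating webhook and `admit` models the later removal.

```python
import copy

# Hypothetical gate name, chosen for illustration only.
ADMISSION_GATE = "example.com/admission"


def gate(pod: dict) -> dict:
    """Return a copy of the Pod manifest with the admission gate added."""
    p = copy.deepcopy(pod)
    gates = p.setdefault("spec", {}).setdefault("schedulingGates", [])
    if not any(g.get("name") == ADMISSION_GATE for g in gates):
        gates.append({"name": ADMISSION_GATE})
    return p


def admit(pod: dict) -> dict:
    """Return a copy with the admission gate removed; other gates are kept."""
    p = copy.deepcopy(pod)
    spec = p.setdefault("spec", {})
    remaining = [
        g for g in spec.get("schedulingGates", [])
        if g.get("name") != ADMISSION_GATE
    ]
    if remaining:
        spec["schedulingGates"] = remaining
    else:
        spec.pop("schedulingGates", None)
    return p


def is_schedulable(pod: dict) -> bool:
    """A Pod is considered for scheduling only once it has no gates left."""
    return not pod.get("spec", {}).get("schedulingGates")
```

Because a gate cannot be added back to an existing Pod, there is no equivalent of re-suspending here, which matches the point above that a preempted Pod has to be resubmitted.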

lizzzcai commented 1 year ago

Hi @alculquicondor, thanks for your reply and explanation. I will look into it.

alculquicondor commented 10 months ago

Hello @lizzzcai, if you looked into this, would you be interested in contributing the controller as part of #1088?

lizzzcai commented 10 months ago

Hi @alculquicondor, sorry for the late reply. I can probably look into it in my free time, but I cannot commit to a timeline.

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

/lifecycle rotten

alculquicondor commented 4 months ago

/lifecycle frozen

alculquicondor commented 4 weeks ago

/remove-kind support
/kind feature