kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Backfilling when provisioningrequest adds nodes to the cluster #1871

Closed: asm582 closed this issue 1 day ago

asm582 commented 5 months ago

What would you like to be added: A mechanism to schedule queued jobs/workloads on nodes that have already become available in the cluster while the ProvisioningRequest scale-up for a previous job is still completing.

Why is this needed: We block quota and then acquire resources from the cloud provider in a somewhat sequential fashion; leaving the newly added resources unused by queued jobs wastes capacity and lowers cluster throughput.

Completion requirements: A new preemption policy may be needed to force-terminate backfilled jobs that are already running on the newly acquired nodes.

This enhancement requires the following artifacts:

The artifacts should be linked in subsequent comments.
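For context, here is a minimal sketch of how a ProvisioningRequest-based AdmissionCheck is wired up today, assuming the `kueue.x-k8s.io/v1beta1` API; the `sample-prov` names are illustrative, and the commented-out `backfillPolicy` field is purely hypothetical (it is not part of the Kueue API) and only marks where a knob like the one requested in this issue might attach:

```yaml
# Sketch: current ProvisioningRequest admission-check wiring (kueue.x-k8s.io/v1beta1).
# While the ProvisioningRequest is being fulfilled, the workload holds its quota and
# nodes that come up during that window are not used to backfill other queued workloads.
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: sample-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: sample-prov-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: sample-prov-config
spec:
  provisioningClassName: check-capacity.autoscaling.x-k8s.io
  managedResources:
  - nvidia.com/gpu
  # backfillPolicy: Opportunistic
  # ^ hypothetical, NOT a real field: sketches where an "allow queued workloads
  #   onto capacity provisioned for another job" setting could be expressed.
```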

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

alculquicondor commented 2 months ago

I think a better approach would be for the cloud provider to give some form of atomicity when doing scale-ups.

What you are proposing sounds very problematic. How do you express that a job is OK to run on nodes provisioned for some other job? When and how would preemption happen?

Do you have a more specific proposal?
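For reference, these are the preemption knobs a ClusterQueue exposes today; this is a minimal sketch assuming the `kueue.x-k8s.io/v1beta1` API and an illustrative flavor named `default-flavor`. None of the existing policies distinguishes workloads that were backfilled onto nodes provisioned for another job, which is the gap the questions above point at:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a
spec:
  namespaceSelector: {}  # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 400Gi
  preemption:
    # Existing policies key off workload priority and quota borrowing, not off
    # "this workload was backfilled onto capacity provisioned for another job".
    withinClusterQueue: LowerPriority   # preempt lower-priority workloads in this queue
    reclaimWithinCohort: Any            # reclaim quota lent to other queues in the cohort
```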

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 day ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 day ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/kueue/issues/1871#issuecomment-2308556327):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
>
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
>
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.