Closed trasc closed 1 month ago
FYI, we don't need a new KEP, but you can add the details to the existing one.
Hi all, what's left to do here? Our team is interested in the RayJob integration.
IIRC, we don't support partial admission for RayJob right now.
So, we need to implement minPodsCount
and then modify the RayJob functions based on minPodsCount,
like this:
Thanks @tenzen-y for the feedback. cc @BinL233
@kerthcet can you share how heterogeneous your Ray jobs are?
I wonder if we can simplify support for partial admission by restricting it to one podset. Otherwise it's an NP problem.
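For illustration, here is a minimal Go sketch of why restricting partial admission to one pod set keeps the decision simple: with a single shrinkable pod set, the largest count that fits is one division. The type scalablePodSet, the function partialCount, and the CPU-only quota model are hypothetical names and assumptions made for this example; they are not Kueue's API.

```go
package main

import "fmt"

// scalablePodSet is a hypothetical stand-in for the single pod set
// (for example, one Ray worker group) that is allowed to shrink.
type scalablePodSet struct {
	Count          int64 // pods requested
	MinCount       int64 // smallest count the job can still run with
	CPUPerPodMilli int64 // CPU request per pod, in millicores
}

// partialCount returns the largest pod count in [MinCount, Count] whose total
// CPU request fits into availableMilli after reserving fixedMilli for the pod
// sets that cannot shrink (for example, the Ray head).
// ok is false when even MinCount pods do not fit.
func partialCount(ps scalablePodSet, fixedMilli, availableMilli int64) (count int64, ok bool) {
	if ps.CPUPerPodMilli <= 0 || ps.MinCount > ps.Count {
		return 0, false // invalid input
	}
	remaining := availableMilli - fixedMilli
	if remaining < 0 {
		return 0, false // the fixed pod sets alone do not fit
	}
	fit := remaining / ps.CPUPerPodMilli
	if fit > ps.Count {
		fit = ps.Count // never admit more than requested
	}
	if fit < ps.MinCount {
		return 0, false
	}
	return fit, true
}

func main() {
	workers := scalablePodSet{Count: 10, MinCount: 4, CPUPerPodMilli: 2000}
	// Head needs 1000m; 14000m of quota is free, so 6 workers fit.
	fmt.Println(partialCount(workers, 1000, 14000)) // prints "6 true"
	// Only 8000m free: 3 workers would fit, which is below MinCount.
	fmt.Println(partialCount(workers, 1000, 8000)) // prints "0 false"
}
```

If several pod sets could shrink at once, there would be no single division like this; the admission logic would have to search over combinations of counts, which is the blow-up the one-pod-set restriction is meant to avoid.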
We're still exploring this, but we found that RayCluster autoscaling is complex; it may be out of scope for Kueue and more related to cluster-autoscaler. The Ray community recommends a 1:1 mapping of pod (Ray node) to Kubernetes node.
For example, when there aren't enough resources for autoscaling, the RayJob hangs forever: even though some of its tasks have finished, their resources are not reclaimed. I think Kueue can do little here.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After inactivity, lifecycle/stale is applied.
After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied.
After further inactivity since lifecycle/rotten was applied, the issue is closed.
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
cc @astefanutti @vicentefb @andrewsykim in case you are interested in this.
Partial admission differs from elastic jobs in that Kueue decides during admission to give the RayJob a smaller size, and the job runs at that size until it completes.
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What would you like to be added:
Add support for partial admission for RayJobs. See #420 and https://github.com/kubernetes-sigs/kueue/pull/667/files#r1198519116 for details.
Why is this needed:
Completion requirements:
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.