Open googs1025 opened 2 months ago
/kind feature
The "alpha.jobset.sigs.k8s.io/exclusive-topology" annotation can be used to distinguish nodes by a given node label, for example nodes that have GPUs or nodes that belong to a specific network partition.
To clarify, exclusive job placement per topology means each Job's pods will be colocated within a group of nodes with the same value for the given node label (e.g., all nodes with the label "cloud.google.com/gke-nodepool=my-nodepool").
Once a pod from a given Job has landed on a node, no other Job's pods will be allowed to land on nodes with the label "cloud.google.com/gke-nodepool=my-nodepool" - the first Job has exclusive usage of them.
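To make the rule above concrete, here is a minimal Python sketch of the placement semantics as described in this thread. It is only a toy model, not JobSet's actual implementation: the class ExclusivePlacement and its method try_place are hypothetical names, and treating nodes without the topology label as ineligible is an assumption based on the behavior reported later in this issue.

```python
from typing import Dict, Optional


class ExclusivePlacement:
    """Toy model of exclusive job placement per topology domain (hypothetical)."""

    def __init__(self, topology_key: str) -> None:
        self.topology_key = topology_key
        self.owner_by_domain: Dict[str, str] = {}  # label value -> owning Job
        self.domain_by_job: Dict[str, str] = {}    # Job -> label value it claimed

    def try_place(self, job: str, node_labels: Dict[str, str]) -> bool:
        """Return True if a pod of `job` may land on a node with these labels."""
        domain: Optional[str] = node_labels.get(self.topology_key)
        if domain is None:
            # Assumption: a node without the topology label is not eligible.
            return False
        owner = self.owner_by_domain.get(domain)
        if owner is not None and owner != job:
            # Another Job already has exclusive use of this domain.
            return False
        claimed = self.domain_by_job.get(job)
        if claimed is not None and claimed != domain:
            # All pods of a Job must be colocated in a single domain.
            return False
        self.owner_by_domain[domain] = job
        self.domain_by_job[job] = domain
        return True


key = "cloud.google.com/gke-nodepool"
placement = ExclusivePlacement(key)
pool_a = {key: "my-nodepool"}
pool_b = {key: "other-nodepool"}

print(placement.try_place("job-a", pool_a))  # first Job claims my-nodepool
print(placement.try_place("job-b", pool_a))  # rejected: domain already owned
print(placement.try_place("job-b", pool_b))  # job-b claims a different domain
```

The model keeps two maps because the rule has two halves: a domain belongs to at most one Job, and a Job stays inside the one domain it claimed.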
In practice, we often use a single label to mark different areas or to distinguish different businesses to form a node pool. For example: node-group=group1, node-group=group2, etc. When I tested this method, I found that the existing alpha.jobset.sigs.k8s.io/exclusive-topology could not meet this scenario.
I'm not sure what you mean here. If we have node pools where each pool's nodes are grouped via node labels (e.g. cloud.google.com/gke-nodepool=A, cloud.google.com/gke-nodepool=B, etc.), then exclusive job placement per node pool via specifying alpha.jobset.sigs.k8s.io/exclusive-topology=cloud.google.com/gke-nodepool is supported and well-tested.
What would you like to be added:
Why is this needed: As far as I know, JobSet implements topology-domain scheduling. However, in my testing it appears to distinguish only whether a node has the label at all. For example, if a node has a "node-group" label, the JobSet can be scheduled onto it; if the node has no "node-group" label, the JobSet cannot be scheduled.
In practice, we often use a single label to mark different areas or to distinguish different businesses to form a node pool. For example: node-group=group1, node-group=group2, etc. When I tested this method, I found that the existing alpha.jobset.sigs.k8s.io/exclusive-topology could not meet this scenario. Do we need to consider this scenario?
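For illustration, the node-group scenario can be sketched as grouping nodes by the value of a single label. This is a hypothetical Python helper (partition_by_label and the node names are made up for the example), not part of JobSet; it only shows how nodes labeled node-group=group1 and node-group=group2 would form two topology domains, while a node without the label falls into none.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_by_label(
    nodes: Dict[str, Dict[str, str]], key: str
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group node names by the value of label `key`.

    Returns (domains, unlabeled): `domains` maps each label value to the
    nodes carrying it, `unlabeled` lists nodes that lack the label.
    """
    domains: Dict[str, List[str]] = defaultdict(list)
    unlabeled: List[str] = []
    for name, labels in nodes.items():
        value = labels.get(key)
        if value is None:
            unlabeled.append(name)
        else:
            domains[value].append(name)
    return dict(domains), unlabeled


# Hypothetical cluster: two node groups plus one node without the label.
nodes = {
    "node-1": {"node-group": "group1"},
    "node-2": {"node-group": "group1"},
    "node-3": {"node-group": "group2"},
    "node-4": {},
}

domains, unlabeled = partition_by_label(nodes, "node-group")
print(domains)    # nodes grouped per node-group value
print(unlabeled)  # nodes that belong to no domain
```

Under exclusive placement keyed on node-group, each entry in domains would be a candidate pool for exactly one Job.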
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.