kubernetes-sigs / kueue

Kubernetes-native Job Queueing
https://kueue.sigs.k8s.io
Apache License 2.0

Job should list "Untolerated Taint" as reason for not being admitted #3158

Open · nfung-soundhound opened 2 weeks ago

nfung-soundhound commented 2 weeks ago

What would you like to be added: Pending Jobs/Workloads should list the reasons why they are pending, in particular when they cannot tolerate the taints of any of the ResourceFlavors of the ClusterQueue to which they were submitted.

Why is this needed: Consider the example ResourceFlavor with a taint and the Job definition below. Assume the dev LocalQueue submits to the dev ClusterQueue, and that the ClusterQueue contains the ResourceFlavor node_type defined below with sufficient quota.

# resourceflavor.yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: node_type
spec:
  nodeLabels:
    beta.kubernetes.io/instance-type: "node_type"
  nodeTaints:
  - key: taint_key
    value: "taint_value"
    effect: NoSchedule
# job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
  labels:
    kueue.x-k8s.io/queue-name: dev
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: myapp
        image: busybox
        command: ["sleep", "500"]
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"
          limits:
            memory: "128Mi"
            cpu: "500m"

When the Job above is submitted, the controller suspends it. However, the Job's events yield very little information:

Events:
  Type    Reason           Age   From                        Message
  ----    ------           ----  ----                        -------
  Normal  Suspended        10m   job-controller              Job suspended
  Normal  CreatedWorkload  10m   batch/job-kueue-controller  Created Workload: default/job-myjob-d2369

After enabling debug logs on the controller, it turns out the job was not scheduled because it could not tolerate the taints for that node type. This might be fine for an administrator, but it is not friendly for developers, who might accidentally overlook a taint. Typically, when scheduling Pods/Jobs, if a Pod is unschedulable Kubernetes reports that it cannot tolerate certain taints on some nodes, as shown after the log below.

{"level":"debug",
 "ts":"2024-09-27T19:56:20.485176276Z",
 "logger":"events","caller":"recorder/recorder.go:104","msg":"couldn't assign flavors to pod set main: untolerated taint {taint_key taint_value NoSchedule <nil>} in flavor node_type","type":"Normal","object":{"kind":"Workload","namespace":"default","name":"job-myjob-d2369","uid":"f469c8b0-a23b-4854-bc05-048a38904520","apiVersion":"kueue.x-k8s.io/v1beta1","resourceVersion":"654137093"},"reason":"Pending"}

My current workaround is to stop using taints on the ResourceFlavors and rely on the taints on the nodes themselves instead. This has the unintended side effect of reserving a portion of the quota without actually running a workload (i.e., the job is admitted, but its pods are stuck in Pending since they do not tolerate the taints), as sketched below.
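Concretely, the workaround reduces the ResourceFlavor to just the node labels, leaving the taint only on the nodes themselves (a sketch):

# resourceflavor-no-taints.yaml (sketch of the workaround above: the taint
# is removed from the flavor but remains on the nodes, so Kueue admits the
# Workload while the Pods stay Pending)
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: node_type
spec:
  nodeLabels:
    beta.kubernetes.io/instance-type: "node_type"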

I have just begun to use Kueue, so please suggest any workarounds (I have thought of, but not tested, all-or-nothing scheduling in this instance).
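For reference, the direct fix on the Job side would be a matching toleration in the pod template; a sketch (untested), assuming the taint from the ResourceFlavor above:

# job-with-toleration.yaml (sketch: same Job as above, plus the toleration
# that should let Kueue assign the node_type flavor)
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
  labels:
    kueue.x-k8s.io/queue-name: dev
spec:
  suspend: true
  template:
    spec:
      restartPolicy: Never
      tolerations:
      - key: taint_key
        operator: Equal
        value: "taint_value"
        effect: NoSchedule
      containers:
      - name: myapp
        image: busybox
        command: ["sleep", "500"]
        resources:
          requests:
            memory: "128Mi"
            cpu: "500m"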

Completion requirements:

TBD, but this would require changes to the controller so that pending Jobs/Workloads are given the reasons why they cannot be scheduled even when sufficient quota exists; see the sketch below.
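For illustration, the kind of Job event this could produce (format assumed, mirroring the existing events above):

Events:
  Type     Reason   Age  From                        Message
  ----     ------   ---- ----                        -------
  Warning  Pending  10m  batch/job-kueue-controller  couldn't assign flavors to pod set main: untolerated taint {taint_key taint_value NoSchedule} in flavor node_type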

tenzen-y commented 5 days ago

This ResourceFlavor behavior is expected as described in https://kueue.sigs.k8s.io/docs/concepts/resource_flavor/#resourceflavor-taints.

But I agree that the current condition is not helpful for batch users. Adding a more informative condition or event would be better in this case, for example as sketched below.
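For example, something along these lines in the Workload status (condition and field names are only a sketch):

status:
  conditions:
  - type: QuotaReserved
    status: "False"
    reason: Pending
    message: >-
      couldn't assign flavors to pod set main: untolerated taint
      {taint_key taint_value NoSchedule} in flavor node_type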