volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0
4.12k stars 952 forks source link

Preempt do not follow minAvailable. #2627

Open jiangkaihua opened 1 year ago

jiangkaihua commented 1 year ago

What happened:

When I tried preempt action with priority plugin. I set minAvailable: 1 in high-priority job which has 4 tasks, and supposed that Volcano would evict only 1 task of low-priority job. But the result was that all 4 tasks from high-priority job started and evicted 4 low-priority tasks as below:

NAME               READY   STATUS        RESTARTS   AGE
job-high-task0-0   0/1     Pending       0          3s
job-high-task0-1   0/1     Pending       0          3s
job-high-task1-0   0/1     Pending       0          3s
job-high-task1-1   0/1     Pending       0          3s
job-low-task0-0    1/1     Terminating   0          13m
job-low-task0-1    1/1     Terminating   0          13m
job-low-task1-0    1/1     Terminating   0          13m
job-low-task1-1    1/1     Terminating   0          14m

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

volcano-scheduler-configmap was set as below:

    actions: "enqueue, allocate, preempt, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enableJobStarving: false
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: nodeorder
      - name: binpack
  1. Set priorityclass:
NAME                      VALUE        GLOBAL-DEFAULT   AGE
high                      1000         false            49d
low                       10           false            31h
  1. Apply low-priority job with 4 task and fullfilled the cluster:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-low
spec:
  schedulerName: volcano
  minAvailable: 4
  queue: default
  priorityClassName: low
  tasks:
    - replicas: 2
      name: "task0"
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 99999"]
              imagePullPolicy: IfNotPresent
              name: task
              resources:
                limits:
                  cpu: 3750m
                requests:
                  cpu: 3750m
          restartPolicy: OnFailure
    - replicas: 2
      name: "task1"
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 99999"]
              imagePullPolicy: IfNotPresent
              name: task
              resources:
                limits:
                  cpu: 3750m
                requests:
                  cpu: 3750m
          restartPolicy: OnFailure
  1. Apply high-priority job with 4 task but minAvailable: 1:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-high
spec:
  schedulerName: volcano
  minAvailable: 1
  queue: default
  priorityClassName: high
  tasks:
    - replicas: 2
      name: "task0"
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 99999"]
              imagePullPolicy: IfNotPresent
              name: task
              resources:
                limits:
                  cpu: 3750m
                requests:
                  cpu: 3750m
          restartPolicy: OnFailure
    - replicas: 2
      name: "task1"
      template:
        spec:
          containers:
            - image: alpine
              command: ["/bin/sh", "-c", "sleep 99999"]
              imagePullPolicy: IfNotPresent
              name: task
              resources:
                limits:
                  cpu: 3750m
                requests:
                  cpu: 3750m
          restartPolicy: OnFailure
  1. Wait for preempt.

Anything else we need to know?:

Environment:

jiangkaihua commented 1 year ago

This issue was probably caused by: https://github.com/volcano-sh/volcano/blob/1e473e0cf5df9f09ab70cc616a56e6eea8f25c3b/pkg/scheduler/actions/preempt/preempt.go#L78-L80

After JobStarving() judge that a job could be preemptor and evict other jobs, preempt action added all pending tasks of this job into preemptor queue but not its minAvailable tasks. Causing that all tasks would preempt resources from other jobs.

vishal-chdhry commented 1 year ago

hi! may i pick this one?

jiangkaihua commented 1 year ago

hi! may i pick this one?

Certainly! Thank you for contribution. : )

This fix may be a little difficult, since it involves multiple plugin. For example, if gang plugin was registered in volcano-scheduler-configmap, here we need to check MinAvailable and TaskMinAvailable field in preemptor job to make sure that only MinAvailable tasks containing TaskMinAvailable number of each kind of tasks was pushed into preemptorTasks. Similarily, if priority plugin was registered, we need to take task.priority into account.

So if you do not mind, please write a doc about this design. So that our members cloud discuss about it in detail. : )

jiangkaihua commented 1 year ago

My suggestion is to modify jobStarvingFn(). Currently it returned bool to check whether a job should be a preemptor or not, but if this interface returned a map of tasks from this job, we could add this task map into preemptorTasks directly. And if the task map is nil, it means that plugin rejected this job to be preemptor.

By the way, currently jobStarvingFn() interface was registered by gang, priority, and tdm plugin. They may be exclusive and not available in same tiers. The behavior need to be discussed in detail.

william-wang commented 1 year ago

/assign @Vishal-Chdhry

wangyang0616 commented 1 year ago

https://github.com/volcano-sh/volcano/blob/f09fb542f43ee8cc940406c26924d8ac35e468b3/pkg/scheduler/actions/preempt/preempt.go#L101-L104

When the preemption action is executed, the jobStarvingFn interface will be judged to determine whether the current job resource meets the minAvailble requirement, and if so, the preemption will stop.

Is it possible to solve the current problem by deleting enableJobStarving: false in configmap?

- name: gang
    enableJobStarving: false
    enablePreemptable: false

However, whether the performance of the current implementation method is optimal, and whether there is a more reasonable implementation method, can be discussed again.

vishal-chdhry commented 1 year ago

i think this is beyond my skill level as of now, is there any other issue i can work on that is easier to get started with?

wangyang0616 commented 1 year ago

i think this is beyond my skill level as of now, is there any other issue i can work on that is easier to get started with?

Welcome to join the community, now we are planning the features of Volcano 1.8.0, you can follow the community dynamics, choose the features you are interested in and issue development

stale[bot] commented 1 year ago

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

wangyang0616 commented 1 year ago

I think it's necessary to keep

stale[bot] commented 11 months ago

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).