volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

minAvailable and dependsOn test question #2944

Open renwenlong-github opened 1 year ago

renwenlong-github commented 1 year ago

Volcano version: master; test resources: 4C+4G

case 1

no_dependson_job.minAvailable<sum(task.minAvailable).yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 4
  tasks:
    - replicas: 5
      minAvailable: 4
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
    - replicas: 3
      minAvailable: 1
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure

result

(base)  ~ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
minavailable-job-master-0   1/1     Running   0          103s
minavailable-job-master-1   1/1     Running   0          103s
minavailable-job-master-2   0/1     Pending   0          103s
minavailable-job-master-3   0/1     Pending   0          103s
minavailable-job-master-4   0/1     Pending   0          103s

minavailable-job-work-0     1/1     Running   0          103s
minavailable-job-work-1     1/1     Running   0          103s
minavailable-job-work-2     0/1     Pending   0          103s

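analysis

Here job.minAvailable (4) < sum(task.minAvailable) (4 + 1 = 5). When Volcano executes the allocate action, CheckTaskReady returns true as soon as job.minAvailable < sum(task.minAvailable), which is inaccurate: the job is committed with only 2 master pods running although the master task requires 4. The key code is as follows: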

// pkg/scheduler/actions/allocate/allocate.go
if ssn.JobReady(job) {
    stmt.Commit()
}

// pkg/scheduler/plugins/gang/gang.go
ssn.AddJobReadyFn(gp.Name(), func(obj interface{}) bool {
    ji := obj.(*api.JobInfo)
    if ji.CheckTaskReady() && ji.Ready() {
        return true
    }
    return false
})

// pkg/scheduler/api/job_info.go
func (ji *JobInfo) CheckTaskReady() bool {
    if ji.MinAvailable < ji.TaskMinAvailableTotal {
        return true
    }
    ...
}

func (ji *JobInfo) CheckTaskStarving() bool {
    if ji.MinAvailable < ji.TaskMinAvailableTotal {
        return true
    }
    ...
}
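
To make the expected behaviour concrete, here is a minimal, self-contained sketch. It is not Volcano's code: the `jobInfo` type and the `taskReadyStrict` function are hypothetical, simplified stand-ins for `JobInfo` and `CheckTaskReady`. The sketch assumes the per-task minimum should be checked even when job.minAvailable < sum(task.minAvailable).

```go
package main

import "fmt"

// jobInfo is a hypothetical, simplified stand-in for Volcano's api.JobInfo.
type jobInfo struct {
	minAvailable     int32            // job-level minAvailable
	taskMinAvailable map[string]int32 // per-task minAvailable
	occupied         map[string]int32 // pods allocated or running, per task
}

// taskReadyStrict reports whether every task has at least its own
// minAvailable pods. Unlike the excerpt above, it has no shortcut for
// job.minAvailable < sum(task.minAvailable).
func taskReadyStrict(ji jobInfo) bool {
	for task, min := range ji.taskMinAvailable {
		if ji.occupied[task] < min {
			return false
		}
	}
	return true
}

func main() {
	// Case 1: master needs 4 and work needs 1, but the result was 2 + 2.
	case1 := jobInfo{
		minAvailable:     4,
		taskMinAvailable: map[string]int32{"master": 4, "work": 1},
		occupied:         map[string]int32{"master": 2, "work": 2},
	}
	fmt.Println(taskReadyStrict(case1)) // false
}
```

With this check the 2 + 2 allocation from case 1 would not be treated as ready, because the master task is still 2 pods short.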

case 2

no_dependson_job.minAvailable=sum(task.minAvailable).yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 4
  tasks:
    - replicas: 3
      minAvailable: 1
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
    - replicas: 4
      minAvailable: 3
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure

result

The test environment can run 4 pods, so this job should be scheduled successfully, but scheduling fails.

analysis

job.minAvailable = sum(task.minAvailable) = 1 + 3 = 4. Scheduling fails because pods are not scheduled according to task.minAvailable. The scheduler sorts tasks before allocating them, but the sort currently ignores minAvailable. The resulting order is minavailable-job-master-0, minavailable-job-master-1, minavailable-job-work-0, minavailable-job-work-1, so the allocation is 2 master + 2 work, which does not satisfy the required 1 master + 3 work.

solution

The scheduler should take task.minAvailable into account when sorting tasks.
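
One possible ordering is sketched below. This is an illustration under assumptions, not the scheduler's actual task-order logic: the `taskSpec` type and the `orderPods` function are hypothetical. The idea is to place each task's first minAvailable pods ahead of all remaining replicas.

```go
package main

import "fmt"

// taskSpec is a hypothetical, simplified description of one task in a job.
type taskSpec struct {
	name         string
	replicas     int
	minAvailable int
}

// orderPods returns pod names in an order that puts each task's first
// minAvailable pods ahead of all remaining replicas.
func orderPods(job string, tasks []taskSpec) []string {
	var required, optional []string
	for _, t := range tasks {
		for i := 0; i < t.replicas; i++ {
			pod := fmt.Sprintf("%s-%s-%d", job, t.name, i)
			if i < t.minAvailable {
				required = append(required, pod)
			} else {
				optional = append(optional, pod)
			}
		}
	}
	return append(required, optional...)
}

func main() {
	// Case 2: master needs 1 of 3, work needs 3 of 4.
	tasks := []taskSpec{
		{name: "master", replicas: 3, minAvailable: 1},
		{name: "work", replicas: 4, minAvailable: 3},
	}
	// With room for 4 pods, the first 4 entries are 1 master + 3 work.
	for _, pod := range orderPods("minavailable-job", tasks)[:4] {
		fmt.Println(pod)
	}
}
```

With the case 2 spec, the first four pods in this order are master-0, work-0, work-1 and work-2, which matches the required 1 + 3 split.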

case 3

no_dependson_job.minAvailable>sum(task.minAvailable).yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
spec:
  schedulerName: volcano
  minAvailable: 5
  tasks:
    - replicas: 5
      minAvailable: 2
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
    - replicas: 3
      minAvailable: 2
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure

result

(base)  ~/ kubectl get pg -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  minMember: 5
  minResources:
    count/pods: "5"
    cpu: "5"
    memory: 5Gi
    pods: "5"
    requests.cpu: "5"
    requests.memory: 5Gi
  minTaskMember:
    master: 2
    work: 1
  queue: default
status:
  conditions:
  - lastTransitionTime: "2023-06-27T03:50:27Z"
    message: '1/8 tasks in gang unschedulable: pod group is not ready, 5 minAvailable,
      8 Pending; Pending: 4 Schedulable, 4 Unschedulable'
    reason: NotEnoughResources
    status: "True"
    transitionID: 234f653c-2937-40ba-b4f3-46cfd9a82ed2
    type: Unschedulable
  phase: Inqueue

analysis

Four pods can be scheduled, but 4 < 5 (job.minAvailable), so scheduling of the job fails even though every task.minAvailable is satisfied. We should modify the webhook validation to reject job.minAvailable > sum(task.minAvailable).
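
A rough sketch of such a validation rule follows. It is not the existing Volcano admission webhook: `validateMinAvailable` is a hypothetical helper, and it assumes every task sets minAvailable explicitly.

```go
package main

import "fmt"

// validateMinAvailable is a hypothetical admission check. It rejects a job
// whose job-level minAvailable exceeds the sum of its tasks' minAvailable.
func validateMinAvailable(jobMin int32, taskMins map[string]int32) error {
	var sum int32
	for _, m := range taskMins {
		sum += m
	}
	if jobMin > sum {
		return fmt.Errorf(
			"job minAvailable %d is greater than the sum of task minAvailable %d",
			jobMin, sum)
	}
	return nil
}

func main() {
	// Case 3 as written in the YAML: job.minAvailable 5, tasks 2 + 2.
	err := validateMinAvailable(5, map[string]int32{"master": 2, "work": 2})
	fmt.Println(err != nil) // true: the job would be rejected
}
```

The case 2 job (job.minAvailable 4, tasks 1 + 3) would pass this check, since 4 is not greater than 4.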

case 4

dependson_job.minAvailable=task.minAvailable.yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: minavailable-job
#  annotations:
#    dependent-job: "true1"
spec:
  schedulerName: volcano
  minAvailable: 3
  tasks:
    - replicas: 4
      minAvailable: 2
      name: "master"
      template:
        metadata:
          name: master
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
    - replicas: 2
      minAvailable: 1
      name: "work"
      template:
        metadata:
          name: web
        spec:
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: "1"
                  memory: "1Gi"
          restartPolicy: OnFailure
      dependsOn:
        name:
          - "master"

result

(base)  ~/ kubectl get pg
NAME                                                    STATUS    MINMEMBER   RUNNINGS   AGE
minavailable-job-73974b1c-6440-43ac-8634-42549dcd5252   Inqueue   3                      6m49s
(base)  ~/ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
minavailable-job-master-0   0/1     Pending   0          6m56s
minavailable-job-master-1   0/1     Pending   0          6m56s
minavailable-job-master-2   0/1     Pending   0          6m56s
minavailable-job-master-3   0/1     Pending   0          6m56s

(base)  ~/ kubectl describe  vj minavailable-job
Events:
Type     Reason           Age    From                   Message
----     ------           ----   ----                   -------
Warning  PodGroupPending  8m15s  vc-controller-manager  PodGroup default:minavailable-job unschedule,reason: 3/0 tasks in gang unschedulable: pod group is not ready, 3 minAvailable

# After raising the scheduler log level to v=4, the following log lines appear
I0627 02:43:02.974640  1 allocate.go:75] Job <default/minavailable-job-73974b1c-6440-43ac-8634-42549dcd5252> Queue <default> skip allocate, reason: NotEnoughPodsOfTask, message Not enough valid pods of each task for gang-scheduling
I0627 02:43:03.979119   1 cache.go:879] task unscheduleable default/minavailable-job-master-2, message: 3/4 tasks in gang unschedulable: pod group is not ready, 3 minAvailable, 4 Pending; Pending: 4 Unschedulable, skip by no condition update

analysis

// pkg/scheduler/actions/allocate/allocate.go:74
// ssn.JobValid(job) -> CheckTaskValid()
// pkg/scheduler/api/job_info.go:724
func (ji *JobInfo) CheckTaskValid() bool {
    ...
    for task, minAvailable := range ji.TaskMinAvailable {
        // The scheduler checks whether the pods that exist for each task
        // meet that task's minAvailable; if not, it returns false.
        // Tasks with dependsOn have not created their pods yet, so the
        // check returns false.
        if act, ok := actual[task]; !ok || act < minAvailable {
            return false
        }
    }
    ...
}

solution

The solution is to mark jobs that use dependsOn. If a task with dependsOn has not created its pods yet, skip the check for that task.
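
The proposed skip could look roughly like the sketch below. It is only an illustration: `taskValid` and its `dependsOn` map are hypothetical and do not exist in Volcano's `JobInfo`. A task that has no pods yet is skipped only when it is marked as depending on another task.

```go
package main

import "fmt"

// taskValid is a hypothetical variant of the per-task validity check.
// taskMin holds each task's minAvailable, actual holds the number of pods
// that currently exist per task, and dependsOn marks tasks that wait for
// other tasks before their pods are created.
func taskValid(taskMin, actual map[string]int32, dependsOn map[string]bool) bool {
	for task, min := range taskMin {
		act, ok := actual[task]
		if !ok && dependsOn[task] {
			// The dependent task has not created any pods yet: skip it.
			continue
		}
		if act < min {
			return false
		}
	}
	return true
}

func main() {
	// Case 4: only the 4 master pods exist; work depends on master.
	taskMin := map[string]int32{"master": 2, "work": 1}
	actual := map[string]int32{"master": 4}

	fmt.Println(taskValid(taskMin, actual, map[string]bool{"work": true})) // true
	fmt.Println(taskValid(taskMin, actual, nil))                           // false
}
```

The second call shows the current behaviour from case 4: without the dependsOn mark, the missing work pods make the whole job invalid.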

Anything else we need to know?: test resource 4C/4G


stale[bot] commented 11 months ago

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).