Many low-priority jobs can block a high-priority job with "resource in cluster is overused"

mattwelborn commented 2 months ago

Description

I have two types of jobs:

Low-priority jobs. This job uses a priority class with priority 10 and a queue with weight 1. This job is set to tolerate only node 1. (Using either node affinity or taints.)
High-priority jobs. This job uses a priority class with priority 100 and a queue with weight 100. This job is set to tolerate only node 2. (Using either node affinity or taints.)

When I submit many low-priority jobs and then submit one high-priority job, the high-priority job does not run. It does not create pods. Its podgroup has an event which says "resource in cluster is overused".

Steps to reproduce the issue

1. Create a k8s cluster with two nodes: node-1 and node-2.
2. Create two priority classes:
   1. pc-low has priority 10.
   2. pc-high has priority 100.
3. Create two queues:
   1. q-low has weight 1.
   2. q-high has weight 100.
4. Create a low-priority job definition job-low which uses pc-low, q-low, can only be scheduled onto node-1, and which sleeps for 5 minutes.
5. Create a high-priority job definition job-high which uses pc-high, q-high, can only be scheduled onto node-2, and which sleeps forever.
6. Submit 1000 copies of job-low.
7. Submit 1 copy of job-high.
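
For reference, the objects described above could look roughly like the manifests below. This is a minimal sketch, not the exact files used in the report: the container image, the CPU request, the task name `main`, and the use of `nodeSelector` on the `kubernetes.io/hostname` label (rather than taints or node affinity) are all assumptions made for illustration.

```yaml
# Priority classes: pc-low (10) and pc-high (100).
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: pc-low
value: 10
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: pc-high
value: 100
---
# Queues: q-low (weight 1) and q-high (weight 100).
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q-low
spec:
  weight: 1
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: q-high
spec:
  weight: 100
---
# Low-priority job: pinned to node-1, sleeps for 5 minutes.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  generateName: job-low-        # lets the same file be submitted many times
spec:
  schedulerName: volcano
  queue: q-low
  priorityClassName: pc-low
  minAvailable: 1
  tasks:
    - name: main                # hypothetical task name
      replicas: 1
      template:
        spec:
          restartPolicy: Never
          nodeSelector:         # assumption: pin by hostname label
            kubernetes.io/hostname: node-1
          containers:
            - name: sleep
              image: busybox    # assumption: any image with a shell works
              command: ["sleep", "300"]
              resources:
                requests:
                  cpu: "1"      # assumption: request size is not given in the report
---
# High-priority job: pinned to node-2, sleeps forever.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-high
spec:
  schedulerName: volcano
  queue: q-high
  priorityClassName: pc-high
  minAvailable: 1
  tasks:
    - name: main
      replicas: 1
      template:
        spec:
          restartPolicy: Never
          nodeSelector:
            kubernetes.io/hostname: node-2
          containers:
            - name: sleep
              image: busybox
              command: ["sh", "-c", "while true; do sleep 3600; done"]
              resources:
                requests:
                  cpu: "1"
```

Because job-low uses `generateName`, it has to be submitted with `kubectl create -f` rather than `kubectl apply -f`; each submission then gets a unique name.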

Describe the results you received and expected

Results received: many pods are created for the copies of job-low until there are ~80 pending pods corresponding to job-low. These copies of job-low run on node-1, while node-2 remains idle. The one copy of job-high creates a podgroup, but the podgroup fails to create a pod. Instead, it repeatedly gives the error "resource in cluster is overused". Even as the jobs corresponding to job-low complete, job-high does not run.

Expected results: the copies of job-low should run in order of submission on node-1. (This happens.) The one copy of job-high should immediately run on node-2. (This does not happen.)

What version of Volcano are you using?

v1.9.0

Any other relevant information

I believe that my Volcano scheduler config is the default:

  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack

Monokaix commented 2 months ago

The podgroup of the high-priority job didn't create pods because you have enabled the enqueue action; refer to https://github.com/volcano-sh/volcano/blob/master/docs/design/delay-pod-creation.md for more details.

The enqueue action blocks pods from being enqueued when cluster resources are insufficient. You can remove it and try again :)
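
As a sketch of that suggestion, the scheduler config from the report would change only in its `actions` line, with everything else left as posted. This has not been verified against the reporter's cluster, and whether the unchanged plugin list is ideal without the enqueue action is an open assumption.

```yaml
  volcano-scheduler.conf: |
    # "enqueue" dropped from the action list; the tiers are unchanged.
    actions: "allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: binpack
```

The scheduler may need to reload or restart before the edited ConfigMap takes effect.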

Monokaix commented 1 month ago

/close

volcano-sh-bot commented 1 month ago

@Monokaix: Closing this issue.

