volcano-sh / volcano

A Cloud Native Batch System (Project under CNCF)
https://volcano.sh
Apache License 2.0

two jobs of different queue contend for more resource than weight will get into lock status #3725

Open cocodee opened 2 weeks ago

cocodee commented 2 weeks ago

Description

There are two queues, A and B, and the cluster has 10 GPUs. Both A and B have a deserved of 5 GPUs and a capability of 10 GPUs. A user submits a job that uses 6 GPUs to queue A. After that job starts running, the user submits non-preemptable tasks to queue A that use 6 GPUs, then submits another task to queue B that uses 6 GPUs. At this point there is one job running and two jobs waiting in the queues. The running job is then deleted, and the other two jobs stay in Pending status forever.

Steps to reproduce the issue

Scenario Description

Queue Configuration:

Queue A and B each have a quota (deserved) of 5 GPUs, but each queue has a capacity (capability) of 10 GPUs.

Job Submission:

A user submits a job that uses 6 GPUs to Queue A.

Queue A starts running this job.

The user submits two non-preemptable tasks to Queue A, each using 6 GPUs.

The user submits another job that uses 6 GPUs to Queue B.

Job Status:

One job is running, and two jobs are waiting in the queue.

The running job is deleted.

The two jobs remain in a pending state.
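The arithmetic of the state after the running job is deleted can be sketched as below. This is only a toy model of the numbers in this report, not Volcano's scheduling logic: the `Queue` class, `can_admit`, and `schedule` are hypothetical names, and the admission rule (enough free GPUs, queue stays within its capability) is an assumption made for illustration. It follows the Description above, with one 6-GPU job pending in each queue.

```python
from dataclasses import dataclass

CLUSTER_GPUS = 10  # total GPUs in the cluster, as stated in the report


@dataclass
class Queue:
    name: str
    deserved: int    # quota (deserved), as stated in the report
    capability: int  # upper limit (capability), as stated in the report
    allocated: int = 0


def can_admit(queue, request, free_gpus):
    """Hypothetical rule: the request fits in free GPUs and within capability."""
    return request <= free_gpus and queue.allocated + request <= queue.capability


def schedule(pending, free_gpus):
    """Greedy pass over (queue, request) pairs; returns started queue names and GPUs left."""
    started = []
    for queue, request in pending:
        if can_admit(queue, request, free_gpus):
            queue.allocated += request
            free_gpus -= request
            started.append(queue.name)
    return started, free_gpus


a = Queue("A", deserved=5, capability=10)
b = Queue("B", deserved=5, capability=10)

# After the running 6-GPU job is deleted, all 10 GPUs are free and
# one 6-GPU job is pending in each queue.
started, free = schedule([(a, 6), (b, 6)], CLUSTER_GPUS)
print(started, free)  # ['A'] 4
```

Under this assumed rule one of the two pending jobs fits (6 of 10 GPUs) and the other has to wait, since 6 + 6 exceeds 10. Note that each request (6) is above the queue's deserved (5) but below its capability (10), which is the situation the issue title describes.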

Describe the results you received and expected

Expected: the two pending jobs can be scheduled after the running job is deleted. Received: both jobs remain pending.

What version of Volcano are you using?

v1.9.0

Any other relevant information

No response

lowang-bh commented 2 weeks ago

Please paste your scheduler ConfigMap and the YAML for your jobs and queues.