500 Unschedulable Jobs

Description

If Scale's queue holds 500 unschedulable high-priority jobs, nothing with lower priority can be scheduled. On each run the scheduler grabs 500 jobs (QUEUE_LIMIT) and attempts to schedule them. If none of them can be scheduled, it does nothing and returns, only to get the same 500 jobs again the next time it runs.
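The starvation described above can be reproduced with a toy model. This is only a sketch and not Scale's actual scheduler code: the `Job` class, the `schedule_pass` function, the `gpu_free` flag, and the convention that a lower number means higher priority are all assumptions made for illustration. Only the `QUEUE_LIMIT` name and its value of 500 come from this report.

```python
from dataclasses import dataclass

QUEUE_LIMIT = 500  # limit named in the issue


@dataclass
class Job:
    # Hypothetical stand-in for a queued job, not Scale's model.
    name: str
    priority: int  # assumption: lower number = higher priority
    needs_gpu: bool = False


def schedule_pass(queue, gpu_free=False):
    """One pass as described in the issue: take the top QUEUE_LIMIT jobs
    by priority and return the ones that currently fit on a node."""
    batch = sorted(queue, key=lambda j: j.priority)[:QUEUE_LIMIT]
    return [j for j in batch if gpu_free or not j.needs_gpu]


# 500 high-priority GPU jobs that cannot run, plus 20 small low-priority jobs.
queue = [Job(f"gpu-{i}", priority=1, needs_gpu=True) for i in range(500)]
queue += [Job(f"cpu-{i}", priority=10) for i in range(20)]

passes = []
for _ in range(3):
    scheduled = schedule_pass(queue)
    passes.append(len(scheduled))
    # Remove whatever was scheduled; here nothing ever is.
    queue = [j for j in queue if j not in scheduled]
```

In this model every pass sees the same 500 GPU jobs, so the 20 low-priority jobs never enter the batch even though they would fit.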

Reproduction Steps

Steps to reproduce the problem:

1. Create a job type that requests a GPU.
2. Have a cluster where only one node has a GPU resource, and have something other than Scale use that GPU. Scale should see that the GPU will theoretically be available at some point.
3. Create 500 jobs of the GPU job type.
4. Create other jobs that easily fit on nodes, some with lower priority than the GPU jobs and some with higher priority.
5. Observe that the low-priority jobs never run, despite space being available on nodes.

Expected behavior

Scale should not return until it has scheduled 500 jobs, or it should skip jobs that have not been schedulable for a while. Unschedulable jobs should not count towards the limit of 500 jobs.
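One way to read the expected behavior is that the limit applies only to jobs that can actually be placed. The sketch below illustrates that idea under stated assumptions and is not a proposed patch to Scale: `Job`, `fits`, `schedule_pass_skipping`, and the `gpu_free` flag are hypothetical names, and a lower number is assumed to mean higher priority.

```python
from dataclasses import dataclass
from itertools import islice

QUEUE_LIMIT = 500  # limit named in the issue


@dataclass(frozen=True)
class Job:
    # Hypothetical stand-in for a queued job, not Scale's model.
    name: str
    priority: int  # assumption: lower number = higher priority
    needs_gpu: bool = False


def fits(job, gpu_free):
    """Hypothetical resource check: GPU jobs fit only when the GPU is free."""
    return gpu_free or not job.needs_gpu


def schedule_pass_skipping(queue, gpu_free=False, limit=QUEUE_LIMIT):
    """Walk the queue in priority order, skip jobs that do not fit, and
    stop once `limit` schedulable jobs have been collected."""
    ordered = sorted(queue, key=lambda j: j.priority)
    schedulable = (j for j in ordered if fits(j, gpu_free))
    return list(islice(schedulable, limit))


queue = [Job(f"gpu-{i}", priority=1, needs_gpu=True) for i in range(500)]
queue += [Job(f"cpu-{i}", priority=10) for i in range(20)]

scheduled = schedule_pass_skipping(queue)
```

Here the 500 blocked GPU jobs are skipped and the 20 low-priority jobs are returned. A real scheduler would probably need to bound how far it scans past the limit, since walking the whole queue on every pass could be expensive.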

ngageoint / scale

500 Unschedulable Jobs #1801