ubccpsc310 / classy

Course management and automation.
MIT License

Improve Queue for high-volume workloads #102

Closed: rtholmes closed this issue 11 months ago

rtholmes commented 1 year ago

This term in 310 we ended up with 2500+ jobs on the queue. The current implementation serves higher-priority queues first, with the following maximum job counts:

Express: 1
Standard: 2
Low: 50

Because a job can't be removed once it is queued, some student requests were getting de-prioritized to the low queue (e.g., if the student already had a check request on express and standard was full). As a result, the bot responded very quickly to some requests but very slowly to others.

Refactor the queue to support the following:

  1. The express queue should receive all student requests so they are handled first.
  2. The standard queue should be the first stop for push events. This means students with few pushes will be prioritized above students with dozens.
  3. The standard queue should also get all high-priority requests, as defined by the course plugin, regardless of the standard queue's length limit.
  4. When new push events are received, they should take the place of any prior pushes on the standard queue so their place in line is not lost and students can get feedback quicker on their more recent work. The old jobs should just be moved to the low queue.
  5. The low queue should give up on trying to schedule everything: 2% of the repos are responsible for 80% of the pushes. The low queue should keep the most recent N pushes queued for analysis. If we want to run prior ones that were removed, they can be run by request (e.g., through the express queue). By keeping the most recent N, we will always run the commits closest to the deadline because they will be the last ones pushed.
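
A minimal sketch of the five rules above, assuming a simple in-memory model. The names `Job`, `QueueSet`, `STANDARD_PUSH_MAX`, and `LOW_MAX` are hypothetical and are not classy's actual API; the caps of 2 and 20 come from the summary below. It further assumes that only non-prioritized pushes are displaced, that a repo therefore has at most one regular push on standard at a time, and that the low queue drops its oldest entries once it exceeds N.

```typescript
// Hypothetical sketch of the refactored queue policy; not classy's real code.
type Kind = "request" | "push";

interface Job {
  id: string;
  repo: string;      // repo the job belongs to (used to find a prior push)
  kind: Kind;        // student request (e.g. a check) or push event
  priority: boolean; // assumed flag: course plugin marked this push high-priority
}

const STANDARD_PUSH_MAX = 2; // regular (non-prioritized) pushes kept on standard
const LOW_MAX = 20;          // N: most recent pushes kept on low

class QueueSet {
  readonly express: Job[] = [];
  readonly standard: Job[] = [];
  readonly low: Job[] = [];

  enqueue(job: Job): void {
    // Rule 1: every student request goes to express.
    if (job.kind === "request") {
      this.express.push(job);
      return;
    }
    // Rule 3: prioritized pushes always go to standard, ignoring the cap.
    if (job.priority) {
      this.standard.push(job);
      return;
    }
    // Rule 4: a new push takes over the repo's prior push slot on standard;
    // the displaced job moves to low. Because of this replacement, a repo
    // has at most one regular push on standard, so findIndex is enough.
    const prior = this.standard.findIndex((j) => !j.priority && j.repo === job.repo);
    if (prior >= 0) {
      const displaced = this.standard[prior];
      this.standard[prior] = job;
      this.addLow(displaced);
      return;
    }
    // Rule 2: otherwise standard is the first stop if it has room.
    const regular = this.standard.filter((j) => !j.priority).length;
    if (regular < STANDARD_PUSH_MAX) {
      this.standard.push(job);
    } else {
      this.addLow(job);
    }
  }

  // Serve express first, then standard, then low.
  next(): Job | undefined {
    return this.express.shift() ?? this.standard.shift() ?? this.low.shift();
  }

  // Rule 5: keep only the most recent LOW_MAX jobs, dropping the oldest.
  private addLow(job: Job): void {
    this.low.push(job);
    if (this.low.length > LOW_MAX) {
      this.low.splice(0, this.low.length - LOW_MAX);
    }
  }
}
```

For example, enqueueing regular pushes `a1` (team-a), `b1` (team-b), `c1` (team-c), then `a2` (team-a), a prioritized push `p1`, and a request `r1` yields the service order `r1, a2, b1, p1, c1, a1`: `a2` inherits `a1`'s place in line, while `c1` and `a1` wait on low.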

This means the new queues will contain:

Express: As many as students are allowed to request by rate limiting (probably 2 in practice)
Standard: most recent 2, plus all prioritized pushes
Low: most recent 20 not already on some other queue

rtholmes commented 1 year ago

See 660a4ac and b8df8e8

rtholmes commented 11 months ago

This is complete and has run for several deliverables. Performance went way up by using the cooldown to moderate the express queue and letting express jobs jump ahead of all pushes.
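
The comment doesn't say how the cooldown is implemented. One plausible reading is a fixed per-student cooldown: a request is accepted onto the express queue only if that student's previous accepted request is at least `cooldownMs` old. The sketch below assumes exactly that. `ExpressCooldown`, `tryAccept`, and `remainingMs` are hypothetical names, and the current time is passed in explicitly so the gate is easy to test.

```typescript
// Hypothetical per-student cooldown gate for the express queue.
class ExpressCooldown {
  private readonly cooldownMs: number;
  private readonly lastAccepted = new Map<string, number>();

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true and records nowMs if the student is off cooldown;
  // otherwise returns false and the request should be rejected or deferred.
  tryAccept(studentId: string, nowMs: number): boolean {
    const last = this.lastAccepted.get(studentId);
    if (last !== undefined && nowMs - last < this.cooldownMs) {
      return false;
    }
    this.lastAccepted.set(studentId, nowMs);
    return true;
  }

  // Milliseconds the student must still wait (0 if they may request now).
  remainingMs(studentId: string, nowMs: number): number {
    const last = this.lastAccepted.get(studentId);
    if (last === undefined) return 0;
    return Math.max(0, this.cooldownMs - (nowMs - last));
  }
}
```

With an arbitrary example cooldown of 15 minutes (`new ExpressCooldown(15 * 60 * 1000)`), a student whose request was accepted at time 0 is refused at the 1-minute mark with 14 minutes remaining, and accepted again at exactly 15 minutes. Because each student can hold only a bounded number of express jobs, letting express jump ahead of all pushes cannot starve the push queues indefinitely.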