ocurrent / ocluster

Distribute CI builds to worker nodes over Cap'n Proto
Apache License 2.0

Improve scheduling algorithm #249

Open shonfeder opened 1 week ago

shonfeder commented 1 week ago

We often see the current scheduling algorithm overload one worker with a massive queue because of cache hints, while other workers are starved for work.

We want the job scheduling algorithm to take cache hints into account, but also to consider worker capacity and availability. In some situations this should substantially improve our throughput.

Related to, but more general than #168
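
To make the trade-off requested here concrete, below is a minimal sketch of one possible policy that weighs a cache hit against a worker's relative load. It is not ocluster's scheduler and is not taken from `pool.ml`; the `Worker` type, the `pick_worker` and `assign` functions, and the `cache_bonus` parameter are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record; not ocluster's actual data model."""
    name: str
    capacity: int                 # jobs the worker can run concurrently
    queued: int = 0               # jobs currently assigned to this worker
    available: bool = True        # False if the worker is paused or offline
    cached: set = field(default_factory=set)  # cache hints seen by this worker

    def load(self) -> float:
        # Queue length relative to capacity, so bigger workers absorb more jobs.
        return self.queued / self.capacity


def pick_worker(workers, hint, cache_bonus=1.0):
    """Pick the available worker with the lowest score.

    The score is the relative load, reduced by `cache_bonus` (an assumed
    tuning knob) when the worker already holds the job's cache hint.
    """
    candidates = [w for w in workers if w.available]
    if not candidates:
        raise RuntimeError("no available workers")

    def score(w):
        return w.load() - (cache_bonus if hint in w.cached else 0.0)

    return min(candidates, key=score)


def assign(workers, hint, cache_bonus=1.0):
    """Assign one job and record that the chosen worker now has the hint cached."""
    w = pick_worker(workers, hint, cache_bonus)
    w.queued += 1
    w.cached.add(hint)
    return w
```

With `cache_bonus=1.0`, a cached worker keeps winning until its backlog exceeds an uncached worker's by more than one full capacity's worth of jobs, after which work spills over to the idle worker. A larger bonus favours cache locality; a smaller one favours even load.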

talex5 commented 1 week ago

> We want the job scheduling algorithm to take cache hints into account, but also to consider worker capacity and availability.

It should already be doing that:

https://github.com/ocurrent/ocluster/blob/a27ef61876471fa5620d2422fef441bf50d0260f/scheduler/pool.ml#L368-L373

However, there are a few problems (or were, last time I looked):