madecoste / swarming

Automatically exported from code.google.com/p/swarming
Apache License 2.0

Schedule tasks according to device affinity #144

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Goal:
Create locality for tasks of similar type to improve task efficiency and 
throughput.

Use cases:
For example, tests may require a specific firmware version, and switching the 
firmware on the device takes a non-trivial amount of time. It's better to assign 
tasks to bots representing devices that are already in a particular state. In 
this case, a single string would likely be enough to describe the bot's state 
and determine its affinity.

Similarly for isolated testing, when >10k files (>500 MB) need to be mapped, the 
mapping latency can exceed the duration of the task itself when the task is not 
triggered often and the pool of bots is large. So it's better to trigger these 
tasks on a bot that has already run this test. In practice it's too slow for the 
server to count the exact hit rate of 10k files on every bot task poll, so a 
heuristic has to be used. In this case, the bot could list a few strings 
describing the previous tasks that were run, which implicitly describes the 
cache content.
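
As a rough illustration of the bot-side heuristic, the sketch below keeps a 
small bounded list of recently run task names that a bot could report when 
polling. The class name RecentTaskTags, the max_entries bound, and the task 
names other than browser_tests and base_unittests are hypothetical and not part 
of Swarming.

```python
from collections import OrderedDict


class RecentTaskTags:
    """Hypothetical helper: remembers the most recently run task names."""

    def __init__(self, max_entries=3):
        self._max_entries = max_entries
        self._tags = OrderedDict()

    def record(self, task_name):
        # Re-inserting moves an already known name to the newest position.
        self._tags.pop(task_name, None)
        self._tags[task_name] = True
        # Evict the oldest names once the bound is exceeded.
        while len(self._tags) > self._max_entries:
            self._tags.popitem(last=False)

    def as_list(self):
        # Oldest first, newest last.
        return list(self._tags)


recent = RecentTaskTags(max_entries=3)
for name in ["base_unittests", "browser_tests", "net_unittests",
             "base_unittests", "unit_tests"]:
    recent.record(name)
print(recent.as_list())
```

Keeping the list short is what makes it a heuristic: it approximates which 
files are likely cached without the server inspecting the cache itself.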

Implementation:
Using 'tags' matching would likely be the fastest implementation: the server 
only understands tags, and affinity is calculated from the "distance of tags". 
Tags will be implemented as part of issue 123. That covers the task request 
side of tags, for example describing the task name (e.g. browser_tests, 
base_unittests). The catch is that only a few select tags will be useful for 
affinity, and history may (run_isolated cache hit rate) or may not (firmware 
on the device) be useful. The implementation needs to support both use cases 
efficiently.
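
The issue does not define "distance of tags". One possible reading, sketched 
below as an assumption, is the number of task tags that a bot does not 
advertise, so that 0 means a perfect match. The function names, bot ids, and 
the "firmware:..." tag format are hypothetical.

```python
def tag_distance(task_tags, bot_tags):
    """Number of task tags missing from the bot; 0 is the best match."""
    return len(set(task_tags) - set(bot_tags))


def rank_bots(task_tags, bots):
    """Returns bot ids sorted from most to least affine.

    bots maps a bot id to the tags that bot advertises. Ties are broken by
    bot id so the ordering is deterministic.
    """
    return sorted(
        bots,
        key=lambda bot_id: (tag_distance(task_tags, bots[bot_id]), bot_id))


task_tags = {"firmware:1.2", "browser_tests"}
bots = {
    "bot1": {"firmware:1.2"},
    "bot2": {"firmware:1.1", "base_unittests"},
    "bot3": {"firmware:1.2", "browser_tests"},
}
print(rank_bots(task_tags, bots))
```

The same function covers both use cases: a state tag such as the firmware 
version and history tags such as previously run task names are both just 
strings in the bot's tag set.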

In practice, this is tricky to implement at polling time, because a bot 
polling for a task may not be handed the task when another bot is known to be 
more apt to run it. The server has to "guess" that the other, more affine bot 
will poll soon.
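
A minimal sketch of such a guess, under stated assumptions: the server 
remembers when each bot last polled, treats a bot as likely to poll again soon 
if it polled within poll_interval seconds, and stops holding a task back once 
it has waited max_wait seconds. The function name, both thresholds, and the 
notion of a numeric distance (lower is more affine) are hypothetical.

```python
def should_defer(task_age, polling_distance, other_bots, now,
                 poll_interval=10.0, max_wait=30.0):
    """Decides whether to withhold a task from the bot that is polling.

    task_age: seconds the task has been pending.
    polling_distance: affinity distance of the polling bot (lower is better).
    other_bots: iterable of (distance, last_poll_time) for the other bots.
    now: current time, in the same unit as last_poll_time.
    """
    # Never starve a task: past max_wait, hand it to whoever asks.
    if task_age >= max_wait:
        return False
    for distance, last_poll in other_bots:
        # Defer only for a strictly more affine bot that polled recently.
        if distance < polling_distance and now - last_poll < poll_interval:
            return True
    return False


print(should_defer(task_age=5.0, polling_distance=1,
                   other_bots=[(0, 95.0)], now=100.0))
```

The max_wait bound matters because the guess can be wrong: the more affine bot 
may be busy or offline, and the task should still run somewhere.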

Also, behavior at 100% utilization needs to be clearly stated: given a 
higher-priority task that is not affine and a lower-priority task that is 
affine, which one should be selected? It's easy to make the task scheduler 
overly complicated in that case, and the search space needs to stay linear for 
DB operation efficiency reasons.
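
The issue leaves this question open. The sketch below shows just one possible 
answer, not a decision: priority always wins, and affinity only breaks ties 
between tasks of equal priority, found in a single linear pass over the pending 
tasks. It assumes a lower priority number means a more important task; the 
dict layout and the function name pick_task are hypothetical.

```python
def pick_task(pending, bot_tags):
    """Returns the task to hand to a bot, or None if nothing is pending.

    Each task is a dict with "name", "priority" and "tags". Assumption: a
    lower "priority" value is more important. Affinity is the number of task
    tags the bot lacks and is only used to break priority ties.
    """
    bot_tags = set(bot_tags)
    best = None
    best_key = None
    for task in pending:  # One pass, so cost is linear in pending tasks.
        missing = len(set(task["tags"]) - bot_tags)
        key = (task["priority"], missing)
        if best_key is None or key < best_key:
            best, best_key = task, key
    return best


pending = [
    {"name": "a", "priority": 20, "tags": ["browser_tests"]},
    {"name": "b", "priority": 10, "tags": ["base_unittests"]},
    {"name": "c", "priority": 10, "tags": ["browser_tests"]},
]
print(pick_task(pending, ["browser_tests"])["name"])
```

Swapping the order of the tuple in key would instead favor affinity over 
priority, which shows how small the code difference between the two policies 
is compared to the difference in behavior.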

Original issue reported on code.google.com by maruel@chromium.org on 15 Aug 2014 at 4:24