luci / luci-py

LUCI in python
Apache License 2.0

Add optional dimension filtering in task request, e.g. allow a dimension to be a list instead of a single value #143

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Goal:
Guarantee a minimum throughput for each consumer team or project by assigning 
dedicated pools of bots to subteams.

Use case:
Let's say there's a dedicated pool of 10 devices for team A, another pool of 10 
for team B, and a shared pool of 40 usable by teams A, B and C. Teams A and B 
are guaranteed basic throughput and can each expand into the shared pool of 40 
bots. Team C is "free riding" the shared pool. This is orthogonal to task 
priority, so teams can still stomp on each other with higher priority tasks 
as needed.

The way to specify this is to state an optional dimension, i.e. the same key 
with more than one acceptable value:
  ./swarming.py trigger ... -d lab sharedpool -d lab teamApool

It would allow a device assigned to the team A dedicated pool or one in the 
shared pool, whichever is available first. Note that priorities are still taken 
into account; it just enables more seamless sharing of different pools to get 
dedicated throughput.
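
As a rough illustration of the matching rule described above, here is a minimal sketch. It assumes a request is represented as a mapping from dimension key to a list of acceptable values and that each bot reports one value per key; the helper names (parse_dimension_flags, bot_matches) and the bot dictionaries are hypothetical and are not part of swarming.py.

```python
def parse_dimension_flags(pairs):
    """Group repeated (key, value) pairs, as in `-d lab sharedpool -d lab teamApool`.

    Hypothetical helper: repeating a key makes each of its values acceptable.
    """
    requested = {}
    for key, value in pairs:
        requested.setdefault(key, []).append(value)
    return requested


def bot_matches(bot_dimensions, requested):
    """Return True if the bot satisfies every requested dimension.

    A bot qualifies for a key when its value is any one of the accepted values.
    """
    return all(
        bot_dimensions.get(key) in accepted
        for key, accepted in requested.items()
    )


request = parse_dimension_flags([("lab", "sharedpool"), ("lab", "teamApool")])

# Hypothetical bots, one per pool.
team_a_bot = {"lab": "teamApool", "os": "Linux"}
shared_bot = {"lab": "sharedpool", "os": "Linux"}
team_b_bot = {"lab": "teamBpool", "os": "Linux"}
```

With this rule, team_a_bot and shared_bot are both eligible for the request while team_b_bot is not; which eligible bot actually runs the task is still decided by availability and priority.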

Implementation:
https://code.google.com/p/swarming/source/browse/services/swarming/server/task_to_run.py#45
The TaskToRun entity key holds the dimensions hash. What we'd want is for the 
hash to be a list. The way to do this is to add a new property:
hashes = ndb.IntegerProperty(repeated=True, indexed=False)
so that the loop at 
https://code.google.com/p/swarming/source/browse/services/swarming/server/task_to_run.py#345
would do instead:
if not any(h in accepted_dimensions_hash for h in task_key.hashes):
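
To make the check above concrete, here is a small standalone sketch that does not use ndb. The hashing scheme (dimensions_hash), the expansion helper (expand_hashes) and the idea that a bot accepts only the hash of its exact dimensions are simplifying assumptions for illustration; they are not how task_to_run.py computes its hashes.

```python
import hashlib


def dimensions_hash(dimensions):
    """Deterministic integer hash of a dimensions dict (illustrative only)."""
    canonical = "\n".join(
        "%s:%s" % (key, dimensions[key]) for key in sorted(dimensions)
    )
    return int(hashlib.sha1(canonical.encode("utf-8")).hexdigest(), 16)


def expand_hashes(fixed, optional_key, optional_values):
    """Return one hash per acceptable value of the optional dimension."""
    hashes = []
    for value in optional_values:
        dims = dict(fixed)
        dims[optional_key] = value
        hashes.append(dimensions_hash(dims))
    return hashes


def is_candidate(task_hashes, accepted_dimensions_hash):
    """Same test as `any(h in accepted_dimensions_hash for h in task_key.hashes)`."""
    return any(h in accepted_dimensions_hash for h in task_hashes)


# A task that accepts either the shared pool or team A's pool.
task_hashes = expand_hashes({"os": "Linux"}, "lab", ["sharedpool", "teamApool"])

# Assumption: each bot accepts only the hash of its exact dimensions.
team_a_accepts = {dimensions_hash({"os": "Linux", "lab": "teamApool"})}
team_b_accepts = {dimensions_hash({"os": "Linux", "lab": "teamBpool"})}
```

Here is_candidate(task_hashes, team_a_accepts) is true and is_candidate(task_hashes, team_b_accepts) is false, so the task would be skipped for team B's bot and considered for team A's.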

The change can be done live by first storing the new property in TaskToRun, 
waiting until all tasks that were pending at the instance switch-over are 
done, then switching yield_next_available_task_to_dispatch() to use the new 
logic and stop using the entity key.

Non-use case:
[This is already supported and listed for demonstrative purposes]
A select number of bots have a specific configuration. Let's say they are 
equipped for performance tests that require a sensor attached to the bot. 
The layout would be as follows:
perf bots would have dimensions depending on their physical setup:
- {'temperature_sensor': '1', 'camera': '0'}
- {'temperature_sensor': '0', 'camera': '1'} 
- {'temperature_sensor': '1', 'camera': '1'}
normal bots would have dimensions {'temperature_sensor': '0', 'camera': '0'}

For a use case where a camera recording the screen is needed, the request 
could be:
{'camera': '1'}
so that bots with or without 'temperature_sensor': '1' would be used.
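
The layout above can be checked with a short sketch, assuming a request matches a bot when every requested key/value pair is present in the bot's dimensions. The bot names and the eligible_bots helper are hypothetical.

```python
# Hypothetical bot names; the dimensions are the ones listed above.
BOTS = {
    "perf1": {"temperature_sensor": "1", "camera": "0"},
    "perf2": {"temperature_sensor": "0", "camera": "1"},
    "perf3": {"temperature_sensor": "1", "camera": "1"},
    "normal1": {"temperature_sensor": "0", "camera": "0"},
}


def eligible_bots(request, bots):
    """Return the sorted names of bots whose dimensions include the request."""
    return sorted(
        name
        for name, dims in bots.items()
        if all(dims.get(key) == value for key, value in request.items())
    )


camera_bots = eligible_bots({"camera": "1"}, BOTS)  # ['perf2', 'perf3']
```

Because the request leaves temperature_sensor unspecified, both camera-equipped bots qualify regardless of their sensor setup.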

But bots with a temperature sensor are rare and will likely be starved! The 
right way to handle this use case is to use a higher priority for tasks that 
require limited-throughput bots. This is why closely tuning priorities is 
important.

Original issue reported on code.google.com by maruel@chromium.org on 15 Aug 2014 at 4:04

kenrussell commented 8 years ago

Another use case for this feature:

The GPU bots in Chromium's Swarming pool are all quite homogeneous; for example, the machines in the "Linux NVIDIA" configuration all have exactly the same type of NVIDIA GPU.

At some point it will almost surely be necessary to switch out the GPUs in one of these configurations.

In order to transition smoothly, it will be necessary to upgrade some or all of these machines in-place. This means that for a brief period of time, we will want jobs for "Linux NVIDIA" to go to either the GPU with PCI ID 0xAAAA or ID 0xBBBB.

Right now the swarming tags only allow specification of the gpu dimension as "10de:aaaa", for example. We'd like to be able to say for these jobs "--dimension gpu (10de:aaaa, 10de:bbbb)" or equivalent.

ghost commented 8 years ago

FWIW, this can be solved by adding a common dimension to both pools, e.g. gpu:linux_nvidia, and requesting that in the tasks. This is similar to how e.g. os:Mac is common to all specific os dimensions.
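
A minimal sketch of that suggestion, assuming each bot advertises a list of values per dimension key and a task requests a single value per key. The bot dictionaries and the bot_can_run helper are hypothetical; "linux_nvidia" is the common value proposed above.

```python
def bot_can_run(bot_dimensions, request):
    """True if every requested value is among the bot's values for that key."""
    return all(
        value in bot_dimensions.get(key, ())
        for key, value in request.items()
    )


# Both generations of bots share the common value and keep their specific one.
old_gpu_bot = {"os": ["Linux"], "gpu": ["linux_nvidia", "10de:aaaa"]}
new_gpu_bot = {"os": ["Linux"], "gpu": ["linux_nvidia", "10de:bbbb"]}

common_request = {"gpu": "linux_nvidia"}
specific_request = {"gpu": "10de:aaaa"}
```

Tasks that request the common value can land on either generation during the transition, while tasks that still name a specific device only match the bots that have it.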


nodirt commented 8 years ago

I'd prefer that too because.

Just added a check to swarmbucket that dimension keys are unique


kenrussell commented 8 years ago

Thanks for the suggestions. I just found that a dimension value for the GPU vendor, irrespective of GPU device, is already there, so "--dimension gpu 10de" already works. Never mind my particular use case, since it's already supported.