Goal:
Reduce the perceived latency of running a task by moving work off the critical
path: preparing the bot's local cache before the bot is assigned a task AND
cleaning up the bot/device after the task.
Use case:
After a bot has run a task, its state may be "dirty". There are two ways to
address this:
- Upon startup of the task, always clean up the state.
- Upon completion of the task, clean up the state.
Cleaning up on completion alone doesn't work in Real Life(tm) because tasks
fail, hardware spontaneously reboots, etc., so there's always going to be a
moment where a task is distributed to a bot that is "dirty".
Examples:
- For a Swarming bot that builds a project, a checkout at a recent revision
could be done first.
- For a bot representing a device, it could be flashed to a known good revision
when idle, stale files found on the device could be removed, etc.
- The run_isolated local cache could be warmed up with files usually required
to run frequent tests.
Implementation:
Under a scenario of <100% utilization, the bots should be stateful on the
server. A state machine in the server would register their state and would try
to distribute "background work tasks" that "clean up" the bot in advance,
preparing it to execute "tasks that are expected in the future".
This is expected to work in tandem with issue 144 (device affinity) so that not
all devices end up in a single state. Instead, multiple states would stabilize
automatically based on the utilization.
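A minimal sketch of what such a server-side state machine could look like. The
state names, event names and the next_state() helper are all hypothetical and
are not taken from the Swarming code base; they only illustrate the idea that
any unexpected event puts the bot back in the "dirty" state.

```python
import enum


class BotState(enum.Enum):
    """Hypothetical server-side bot states; not part of Swarming."""
    DIRTY = "dirty"            # Unknown or stale local state.
    PREWARMING = "prewarming"  # Running a background work task.
    WARM = "warm"              # Prepared for an expected future task.
    RUNNING = "running"        # Running a real task.


# Allowed transitions, keyed by (current state, event). Event names are
# assumptions made for this sketch.
TRANSITIONS = {
    (BotState.DIRTY, "start_background"): BotState.PREWARMING,
    (BotState.PREWARMING, "background_done"): BotState.WARM,
    (BotState.PREWARMING, "background_failed"): BotState.DIRTY,
    (BotState.DIRTY, "start_task"): BotState.RUNNING,
    (BotState.WARM, "start_task"): BotState.RUNNING,
    (BotState.RUNNING, "task_done"): BotState.DIRTY,
    (BotState.RUNNING, "task_failed"): BotState.DIRTY,
}


def next_state(state, event):
    """Returns the state following `event`.

    Any transition that is not listed (for example a spontaneous reboot)
    conservatively yields DIRTY.
    """
    return TRANSITIONS.get((state, event), BotState.DIRTY)
```

In this sketch a "dirty" bot can still receive a real task directly, which
matches the observation above that tasks will sometimes land on dirty bots.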
This requires multiple things:
- Describe pre-warm tasks.
  - Have the client part describe the task, archive it on the isolate server, and ensure the files are pinned on the isolate server.
  - Have the server part describe these background tasks.
- Run background tasks.
  - Have task_to_run.py assign background tasks when utilization is <100%.
This must be pretty conservative; the background tasks must not become a
bottleneck when a surge of incoming real tasks happens, especially when
background tasks are slow (for example, checking out the chromium source
tree). In addition, this should respect device affinity and assign new
affinity. It must not start to prewarm for an old version of a firmware when
all the bots start to use a newer firmware, etc.
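The constraints above could be sketched as two small policy functions. This is
only an illustration under stated assumptions: the function names, the 0.5
utilization threshold and the 20-task window are hypothetical, and none of
this reflects how task_to_run.py actually works.

```python
import collections


def should_assign_background(busy_bots, total_bots, pending_real_tasks,
                             max_utilization=0.5):
    """Returns True if a background task may be handed out.

    Conservative on purpose: never when real tasks are pending, and only
    when utilization is below `max_utilization` (an assumed threshold).
    """
    if total_bots <= 0:
        return False
    if pending_real_tasks > 0:
        return False
    return busy_bots / total_bots < max_utilization


def pick_prewarm_target(recent_targets, window=20):
    """Returns the most requested target among the last `window` real tasks.

    `recent_targets` is ordered oldest first, e.g. firmware versions. Using
    only a recent window avoids prewarming for a version that bots have
    stopped using. Returns None when there is no history.
    """
    recent = recent_targets[-window:]
    if not recent:
        return None
    return collections.Counter(recent).most_common(1)[0][0]
```

For example, with a history of 30 tasks on "fw-1" followed by 20 tasks on
"fw-2", pick_prewarm_target() selects "fw-2" even though "fw-1" is more common
overall.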
Original issue reported on code.google.com by maruel@chromium.org on 15 Aug 2014 at 4:47