This is an important task to raise reliability above 99.9%. When a bot dies,
the task should be retried once.
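As a rough illustration of why a single retry helps, assume (a simplification, not something measured in this issue) that each try independently loses its bot with probability p. The task then fails from bot death only if both tries do:

```latex
P_{\text{fail}} = p \cdot p = p^2, \qquad
p = 0.01 \;\Rightarrow\; P_{\text{fail}} = 10^{-4} \;(99.99\%\ \text{success})
```

Under this model, staying above 99.9% only requires p < \sqrt{0.001} \approx 0.032, versus p < 0.001 with no retry. Correlated failures (e.g. many bots dying at once) would break the independence assumption.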
The code that tags a task as BOT_DIED is bot_kill_task() at
https://code.google.com/p/swarming/source/browse/services/swarming/server/task_scheduler.py#411
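A minimal sketch of the retry-once policy in plain Python. Everything here (ResultSummary, State, on_bot_died, MAX_TRIES) is hypothetical and does not mirror the real bot_kill_task() or the datastore entities; it only shows the intended state transitions: the first bot death re-queues the task as try 2, the second one marks it BOT_DIED.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    BOT_DIED = "BOT_DIED"
    COMPLETED = "COMPLETED"


# Hypothetical limit: the original run plus exactly one retry.
MAX_TRIES = 2


@dataclass
class ResultSummary:
    """Hypothetical stand-in for the task's summary entity."""
    state: State = State.RUNNING
    try_number: int = 1
    # Maps try_number -> final State of that try, so each try is kept separately.
    run_states: dict = field(default_factory=dict)


def on_bot_died(summary: ResultSummary) -> ResultSummary:
    """Hypothetical handler called when the bot running the current try dies."""
    # Record the outcome of the try that was running on the dead bot.
    summary.run_states[summary.try_number] = State.BOT_DIED
    if summary.try_number < MAX_TRIES:
        # Retry once: put the task back in the queue under a new try number.
        summary.try_number += 1
        summary.state = State.PENDING
    else:
        # The retry also lost its bot: surface BOT_DIED to the requester.
        summary.state = State.BOT_DIED
    return summary
```

Calling on_bot_died() once on a fresh ResultSummary leaves it PENDING with try_number 2; calling it again leaves it BOT_DIED with both tries recorded in run_states.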
In particular, we'd want a new TaskRunResult to be created for the second try
(https://code.google.com/p/swarming/source/browse/services/swarming/server/task_result.py#364)
so that the data for each try is saved independently. The overall design
already supports this, but the control bits are missing and some tuning of the
entities may be required. For example, result_summary_key_to_run_result_key()
currently refuses any try_number other than 1:
https://code.google.com/p/swarming/source/browse/services/swarming/server/task_result.py#654
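A sketch of per-try run result keys, assuming a hypothetical tuple-of-(kind, id) key path instead of the real ndb.Key. The helper run_result_key_for_try() is hypothetical; it only illustrates the idea behind lifting the try_number == 1 restriction in result_summary_key_to_run_result_key(): each try becomes a distinct child of the same summary, here using the try number as its id.

```python
from typing import Tuple

# Hypothetical key representation: a path of (kind, id) pairs from root to leaf,
# loosely mirroring a datastore key. This is NOT the ndb.Key API.
Key = Tuple[Tuple[str, int], ...]

# Hypothetical limit: the original run plus one retry.
MAX_TRIES = 2


def run_result_key_for_try(result_summary_key: Key, try_number: int) -> Key:
    """Returns the key of the run result for one try of the task.

    Accepts any try up to MAX_TRIES instead of only 1, so each try gets its
    own child entity under the same summary and its data is stored separately.
    """
    if not 1 <= try_number <= MAX_TRIES:
        raise ValueError(
            f"try_number must be in [1, {MAX_TRIES}], got {try_number}")
    # The try number itself is used as the child's id under the summary.
    return result_summary_key + (("TaskRunResult", try_number),)
```

For a summary key (("TaskRequest", 42), ("TaskResultSummary", 1)), try 2 maps to that same path extended with ("TaskRunResult", 2), distinct from try 1.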
See the entity tree at
https://code.google.com/p/swarming/source/browse/services/swarming/server/README.md
Original comment by maruel@chromium.org
on 6 Aug 2014 at 4:36
Surfacing the results properly is likely blocked on the new client API (issue
118). That said, the retry feature as a whole could still work fine without the
new client API.
Original comment by maruel@chromium.org
on 6 Aug 2014 at 4:39
Original comment by maruel@chromium.org
on 6 Aug 2014 at 4:40
Original comment by maruel@chromium.org
on 7 Aug 2014 at 1:50
This task includes adding a new .idempotent flag to TaskProperties to
distinguish tasks that can be safely retried from those that have side effects
(such as accessing a remote server and setting properties on it).
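A sketch of how the proposed .idempotent flag could gate the retry. The TaskProperties class below is a hypothetical reduced stand-in (only command and idempotent), and should_retry_after_bot_died() is a hypothetical helper, not existing Swarming code.

```python
from dataclasses import dataclass

# Hypothetical limit: the original run plus one retry.
MAX_TRIES = 2


@dataclass(frozen=True)
class TaskProperties:
    """Hypothetical subset of TaskProperties with the proposed flag."""
    command: tuple
    # True only if running the task twice is harmless, e.g. it does not
    # set properties on a remote server.
    idempotent: bool = False


def should_retry_after_bot_died(props: TaskProperties, try_number: int) -> bool:
    """Retries only idempotent tasks, and only while a try remains."""
    return props.idempotent and try_number < MAX_TRIES
```

Defaulting idempotent to False is a conservative assumption in this sketch: a task with side effects is never re-run unless the requester explicitly opts in.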
Original comment by maruel@chromium.org
on 14 Aug 2014 at 9:07
Original comment by maruel@chromium.org
on 18 Sep 2014 at 7:16
Original issue reported on code.google.com by
maruel@chromium.org
on 22 May 2014 at 5:00