Critical situations can be defined as:
- X pending tasks, globally or for a specific set of dimensions.
- A task that has been pending for more than X minutes. This could be
categorized by priority or tags; e.g. a pending high-priority task is an issue.
- X bots offline (likely as a percentage); also notify per set of dimensions.
- Abnormal number of BOT_DIED
- Abnormal number of task expirations
- Abnormal number of execution timeouts
- Abnormal number of task failures
[Add more]
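
As a rough illustration of the threshold-based items above, here is a minimal sketch. Every name in it (`Thresholds`, `Snapshot`, `find_critical_situations`) is hypothetical and not an existing Swarming API, and the threshold values stand in for the unspecified "X". The "abnormal number" items are left out because they would need a baseline to compare against.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Placeholder limits; the issue leaves each "X" unspecified.
    max_pending_tasks: int
    max_pending_minutes: float
    max_offline_bot_ratio: float


@dataclass
class Snapshot:
    # Hypothetical summary of the state, globally or for one set of dimensions.
    pending_tasks: int
    oldest_pending_minutes: float
    bots_total: int
    bots_offline: int


def find_critical_situations(snapshot, thresholds):
    """Returns a list of human-readable alerts for one snapshot."""
    alerts = []
    if snapshot.pending_tasks > thresholds.max_pending_tasks:
        alerts.append('too many pending tasks: %d' % snapshot.pending_tasks)
    if snapshot.oldest_pending_minutes > thresholds.max_pending_minutes:
        alerts.append(
            'task pending for %.1f minutes' % snapshot.oldest_pending_minutes)
    # Skip the ratio check when no bots are known, to avoid dividing by zero.
    if snapshot.bots_total:
        ratio = snapshot.bots_offline / snapshot.bots_total
        if ratio > thresholds.max_offline_bot_ratio:
            alerts.append('%.0f%% of bots offline' % (ratio * 100))
    return alerts
```

The same function could be called once with a global snapshot and once per set of dimensions, with stricter thresholds for high-priority tasks.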
Original issue reported on code.google.com by maruel@chromium.org on 28 Aug 2014 at 2:32