luci / luci-py

LUCI in python
Apache License 2.0

Create the concept of bot 'tags' similar to task tags. #293

Open · maruel opened this issue 7 years ago

maruel commented 7 years ago

Background

Tasks have a concept of tags. While dimensions are meant to help with task->bot matching, tags are for management purposes. Tags are leveraged by /tasklist to provide an efficient search and accounting mechanism. While dimensions are the what, tags are the why. The concept of task tags is relatively recent (about 2 years old), but it is now fully leveraged.

At present, bots can only be searched by their dimensions (the what) and a few hardcoded special metadata fields: quarantined, is_busy and is_dead. See BotsRequest at appengine/swarming/swarming_rpcs.py#L413.
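To make the limitation concrete, here is a minimal sketch of the current search model, assuming a simplified in-memory bot record. The `Bot` class and `search_bots` function are hypothetical and are not the real BotsRequest handling; only the flag names (quarantined, is_busy, is_dead) come from the text above. The point is that every query enumerates every bot, and anything that is neither a dimension nor one of the hardcoded flags cannot be filtered on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bot:
    """Hypothetical, simplified bot record (not the real Swarming entity)."""
    bot_id: str
    dimensions: Dict[str, List[str]] = field(default_factory=dict)
    quarantined: bool = False
    is_busy: bool = False
    is_dead: bool = False


def search_bots(
    bots: List[Bot],
    dimensions: List[str],
    quarantined: Optional[bool] = None,
    is_busy: Optional[bool] = None,
    is_dead: Optional[bool] = None,
) -> List[str]:
    """Linear scan: every query enumerates every bot.

    `dimensions` holds "key:value" strings; a flag set to None is not filtered.
    """
    flags = {"quarantined": quarantined, "is_busy": is_busy, "is_dead": is_dead}
    result = []
    for bot in bots:
        # Every requested dimension value must be present on the bot.
        ok = all(
            value in bot.dimensions.get(key, [])
            for key, value in (d.split(":", 1) for d in dimensions)
        )
        # The hardcoded flags are the only non-dimension filters available.
        ok = ok and all(
            want is None or getattr(bot, name) == want
            for name, want in flags.items()
        )
        if ok:
            result.append(bot.bot_id)
    return result


bots = [
    Bot("bot1", {"os": ["Linux"], "pool": ["ci"]}),
    Bot("bot2", {"os": ["Linux"], "pool": ["ci"]}, quarantined=True),
    Bot("bot3", {"os": ["Mac"], "pool": ["ci"]}, is_dead=True),
]
print(search_bots(bots, ["os:Linux"], quarantined=False))  # ['bot1']
```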

This is a significant limitation. Tasks got tags first because Chrome runs around half a million tasks per day but has only a few thousand bots, so enumerating all bots for each query was acceptable; it increasingly is not. As the number of bots grows, bot manageability becomes increasingly important.
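The scaling argument can be illustrated with an in-memory inverted index keyed by "key:value" strings: a query intersects a few small sets instead of scanning every bot. The `TagIndex` class below is a hypothetical sketch of the idea only; it is not how Swarming stores or queries bots, and a real implementation would rely on the datastore's own indexes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class TagIndex:
    """Hypothetical inverted index: "key:value" string -> set of bot IDs."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, bot_id: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._postings[tag].add(bot_id)

    def remove(self, bot_id: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._postings[tag].discard(bot_id)

    def query(self, tags: List[str]) -> Set[str]:
        """Return the bots carrying all of `tags`; an empty query matches nothing here."""
        if not tags:
            return set()
        # Start from the smallest posting set so the intersections stay cheap.
        sets = sorted((self._postings.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


index = TagIndex()
index.add("bot1", ["os:Linux", "pool:ci"])
index.add("bot2", ["os:Linux", "pool:try"])
index.add("bot3", ["os:Mac", "pool:ci"])
print(sorted(index.query(["os:Linux", "pool:ci"])))  # ['bot1']
```

The cost of a query now depends on the size of the matching sets rather than on the total number of bots, which is the property that matters as the fleet grows.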

At the moment, the only way to make a bot property searchable is to make it a dimension. Bots can expose state, but state is intentionally fully unstructured data. This resulted in a fair amount of hardcoded logic in the Polymer UI; for example, mp_lease_id: https://github.com/luci/luci-py/blob/master/appengine/swarming/ui/res/imp/botlist/bot-list.html#L366

_BotCommon.composite was added to cope with the explosion of metadata, but it is inherently a hack and is not extensible.

Goal

Make bots more consistent with tasks by using the same concepts of dimensions and tags. Transition the relevant subset of key:value style items from state into tags.

As with tasks, the server reserves the right to add arbitrary tags.

The aim is to improve bot accountability and to expose more properties without necessarily making them usable for task selection. For example, a task should not be able to choose whether or not it runs on a Machine Provider (MP) managed VM. On the other hand, it is entirely sensible for an administrator to search for all MP-managed bots.
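A minimal sketch of the proposed split, assuming a bot carries both dimensions (used for task->bot matching) and tags (used only for administrative search). Everything here is hypothetical: `Bot`, `add_server_tags`, `can_run`, `admin_search`, and the tag name `machine_provider:true` are illustrative names, not an existing Swarming API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Bot:
    """Hypothetical bot with both dimensions (the what) and tags (the why)."""
    bot_id: str
    dimensions: Dict[str, List[str]]
    tags: Set[str] = field(default_factory=set)


def add_server_tags(bot: Bot, is_mp_managed: bool) -> None:
    """The server may attach arbitrary tags; "machine_provider:*" is a made-up name."""
    bot.tags.add("machine_provider:%s" % ("true" if is_mp_managed else "false"))


def can_run(bot: Bot, task_dimensions: Dict[str, str]) -> bool:
    """Task -> bot matching consults dimensions only; tags are never looked at."""
    return all(v in bot.dimensions.get(k, []) for k, v in task_dimensions.items())


def admin_search(bots: List[Bot], tags: Set[str]) -> List[str]:
    """Administrative search consults tags."""
    return [b.bot_id for b in bots if tags <= b.tags]


vm = Bot("vm-1", {"os": ["Linux"], "pool": ["ci"]})
hw = Bot("hw-1", {"os": ["Linux"], "pool": ["ci"]})
add_server_tags(vm, is_mp_managed=True)
add_server_tags(hw, is_mp_managed=False)

# A task cannot pin itself to MP VMs: "machine_provider" is not a dimension.
print(can_run(vm, {"os": "Linux", "machine_provider": "true"}))  # False
print(can_run(hw, {"os": "Linux"}))  # True
# An administrator can still find every MP-managed bot.
print(admin_search([vm, hw], {"machine_provider:true"}))  # ['vm-1']
```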

Action Items

Addition

Cleanup

kjlubick commented 7 years ago

For tasks, the tags are basically a superset of the dimensions, and we always search by dimensions. Would this be the same for bots, i.e. you can only search by bot.tags? Or would we keep the search by dimensions and add a second api to search by tags?

I lean towards the former, for consistency and to avoid duplicate APIs. In any case, I don't think a search mixing dimensions and tags should be allowed.
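One possible reading of the "former" option, sketched under the assumption that a bot's tags are derived by flattening every dimension value into a "key:value" tag and adding any extra tags on top. With a single tag namespace there is one search API and no way to mix the two kinds of query. The helpers `effective_tags` and `search_by_tags` and the `team:infra` tag are hypothetical.

```python
from typing import Dict, List, Set


def effective_tags(dimensions: Dict[str, List[str]], extra_tags: Set[str]) -> Set[str]:
    """Hypothetical: every dimension value is also exposed as a "key:value" tag."""
    tags = {"%s:%s" % (k, v) for k, values in dimensions.items() for v in values}
    return tags | extra_tags


def search_by_tags(bots: Dict[str, Set[str]], query: Set[str]) -> List[str]:
    """Single search API: match on tags only, which already include dimensions."""
    return sorted(bot_id for bot_id, tags in bots.items() if query <= tags)


bots = {
    "bot1": effective_tags({"os": ["Linux", "Ubuntu"], "pool": ["ci"]}, {"team:infra"}),
    "bot2": effective_tags({"os": ["Linux"], "pool": ["try"]}, set()),
}
print(search_by_tags(bots, {"os:Linux"}))               # ['bot1', 'bot2']
print(search_by_tags(bots, {"pool:ci", "team:infra"}))  # ['bot1']
```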

I think is_dead and quarantined could become tags, as those are semi-permanent states.

I don't think is_busy should be a tag, as it is very transient, but I could be swayed on the matter.
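A sketch of that suggestion, assuming the server recomputes a bot's status tags from its flags: the semi-permanent states become tags while is_busy stays a plain field. The `BotState` class, the `status_tags` helper, and the `quarantined:`/`is_dead:` tag names are hypothetical.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class BotState:
    """Hypothetical snapshot of a bot's status flags."""
    quarantined: bool
    is_dead: bool
    is_busy: bool


def status_tags(state: BotState) -> Set[str]:
    """Semi-permanent states become tags; is_busy is deliberately left out.

    The "quarantined:" and "is_dead:" tag names are illustrative only.
    """
    return {
        "quarantined:%s" % str(state.quarantined).lower(),
        "is_dead:%s" % str(state.is_dead).lower(),
    }


state = BotState(quarantined=True, is_dead=False, is_busy=True)
print(sorted(status_tags(state)))  # ['is_dead:false', 'quarantined:true']
```

Keeping is_busy out of the tag set means the stored tags would not need rewriting every time a task starts or finishes, which is presumably part of why a very transient flag fits poorly as a tag.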