oxidecomputer / buildomat

a software build labour-saving device

jobs could request additional managed resources #51

Open jclulow opened 5 months ago

Sometimes a job requires additional resources beyond what can be created within a specific target environment. For example, a job may require access to create and destroy resources on an Oxide rack.

Using the Oxide rack requires an authentication token, which represents a user with particular permissions. The rack allows resources to be isolated inside various containers, like silos or projects. We would like to provide a pristine user account to a job, with a token that is (relatively) safe to leak, and to clean up any mess made by the job afterwards.

In the buildomat model, the system itself creates any tokens that are required, rather than providing a generic store for "secure" strings. These tokens are created with the minimal required permissions and in a way where their use is bounded to roughly the execution period of the job itself.
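
To make the "minimal permissions, bounded lifetime" idea concrete, here is a rough sketch of the kind of record the server could keep for a token it mints on behalf of a job. Everything in it is an assumption made for illustration: the `JobToken` type, its fields, the scope strings, and the use of plain seconds for time are hypothetical and are not taken from buildomat or from the Oxide API.

```rust
/// Hypothetical record for a token minted on behalf of one job.
/// None of these names come from buildomat; they are for illustration only.
#[derive(Debug)]
pub struct JobToken {
    pub job_id: u64,
    /// The minimal set of permissions granted (scope names are made up).
    pub scopes: Vec<String>,
    /// Hard upper bound on use, in seconds since an arbitrary epoch.
    pub expires_at: u64,
    /// Set when the job ends, so the token dies with the job.
    pub revoked: bool,
}

impl JobToken {
    /// Mint a token that can outlive the job by at most `max_runtime` seconds.
    pub fn issue(job_id: u64, scopes: &[&str], now: u64, max_runtime: u64) -> JobToken {
        JobToken {
            job_id,
            scopes: scopes.iter().map(|s| s.to_string()).collect(),
            expires_at: now.saturating_add(max_runtime),
            revoked: false,
        }
    }

    /// A request is allowed only if the token is live, unexpired, and in scope.
    pub fn permits(&self, scope: &str, now: u64) -> bool {
        !self.revoked && now < self.expires_at && self.scopes.iter().any(|s| s == scope)
    }

    /// Called when the job completes, before cleanup of its resources begins.
    pub fn revoke(&mut self) {
        self.revoked = true;
    }
}
```

With this shape, a leaked token is only useful for the named scopes and only until the job ends or the deadline passes, whichever comes first.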

Today, the execution environment for a job (called a worker) is created by a factory, based on the target requested by the job; e.g., helios-2.0 or ubuntu-22.04. The factory is responsible for creating and tearing down the computing resources (e.g., VMs, or network booted physical hosts) required to provide these environments. The core buildomat server keeps track of each worker across its life cycle, so that we don't drop any resources until they are fully cleaned up.
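
As a rough illustration of what tracking a worker across its life cycle could look like, the sketch below models the life cycle as a small state machine in which the server may only forget a worker once teardown has completed. The state names and methods are hypothetical and are not buildomat's actual schema.

```rust
/// Hypothetical life cycle states for a worker (not buildomat's real schema).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerState {
    /// The server has asked a factory to create the environment.
    Requested,
    /// The environment exists and may be running a job.
    Running,
    /// The factory is tearing down the VM or physical host.
    TearingDown,
    /// All computing resources have been released.
    Destroyed,
}

impl WorkerState {
    /// Advance one step through the life cycle; `Destroyed` is terminal.
    pub fn next(self) -> WorkerState {
        match self {
            WorkerState::Requested => WorkerState::Running,
            WorkerState::Running => WorkerState::TearingDown,
            WorkerState::TearingDown | WorkerState::Destroyed => WorkerState::Destroyed,
        }
    }

    /// The server may drop its record only after cleanup has finished.
    pub fn can_forget(self) -> bool {
        self == WorkerState::Destroyed
    }
}
```

A resource, being notionally similar to a worker, could plausibly be tracked with the same kind of state machine.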

Rather than create more targets that happen to provide resources like a token to use the Oxide rack, we should instead create a new orthogonal concept: the resource. A resource would be notionally similar to a worker in many respects:

It will be important to consider the way resources will be acquired by waiting jobs, to avoid deadlock. Probably something like this policy would suffice:

If we were to allow parallel acquisition of resources and workers, it would seem pretty easy to end up with a scenario like:
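
To illustrate the general hazard, here is a toy model in which two jobs each need one worker and one resource, but only one of each is free. It is a generic hold-and-wait example with hypothetical types (`Pool`, `Held`), not necessarily the exact scenario or policy intended in this issue. With piecemeal acquisition each job can end up holding one item while waiting forever for the other; an all-or-nothing acquisition is one simple way to avoid that.

```rust
/// Hypothetical pool of free capacity; counts stand in for real workers and resources.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pool {
    pub workers: u32,
    pub resources: u32,
}

/// What a single waiting job is holding so far.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Held {
    pub worker: bool,
    pub resource: bool,
}

impl Held {
    /// A job can start only when it holds both a worker and a resource.
    pub fn ready(self) -> bool {
        self.worker && self.resource
    }
}

impl Pool {
    /// Piecemeal: grab a worker if one is free and keep it while waiting.
    pub fn take_worker(&mut self, held: &mut Held) {
        if !held.worker && self.workers > 0 {
            self.workers -= 1;
            held.worker = true;
        }
    }

    /// Piecemeal: grab a resource if one is free and keep it while waiting.
    pub fn take_resource(&mut self, held: &mut Held) {
        if !held.resource && self.resources > 0 {
            self.resources -= 1;
            held.resource = true;
        }
    }

    /// All-or-nothing: take both items together, or leave the pool untouched.
    pub fn take_all(&mut self, held: &mut Held) -> bool {
        if self.workers > 0 && self.resources > 0 {
            self.workers -= 1;
            self.resources -= 1;
            held.worker = true;
            held.resource = true;
            true
        } else {
            false
        }
    }
}
```

If job A calls `take_worker` and job B calls `take_resource` on a pool with one of each, the pool is empty and neither job is ready. With `take_all`, the first job gets both items and the second holds nothing, so it can simply wait for the first to finish.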