oxidecomputer / propolis

VMM userspace for illumos bhyve
Mozilla Public License 2.0

two instance ensure APIs is one too many #804

Open gjcolombo opened 6 days ago

gjcolombo commented 6 days ago

Today the server API offers two distinct ways to initialize a new VM:

In both cases, the server's first task is to try to convert its ensure parameters into a server-internal Spec structure that describes the new VM's components and their configuration. If this succeeds, machine initialization uses the resulting Spec to set up the new VM.

Both of these APIs accept an optional request to initialize via live migration, which contains the address of the migration source Propolis. If this is supplied:

Two APIs is one too many: maintaining both is a headache, especially as we continue adding new Propolis configuration options (boot order and CPUID landed recently; configurable MSR dispositions and hypervisor enlightenments are on the horizon). We should consolidate down to a single VM creation API.


I think a better API would look something like this:

enum ReplacementComponent {
    // one variant for each ComponentV0 variant that can be replaced
    // during migration; see below
}

enum InstanceInitMethod {
    New {
        // N.B. doesn't need to be versioned since the client, sled agent, is
        // updated in lockstep with Propolis
        spec: InstanceSpecV0
    },
    MigrationTarget {
        migration_id: Uuid,
        source_addr: SocketAddr,
        replacement_components: HashMap<InstanceSpecKey, ReplacementComponent>
    }
}

struct InstanceEnsureRequest {
    // same as the existing instance properties struct
    properties: InstanceProperties,
    init: InstanceInitMethod
}

To create a new VM, a client creates a spec for it and passes an InstanceInitMethod::New in its ensure request. This will require Omicron to switch from using instance ensure requests to instance specs. This should mostly be a matter of moving Propolis's ensure-request-to-instance-spec logic up into Nexus; it shouldn't require any database schema changes. One additional benefit of doing this (besides simplifying Propolis's API) is that putting this logic in Nexus makes it easier to update. RFD 505 talks about all the tradeoffs of this approach in more detail.

In the new API, the migration interface doesn't take a full instance spec. Instead, the target Propolis takes the source spec that it gets from the migration protocol, replaces any entries that have counterparts in the caller-supplied replacement_components map, and uses the amended spec to initialize the new VM. The replacement map allows Nexus to e.g. specify new Crucible volume construction requests with new client generation numbers, but does not allow it to replace any device configuration. This approach has a couple of big benefits:

  1. Propolis no longer needs a huge migration-compatibility-check module: the target's configuration is compatible with the source's because it is the source's configuration. Removing this module makes adding new guest-visible features significantly easier.
  2. Nexus doesn't need to remember how it originally configured a VM in order to migrate it. This is a nice bug-prevention mechanism: today, changes to how Nexus constructs an InstanceEnsureRequest (or to how Propolis amends it!) can prevent a VM from ever being able to migrate (if the changes deterministically produce incompatible VM configuration). Even in the absence of bugs, this reduces the amount of instance initialization logic Nexus has to carry around ("I started this VM at this version of the code, so I have to use this specific logic to produce the correct spec/ensure request for it").

The server no longer reads device information from config TOMLs (it only reads the bootrom path; we might also be able to eliminate this someday, but I'm declaring it out of scope for this issue). The propolis-server-config library will stick around, though, and will provide a way for clients to convert a config TOML to a set of instance spec components so that they can easily use the new interface.

I've drafted enough of the server side of these changes to be pretty confident this scheme can be made to work and that it will make the system nicer overall.

iximeow commented 3 days ago

broadly seems good! though i'm trying to think through a few things in particular..

can we really get away with an unversioned InstanceSpec? all of {propolis, sled-agent, nexus} are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack. version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent 1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent 1.0 there. i'm fuzzy on the long run too, could it be that there are propolis 1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?

additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:

sound right?

gjcolombo commented 3 days ago

Thanks @iximeow! Some (hopefully reasonable) answers to your questions:

can we really get away with an unversioned InstanceSpec? all of {propolis, sled-agent, nexus} are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack.

Good callout. I'm assuming here that propolis and sled-agent will (at least for the foreseeable future) always be part of the same deployment unit (i.e. the host OS image that we deploy to the individual sleds). RFD 152 section 4.3 talks about this decision in a little more detail (the RFD predates me, but I would say the main idea is that there are enough close dependencies between host OS components, the sled agent, and Propolis that it's simplest just to bundle them all together).

version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent 1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent 1.0 there.

This is almost certain to happen, though. My understanding (which is limited and might be wrong) is that our current plans to deal with this situation run along these lines:

I don't think this necessarily requires the use of a VersionedInstanceSpec in this interface: if we rearrange the entire instance spec structure such that we have to declare InstanceSpecV1 and change the sled agent interface to use it, then the API document will change accordingly.

With all that said, I think we need to keep VersionedInstanceSpec around to use in the migration protocol, where we're not using an HTTP interface or OpenAPI documents at all. In that case I think it's useful to have the option of being able to restructure the spec type somewhat without necessarily closing the door on our ability to roll back to an earlier Propolis that doesn't understand the new structure.

i'm fuzzy on the long run too, could it be that there are propolis 1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?

Yeah, if you wanted to do this kind of downgrade, you'd have to vacate the 1.1 sled, install the 1.0 image, and then migrate the VMs back.

additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:

Yup, this is right. The one thing I'd adjust would be the point about compatibility quirks. To take the situation in #790 as an example, the way I think we'd express such a quirk would be:

This is pretty much how I expect things to work for most new "features" or configuration options. The main thing to keep in mind is that Nexus can't flip the switch on the "pad with spaces" flag until it knows that Propolis v2 will never be rolled back (or, at least, that Nexus no longer needs to promise that such a rollback won't require instances to stop).