oxidecomputer / propolis

VMM userspace for illumos bhyve
Mozilla Public License 2.0

two instance ensure APIs is one too many #804

Open gjcolombo opened 6 days ago

gjcolombo commented 6 days ago

Today the server API offers two distinct ways to initialize a new VM:

In both cases, the server's first task is to try to convert its ensure parameters into a server-internal Spec structure that describes the new VM's components and their configuration. If this succeeds, machine initialization uses the resulting Spec to set up the new VM.

Both of these APIs accept an optional request to initialize via live migration, which contains the address of the migration source Propolis. If this is supplied:

Two APIs is one too many: maintaining both is a headache, especially as we continue adding new Propolis configuration options (boot order and CPUID landed recently; configurable MSR dispositions and hypervisor enlightenments are on the horizon). We should consolidate down to a single VM creation API.


I think a better API would look something like this:

enum ReplacementComponent {
    // one variant for each ComponentV0 variant that can be replaced
    // during migration; see below
}

enum InstanceInitMethod {
    New {
        // N.B. doesn't need to be versioned since the client, sled agent, is
        // updated in lockstep with Propolis
        spec: InstanceSpecV0
    },
    MigrationTarget {
        migration_id: Uuid,
        source_addr: SocketAddr,
        replacement_components: HashMap<InstanceSpecKey, ReplacementComponent>
    }
}

struct InstanceEnsureRequest {
    // same as the existing instance properties struct
    properties: InstanceProperties,
    init: InstanceInitMethod
}

To create a new VM, a client creates a spec for it and passes an InstanceInitMethod::New in its ensure request. This will require Omicron to switch from using instance ensure requests to instance specs. This should mostly be a matter of moving Propolis's ensure-request-to-instance-spec logic up into Nexus; it shouldn't require any database schema changes. One additional benefit of doing this (besides simplifying Propolis's API) is that putting this logic in Nexus makes it easier to update. RFD 505 talks about all the tradeoffs of this approach in more detail.

In the new API, the migration interface doesn't take a full instance spec. Instead, the target Propolis takes the source spec that it gets from the migration protocol, replaces any entries that have counterparts in the caller-supplied replacement_components map, and uses the amended spec to initialize the new VM. The replacement map allows Nexus to e.g. specify new Crucible volume construction requests with new client generation numbers, but does not allow it to replace any device configuration. This approach has a couple of big benefits:

  1. Propolis no longer needs a huge migration-compatibility-check module: the target's configuration is compatible with the source's because it is the source's configuration. Removing this module makes adding new guest-visible features significantly easier.
  2. Nexus doesn't need to remember how it originally configured a VM in order to migrate it. This is a nice bug-prevention mechanism: today, changes to how Nexus constructs an InstanceEnsureRequest (or to how Propolis amends it!) can prevent a VM from ever being able to migrate (if the changes deterministically produce incompatible VM configuration). Even in the absence of bugs, this reduces the amount of instance initialization logic Nexus has to carry around ("I started this VM at this version of the code, so I have to use this specific logic to produce the correct spec/ensure request for it").

The server no longer reads device information from config TOMLs (it only reads the bootrom path; we might also be able to eliminate this someday, but I'm declaring it out of scope for this issue). The propolis-server-config library will stick around, though, and will provide a way for clients to convert a config TOML to a set of instance spec components so that they can easily use the new interface.

I've drafted enough of the server side of these changes to be pretty confident this scheme can be made to work and that it will make the system nicer overall.

iximeow commented 3 days ago

broadly seems good! though i'm trying to think through a few things in particular..

can we really get away with an unversioned InstanceSpec? all of {propolis, sled-agent, nexus} are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack. version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent 1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent 1.0 there. i'm fuzzy on the long run too, could it be that there are propolis 1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?

additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:

sound right?

gjcolombo commented 3 days ago

Thanks @iximeow! Some (hopefully reasonable) answers to your questions:

can we really get away with an unversioned InstanceSpec? all of {propolis, sled-agent, nexus} are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack.

Good callout. I'm assuming here that propolis and sled-agent will (at least for the foreseeable future) always be part of the same deployment unit (i.e. the host OS image that we deploy to the individual sleds). RFD 152 section 4.3 talks about this decision in a little more detail (the RFD predates me, but I would say the main idea is that there are enough close dependencies between host OS components, the sled agent, and Propolis that it's simplest just to bundle them all together).

version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent 1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent 1.0 there.

This is almost certain to happen, though. My understanding (which is limited and might be wrong) is that our current plans to deal with this situation run along these lines:

I don't think this necessarily requires the use of a VersionedInstanceSpec in this interface: if we rearrange the entire instance spec structure such that we have to declare InstanceSpecV1 and change the sled agent interface to use it, then the API document will change accordingly.

With all that said, I think we need to keep VersionedInstanceSpec around to use in the migration protocol, where we're not using an HTTP interface or OpenAPI documents at all. In that case I think it's useful to have the option of being able to restructure the spec type somewhat without necessarily closing the door on our ability to roll back to an earlier Propolis that doesn't understand the new structure.

i'm fuzzy on the long run too, could it be that there are propolis 1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?

Yeah, if you wanted to do this kind of downgrade, you'd have to vacate the 1.1 sled, install the 1.0 image, and then migrate the VMs back.

additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:

Yup, this is right. The one thing I'd adjust would be the point about compatibility quirks. To take the situation in #790 as an example, the way I think we'd express such a quirk would be:

This is pretty much how I expect things to work for most new "features" or configuration options. The main thing to keep in mind is that Nexus can't flip the switch on the "pad with spaces" flag until it knows that Propolis v2 will never be rolled back (or, at least, that Nexus no longer needs to promise that such a rollback won't require instances to stop).