Open gjcolombo opened 6 days ago
broadly seems good! though i'm trying to think through a few things in particular..
can we really get away with an unversioned InstanceSpec
? all of {propolis, sled-agent, nexus}
are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack. version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent
1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent
1.0 there. i'm fuzzy on the long run too, could it be that there are propolis
1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?
additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:
replacement_components
, queries the source Propolis for a Spec
replacement_component
s to the checked Spec
sound right?
Thanks @iximeow! Some (hopefully reasonable) answers to your questions:
can we really get away with an unversioned InstanceSpec? all of {propolis, sled-agent, nexus} are updated in lockstep now, but as we get more live-upgradey that won't be true across the rack.
Good callout. I'm assuming here that propolis and sled-agent will (at least for the foreseeable future) always be part of the same deployment unit (i.e. the host OS image that we deploy to the individual sleds). RFD 152 section 4.3 talks about this decision in a little more detail (the RFD predates me, but I would say the main idea is that there are enough close dependencies between host OS components, the sled agent, and Propolis that it's simplest just to bundle them all together).
version numbers not to scale, just for my thinking out loud: Nexus 1.0 on another sled may talk to a sled-agent 1.1 on "here", or Nexus 1.1 "here" could talk to a sled-agent 1.0 there.
This is almost certain to happen, though. My understanding (which is limited and might be wrong) is that our current plans to deal with this situation run along these lines:
I don't think this necessarily requires the use of a VersionedInstanceSpec
in this interface: if we rearrange the entire instance spec structure such that we have to declare InstanceSpecV1
and change the sled agent interface to use it, then the API document will change accordingly.
With all that said, I think we need to keep VersionedInstanceSpec
around to use in the migration protocol, where we're not using an HTTP interface or OpenAPI documents at all. In that case I think it's useful to have the option of being able to restructure the spec type somewhat without necessarily closing the door on our ability to roll back to an earlier Propolis that doesn't understand the new structure.
i'm fuzzy on the long run too, could it be that there are propolis 1.0 and 1.1 cohabitating on a sled? or at that point is the plan to migrate out any VMs, do the upgrade, then pull some back?
Yeah, if you wanted to do this kind of downgrade, you'd have to vacate the 1.1 sled, install the 1.0 image, and then migrate the VMs back.
additionally, versions aside, double-checking my understanding of how migrations will work.. it sounds like:
Yup, this is right. The one thing I'd adjust would be the point about compatibility quirks. To take the situation in #790 as an example, the way I think we'd express such a quirk would be:
NvmeDisk
component has no special control flagsNvmeDisk
has a new flag that indicates whether to pad with spaces or NULLs
serde
default for this field is "pad with NULLs"deny_unknown_fields
on its spec elementsThis is pretty much how I expect things to work for most new "features" or configuration options. The main thing to keep in mind is that Nexus can't flip the switch on the "pad with spaces" flag until it knows that Propolis v2 will never be rolled back (or, at least, that Nexus no longer needs to promise that such a rollback won't require instances to stop).
Today the server API offers two distinct ways to initialize a new VM:
InstanceEnsureRequest
that specifies some, but not all, of the components it wants the new VM to have. Propolis then supplements this configuration with some default devices (e.g. serial ports), converts some of the information in the ensure request into concrete configuration data (e.g. slot numbers to PCI BDFs), and adds any devices specified in the config TOML passed to the server on its command line.VersionedInstanceSpec
that fully specifies all the components the new VM should have. The server does not add any extra devices by default and ignores the device list in the config TOML (though the TOML is still used to specify the bootrom path).In both cases, the server's first task is to try to convert its ensure parameters into a server-internal
Spec
structure that describes the new VM's components and their configuration. If this succeeds, machine initialization uses the resultingSpec
to set up the new VM.Both of these APIs accept an optional request to initialize via live migration, which contains the address of the migration source Propolis. If this is supplied:
Spec
as usual, but before creating any VM components, it starts the live migration protocol.VersionedInstanceSpec
to the target Propolis.Spec
to make sure the two specs describe "migration-compatible" VMs; that is, the two specs must describe the same guest-visible abstractions and configuration. If they don't, migration fails.Two APIs is one too many. Having two APIs is a maintenance headache, especially as we continue adding new Propolis configuration options (boot order and CPUID landed recently; configurable MSR dispositions and hypervisor enlightenments are on the horizon). We should consolidate down to a single VM creation API.
I think a better API would look something like this:
To create a new VM, a client creates a spec for it and passes an
InstanceInitMethod::New
in its ensure request. This will require Omicron to switch from using instance ensure requests to instance specs. This should mostly be a matter of moving Propolis's ensure-request-to-instance-spec logic up into Nexus; it shouldn't require any database schema changes. One additional benefit of doing this (besides simplifying Propolis's API) is that putting this logic in Nexus makes it easier to update. RFD 505 talks about all the tradeoffs of this approach in more detail.In the new API, the migration interface doesn't take a full instance spec. Instead, the target Propolis takes the source spec that it gets from the migration protocol, replaces any entries therein that have entries in the caller-supplied
replacement_components
map, and uses the amended spec to initialize the new VM. The replacement map allows Nexus to e.g. specify new Crucible volume construction requests with new client generation numbers, but does not allow it to replace any device configuration. This approach has a couple of big benefits:InstanceEnsureRequest
(or to how Propolis amends it!) can prevent a VM from ever being able to migrate (if the changes deterministically produce incompatible VM configuration). Even in the absence of bugs, this reduces the amount of instance initialization logic Nexus has to carry around ("I started this VM at this version of the code, so I have to use this specific logic to produce the correct spec/ensure request for it").The server no longer reads device information from config TOMLs (it only reads the bootrom path; we might also be able to eliminate this someday, but I'm declaring it out of scope for this issue). The
propolis-server-config
library will stick around, though, and will provide a way for clients to convert a config TOML to a set of instance spec components so that they can easily use the new interface.I've drafted enough of the server side of these changes to be pretty confident this scheme can be made to work and that it will make the system nicer overall.