neonvm: Add support for virtio-mem

sharnoff commented 2 weeks ago

Adapted from #705.

This implementation is focused on making virtio-mem a drop-in replacement for our existing DIMM hotplug -based memory scaling.

As such, the externally visible differences are:

VirtualMachines have a new optional field: .spec.guest.memoryProvider
VirtualMachines have a new optional field: .status.memoryProvider
VirtualMachine .status.memorySize changes in smaller increments if using virtio-mem.

This change introduces the concept of a "memory provider", which is either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the controller will use the default one, which is specified by its --default-memory-provider flag. This will only persist for the lifetime of the runner pod. If the VM is restarted, then it may change to a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we actually don't default to virtio-mem if the VM was created with some odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new status field. For old VMs created before this change, the status field is set to DIMMSlots iff the pod name is set without the status field.

Also, we allow mutating .spec.guest.memoryProvider IFF it is being set equal to the current .status.memoryProvider, in case we we want old VMs to keep their dimm slots after switching to virtio-mem by default.

Changes to split out from this PR:

971
973
974

On top of those, there's a couple places where code was indented that should probably be split into its own commit so it can be properly added to .git-blame-ignore-revs :)

Remaining testing for this PR before merging:

[x] Test that backwards compatibility works as expected
- [x] Old VMs are updated while running as described above
- [x] Old VMs switch to the new default on restart
[x] Test rollback safety: that it's ok to go between the last release and this PR (with --default-memory-provider=DIMMSlots).
- Note: If --default-memory-provider=VirtioMem when this is released, we cannot make it rollback-safe.
[ ] ~~Test that updating .spec.guest.memoryProvider works as expected~~

Future/follow-up work:

981
Switch to virtio-mem by default
Remove support for DIMM hotplug
Move from representing memory as "memory slots" towards the implementation
Fix the raciness issue with migration target pod creation.

sharnoff commented 2 weeks ago

Cleaned up the commits in this PR -- added single commits for #973 and #974, and split some indenting into a separate commit. May move more indenting, need to see.

Anyways, hopefully should make the final commit easier to review :crossed_fingers:

sharnoff commented 2 weeks ago

Did backwards-compatibility testing locally (started a VM, upgraded the version, checked status was as expected, assuming that our existing e2e tests can take it from there). Found & fixed a bug. Also checked rollback safety; yeah, seems fine :) One interesting thing, posted it here: https://neondb.slack.com/archives/C039YKBRZB4/p1718950172991649

neondatabase / autoscaling

neonvm: Add support for virtio-mem #970

971

973

974

981