neondatabase / autoscaling

Postgres vertical autoscaling in k8s
Apache License 2.0
142 stars 16 forks source link

neonvm: Add support for virtio-mem #970

Closed sharnoff closed 2 weeks ago

sharnoff commented 2 weeks ago

Adapted from #705.

This implementation is focused on making virtio-mem a drop-in replacement for our existing DIMM hotplug -based memory scaling.

As such, the externally visible differences are:

  1. VirtualMachines have a new optional field: .spec.guest.memoryProvider
  2. VirtualMachines have a new optional field: .status.memoryProvider
  3. VirtualMachine .status.memorySize changes in smaller increments if using virtio-mem.

This change introduces the concept of a "memory provider", which is either "DIMMSlots" or "VirtioMem".

If a VM is created without setting the memory provider in the spec, the controller will use the default one, which is specified by its --default-memory-provider flag. This will only persist for the lifetime of the runner pod. If the VM is restarted, then it may change to a new default.

Additionally, because our usage of virtio-mem has 8MiB block sizes, we actually don't default to virtio-mem if the VM was created with some odd memory slot size that isn't divisible by 8MiB.

The memory provider of the current runner pod is stored in the new status field. For old VMs created before this change, the status field is set to DIMMSlots iff the pod name is set without the status field.

Also, we allow mutating .spec.guest.memoryProvider IFF it is being set equal to the current .status.memoryProvider, in case we we want old VMs to keep their dimm slots after switching to virtio-mem by default.


Changes to split out from this PR:

  1. 971

  2. 973

  3. 974

On top of those, there's a couple places where code was indented that should probably be split into its own commit so it can be properly added to .git-blame-ignore-revs :)


Remaining testing for this PR before merging:


Future/follow-up work:

  1. 981

  2. Switch to virtio-mem by default
  3. Remove support for DIMM hotplug
  4. Move from representing memory as "memory slots" towards the implementation
  5. Fix the raciness issue with migration target pod creation.
sharnoff commented 2 weeks ago

Cleaned up the commits in this PR -- added single commits for #973 and #974, and split some indenting into a separate commit. May move more indenting, need to see.

Anyways, hopefully should make the final commit easier to review :crossed_fingers:

sharnoff commented 2 weeks ago

Did backwards-compatibility testing locally (started a VM, upgraded the version, checked status was as expected, assuming that our existing e2e tests can take it from there). Found & fixed a bug. Also checked rollback safety; yeah, seems fine :) One interesting thing, posted it here: https://neondb.slack.com/archives/C039YKBRZB4/p1718950172991649