rust-vmm / vhost

Apache License 2.0

vhost-user-backend: bump up MAX_MEM_SLOTS to 509 #224

Closed germag closed 6 months ago

germag commented 6 months ago

Summary of the PR

Let's support up to 509 memslots, just like vhost in the kernel usually does. This is required to properly support memory hotplug, either using multiple DIMMs (ACPI supports up to 256) or using virtio-mem.

509 used to be the KVM limit: KVM supported 512 memslots, but 3 were used for internal purposes. Currently, KVM supports more than 512, but it usually doesn't make use of more than ~260 (i.e., 256 DIMMs + boot memory), except when other memory devices like PCI devices with BARs are used. So, 509 seems to work well for vhost in the kernel.

Details can be found in the QEMU change that made virtio-mem consume up to 256 memslots across all virtio-mem devices. [1]

509 memslots imply 509 VMAs/mappings in the worst case (even though, in practice, with virtio-mem we won't see more than ~260 in most setups).

With max_map_count under Linux defaulting to 64k, 509 memslots still correspond to less than 1% of the maximum number of mappings. There are plenty left for the application to consume.
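The arithmetic behind the "less than 1%" figure can be checked with a small sketch. It assumes the kernel default of 65530 for `vm.max_map_count` (the "64k" above); a given system may be configured differently. The helper names are illustrative only and are not part of the vhost-user-backend crate.

```rust
/// Memslot limit proposed in this PR.
const MAX_MEM_SLOTS: u64 = 509;

/// Assumed default of the Linux `vm.max_map_count` sysctl.
/// The value on a running system may differ.
const DEFAULT_MAX_MAP_COUNT: u64 = 65_530;

/// Worst-case share of the mapping budget, in percent,
/// if every memslot ends up as its own mapping (VMA).
fn mapping_share_percent(mem_slots: u64, max_map_count: u64) -> f64 {
    mem_slots as f64 * 100.0 / max_map_count as f64
}

/// Mappings left for the rest of the process in that worst case.
fn remaining_mappings(mem_slots: u64, max_map_count: u64) -> u64 {
    max_map_count.saturating_sub(mem_slots)
}
```

Under these assumptions, `mapping_share_percent(MAX_MEM_SLOTS, DEFAULT_MAX_MAP_COUNT)` is roughly 0.78, and `remaining_mappings` leaves 65021 mappings for everything else in the process.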

[1] https://lore.kernel.org/all/20230926185738.277351-1-david@redhat.com/

germag commented 6 months ago

v2:

stefano-garzarella commented 6 months ago

It looks like this issue prevents any vhost-user device from working well with virtio-mem. Here is an issue with virtiofsd and virtio-mem: https://issues.redhat.com/browse/RHEL-15317

We want to support dynamically using multiple memslots to expose virtio-mem device memory to the VM; doing so can drastically reduce memory overhead in the hypervisor (especially KVM) when a device exposes only a comparatively small amount of memory to the VM, compared to its possible maximum size.

For QEMU, the feature is enabled using "dynamic-memslots=on". With "dynamic-memslots=off", the feature is disabled and we default to using a single large memslot statically.

In combination with vhost devices, this new feature can be problematic if the devices support fewer than 509 memslots. If such devices are created before the virtio-mem device in QEMU, virtio-mem will default to the old handling of using a single memslot only. If the devices are created after the virtio-mem devices (on the command line, or via hotplug of such devices), QEMU will bail out.
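The ordering rules described above can be summarized in a small model. This is only a sketch of the behavior as stated in this comment, assuming "dynamic-memslots=on"; it is not QEMU code, and every type and function name here is hypothetical.

```rust
/// Memslot count a vhost device must support to coexist with
/// dynamic memslots, as stated in the text above.
const REQUIRED_MEM_SLOTS: u64 = 509;

/// Which of the two devices is created first (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CreationOrder {
    VhostFirst,
    VirtioMemFirst,
}

/// Result of combining a vhost device with virtio-mem (hypothetical type).
#[derive(Debug, PartialEq)]
enum Outcome {
    /// virtio-mem uses multiple memslots dynamically.
    DynamicMemslots,
    /// virtio-mem falls back to a single large memslot.
    SingleMemslotFallback,
    /// QEMU bails out when the vhost device is added.
    Rejected,
}

/// Model of the described behavior with "dynamic-memslots=on".
fn combine(vhost_max_slots: u64, order: CreationOrder) -> Outcome {
    if vhost_max_slots >= REQUIRED_MEM_SLOTS {
        return Outcome::DynamicMemslots;
    }
    match order {
        CreationOrder::VhostFirst => Outcome::SingleMemslotFallback,
        CreationOrder::VirtioMemFirst => Outcome::Rejected,
    }
}
```

In this model, a backend that advertises 509 memslots always yields `Outcome::DynamicMemslots`, which is the case this PR aims to enable; a backend with a lower limit gets either the fallback or a rejection, depending on creation order.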

@sboeuf @Ablu WDYT about a new vhost-user-backend release (i.e. v0.13.0)?

Ablu commented 6 months ago

@stefano-garzarella: I do not know the mechanics well enough to have an opinion here. Feel free to move ahead with a release.