lf-edge / eve

EVE is Edge Virtualization Engine
https://www.lfedge.org/projects/eve/
Apache License 2.0

Generic SR-IOV support #3049

Open DolceTriade opened 1 year ago

DolceTriade commented 1 year ago

Use case

EVE supports SR-IOV for network cards; however, PCI devices like GPUs and accelerators also support SR-IOV. There are edge use cases that require passing through GPUs and accelerator cards (see Intel QAT, Intel N300, Intel ACC100, etc.) to virtualized workloads. This is especially true for telco and connectivity use cases, where offloading crypto and FEC operations is important to achieve maximum performance within limited power and CPU budgets.

Describe the solution you'd like

I propose adding new IoGenericPF and IoGenericVF values to the enum of available PhysicalIO types. Then, in domainmgr, in the same place where we handle ioEthPF, we would also create VFs for devices of type IoGenericPF and automatically populate those VFs in the available hardware.

It would be the responsibility of the EVE image builder to ensure that the required driver and firmware are included in the EVE image (or perform any required initialization...)
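To make the proposal concrete, here is a minimal sketch of what the PF handling could look like on the domainmgr side. Only the sysfs paths are standard Linux; the IoGenericPF naming and the helper itself are hypothetical, not existing EVE code:

```go
package sriov

import (
	"fmt"
	"os"
	"path/filepath"
)

// createGenericVFs enables numVFs virtual functions on the PF at the given
// PCI address (e.g. "0000:3b:00.0"). Hypothetical sketch of what domainmgr
// could do for a device of the proposed IoGenericPF type.
func createGenericVFs(pciAddr string, numVFs int) error {
	dev := filepath.Join("/sys/bus/pci/devices", pciAddr)

	// A PF that supports SR-IOV exposes sriov_totalvfs; reject the rest.
	if _, err := os.ReadFile(filepath.Join(dev, "sriov_totalvfs")); err != nil {
		return fmt.Errorf("device %s does not support SR-IOV: %w", pciAddr, err)
	}

	// Writing N to sriov_numvfs asks the kernel to instantiate N VFs.
	return os.WriteFile(filepath.Join(dev, "sriov_numvfs"),
		[]byte(fmt.Sprintf("%d", numVFs)), 0644)
}
```

Once sriov_numvfs is written, the created VFs show up as ordinary PCI devices, so the existing inventory code could enumerate and advertise them.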

uncleDecart commented 1 year ago

Hey @DolceTriade,

Thanks for creating this issue! To the best of my knowledge, there is unfortunately no way to generically create VFs for GPUs, or even for Mellanox NICs. Enabling SR-IOV requires a driver (as you mentioned) and the correct way of using it; in the case of Nvidia and Mellanox, the driver API is different from what we already have. Moreover, introducing new drivers would increase the EVE image size, which is not desirable.

There is one approach which, in my opinion, fits this specific task best: the Device Driver Domain. In a nutshell, it is a way of spinning up a VM that runs a specific driver and exposes it over a common virtio interface to the VM that needs it. This way we would not need to add anything to the EVE image (it stays generic), and we would be able to support any SR-IOV driver we want, as well as any other third-party driver. Of course, this approach needs careful performance evaluation (though in theory it should not add more overhead than SPDK). I'd be glad to share my findings once they are in a more readable format.

DolceTriade commented 1 year ago

I think that while GPUs may be out of the question (certainly, Nvidia needs a lot of deps), accelerator cards do generally support SR-IOV without additional drivers.

While I see that nested passthrough is supported in QEMU (https://wiki.qemu.org/Features/VT-d#Use_Case_3:_Nested_Guest_Device_Assignment), I wonder what the performance implications will be and whether those costs would be acceptable.

The other workaround would be to statically configure SR-IOV devices by adding a separate system-level service to EVE specifically for that class of device, but that makes dynamic configuration hard.

I think the bare minimum would be a big step forward: reuse the same SR-IOV code we have for network cards, do the basic dance of setting num_vfs and binding the created VFs to vfio-pci, and allow these VFs to be allocated dynamically. That would actually cover a surprising number of use cases (like DPDK applications using network accelerators, or even Intel QAT); a sketch of that dance follows below.
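For reference, the "dance" amounts to the sriov_numvfs write shown earlier plus a couple of sysfs writes per created VF. A sketch, again using only standard kernel interfaces (the function name is illustrative, not EVE's actual code):

```go
package sriov

import (
	"os"
	"path/filepath"
)

// bindToVfioPci detaches the VF at vfAddr (e.g. "0000:3b:02.0") from
// whatever driver claimed it and rebinds it to vfio-pci so it can be
// passed through to a guest.
func bindToVfioPci(vfAddr string) error {
	dev := filepath.Join("/sys/bus/pci/devices", vfAddr)

	// Unbind from the current driver, if any.
	unbind := filepath.Join(dev, "driver", "unbind")
	if _, err := os.Stat(unbind); err == nil {
		if err := os.WriteFile(unbind, []byte(vfAddr), 0644); err != nil {
			return err
		}
	}

	// driver_override ensures only vfio-pci will match this device.
	if err := os.WriteFile(filepath.Join(dev, "driver_override"),
		[]byte("vfio-pci"), 0644); err != nil {
		return err
	}

	// Ask the kernel to re-run driver matching for the device.
	return os.WriteFile("/sys/bus/pci/drivers_probe", []byte(vfAddr), 0644)
}
```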

uncleDecart commented 1 year ago

> I think the bare minimum would be a big step forward: reuse the same SR-IOV code we have for network cards, do the basic dance of setting num_vfs and binding the created VFs to vfio-pci, and allow these VFs to be allocated dynamically.

Dynamic allocation of VFs can be very tricky. For instance, the existing VFs have to be removed first in order to resize the VF count. And if you have already passed some of them through to VMs, you would be removing those devices, which is not desirable behaviour. Of course, we could create some kind of stub to manage that, but again, I'm not sure there is a design which can guarantee the availability and performance of such a stub while we re-create the VFs.
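To illustrate why resizing is disruptive: the kernel rejects changing a non-zero VF count directly, so any resize has to go through zero, tearing down every existing VF first. A hypothetical sketch:

```go
package sriov

import (
	"fmt"
	"os"
	"path/filepath"
)

// resizeVFs changes the VF count on a PF. Illustrative sketch: the kernel
// returns EBUSY if you write a non-zero count while VFs already exist, so
// step 1 is unavoidable and destroys every current VF, including any that
// are already passed through to a VM.
func resizeVFs(pciAddr string, newCount int) error {
	numvfs := filepath.Join("/sys/bus/pci/devices", pciAddr, "sriov_numvfs")

	// Step 1: tear down all existing VFs.
	if err := os.WriteFile(numvfs, []byte("0"), 0644); err != nil {
		return err
	}
	// Step 2: create the new set of VFs.
	return os.WriteFile(numvfs, []byte(fmt.Sprintf("%d", newCount)), 0644)
}
```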

DolceTriade commented 1 year ago

Indeed. I believe that's why EVE only applies SR-IOV settings at first boot, which would be true for this use case as well?

By allocate dynamically, I mean being able to assign the VFs to workloads dynamically, not changing the number of VFs on the fly.