nutanix / libvfio-user

framework for emulating devices in userspace
BSD 3-Clause "New" or "Revised" License
166 stars 51 forks source link

Increase MAX_DMA_REGIONS for vIOMMU use case #588

Open sboeuf opened 3 years ago

sboeuf commented 3 years ago

MAX_DMA_REGIONS is currently 16, which is way too small if we try to support the vIOMMU use case, as we might end up with way more than 16 active mappings (especially if the mappings are 4KiB).

jlevon commented 3 years ago

The major reason we have this limit is that we have an array, and it's sized based on that. We could reasonably easy make this a resizable array (we're already not thread safe)

We still would want some upper limit to avoid a malicious client from exhausting server memory, but this can probably be much bigger.

(gdb) print sizeof(dma_memory_region_t)
$1 = 88

For a 1TB guest, if we allowed every 4KB page to be its own region, we'd end up with an array size of 22GB.

So, we will still need some "reasonable" limit. Any suggestions?

sboeuf commented 3 years ago

Maybe make it 262144 so that we can handle up to 1GiB at a given time. But I understand the biggest problem is the fact that the array has a fixed size. If you can fix this limitation, then it's fine to set the limit to something even bigger.

jlevon commented 3 years ago

it's relatively easy to change the array to be resizable. But even 1GB seems a bit pricy for servers that might be dealing with lots of clients. It's an array because we really want O(1) lookup performance - the sg map/unmap APIs are hot path. And I'm kind of loathe to switch to an amortized O(1) data structure just for vIOMMU. So we're stuck with trying to figure out a max size. We could make it configurable, perhaps, but that seems less than ideal. Having said that though, swiotlb (one of the vIOMMU users) itself is only statically configurable.

sboeuf commented 3 years ago

It's an array because we really want O(1) lookup performance - the sg map/unmap APIs are hot path. And I'm kind of loathe to switch to an amortized O(1) data structure just for vIOMMU.

I fully understand. The solution of making the size configurable is not great either because we don't really know how much we're gonna need, it depends on the device that's exposed and the workload/driver running in the guest.

This is really a tricky situation and maybe the best solution would be to explicitly state in the specification that vfio-user is not fit for the vIOMMU use case.

jlevon commented 3 years ago

We can definitely make clear that it's not currently supported well enough, certainly. I dont' want to say we'll never have some form of support though.

sboeuf commented 3 years ago

We can definitely make clear that it's not currently supported well enough, certainly. I dont' want to say we'll never have some form of support though.

I think that would help as we can see there are several issues to tackle before being able to claim it is supported.

jlevon commented 3 years ago

https://github.com/nutanix/libvfio-user/pull/590

tmakatos commented 3 years ago

And I'm kind of loathe to switch to an amortized O(1) data structure just for vIOMMU.

We can make the DMA configurable to use two bookkeeping method (static array vs. list or resizeable array) to accomodate for the vIOMMU case, and hide all this complexity behind functions so the DMA controller itself doesn't get too complicated. We can even make it a compile time option.

jlevon commented 3 years ago

Compile time would be annoying; I do like the idea of switching book-keeping types - but perhaps that means we need the SPDK guarantees to not use vfu_map to also* apply to DMA_MAP? So we can resize the array dynamically or wahtever?

tmakatos commented 3 years ago

but perhaps that means we need the SPDK guarantees to not use vfu_map* to also apply to DMA_MAP? So we can resize the array dynamically or wahtever?

Sorry I don't understand, which guarantees?

jlevon commented 3 years ago

I'm referring to whether SPDK allows active vfu_map,unmap_sg() users in IO threads, when it gets an unmap request. They want to use those APIs locklessly, so they can't do an unmap at the same time.

We still need to think really carefully about that though, once SPDK finishes its DMA unregister callback, we modify the structs. Do we need extra callbacks or something?