sboeuf opened this issue 3 years ago
The major reason we have this limit is that we have a fixed-size array, sized based on it. We could reasonably easily make this a resizable array (we're already not thread-safe).
We still would want some upper limit to avoid a malicious client from exhausting server memory, but this can probably be much bigger.
```
(gdb) print sizeof(dma_memory_region_t)
$1 = 88
```
For a 1 TiB guest, if we allowed every 4 KiB page to be its own region, we'd end up with an array size of roughly 22 GiB.
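The arithmetic above can be sketched as a few lines of C. The `88` is the `sizeof(dma_memory_region_t)` from the gdb output; the helper name is illustrative, not libvfio-user API.

```c
#include <stdint.h>

/* Worst-case bookkeeping overhead: every 4 KiB page of guest memory
 * becomes its own dma_memory_region_t (88 bytes, per gdb above). */
enum { REGION_STRUCT_SIZE = 88, GUEST_PAGE_SIZE = 4096 };

static uint64_t region_array_bytes(uint64_t guest_mem_bytes)
{
    uint64_t nregions = guest_mem_bytes / GUEST_PAGE_SIZE;
    return nregions * REGION_STRUCT_SIZE;
}
```

For a 1 TiB guest this works out to 2^28 regions, i.e. a 22 GiB array just for bookkeeping.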
So, we will still need some "reasonable" limit. Any suggestions?
Maybe make it `262144`, so that we can handle up to 1 GiB at a given time. But I understand the biggest problem is the fact that the array has a fixed size. If you can fix this limitation, then it's fine to set the limit to something even bigger.
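To check the suggested limit: with 262144 regions of worst-case 4 KiB mappings, this is the mappable memory and the cost of the region table itself (helper names are hypothetical, for illustration only).

```c
#include <stdint.h>

enum { SUGGESTED_LIMIT = 262144, PAGE_SIZE_4K = 4096, REGION_SIZE = 88 };

/* Guest memory mappable at once with 4 KiB regions. */
static uint64_t max_mapped_bytes(void)
{
    return (uint64_t)SUGGESTED_LIMIT * PAGE_SIZE_4K;
}

/* Memory the server spends on the region array itself. */
static uint64_t table_cost_bytes(void)
{
    return (uint64_t)SUGGESTED_LIMIT * REGION_SIZE;
}
```

That is 1 GiB mappable for a 22 MiB table, which is presumably the "1 GiB seems a bit pricey" figure in the next reply: per-client table cost, not mapped memory.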
It's relatively easy to change the array to be resizable. But even 1 GiB seems a bit pricey for servers that might be dealing with lots of clients. It's an array because we really want O(1) lookup performance - the sg map/unmap APIs are hot path. And I'm kind of loath to switch to an amortized O(1) data structure just for vIOMMU. So we're stuck with trying to figure out a max size. We could make it configurable, perhaps, but that seems less than ideal. Having said that, though, swiotlb (one of the vIOMMU users) is itself only statically configurable.
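For what it's worth, a geometrically-grown array keeps the hot-path lookup as plain O(1) indexing; only insertion (the DMA_MAP slow path) pays the amortized realloc cost. A minimal sketch, with hypothetical names that are not libvfio-user API - the unresolved part is that `realloc` moves the array under any concurrent lockless reader, which is the thread-safety concern raised later in this thread:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable region table. */
typedef struct { uint64_t iova, size; void *vaddr; } region_t;

typedef struct {
    region_t *regions;
    size_t nr_regions;
    size_t capacity;
} region_table_t;

/* Slow path (DMA_MAP): may realloc, doubling capacity. */
static int table_append(region_table_t *t, region_t r)
{
    if (t->nr_regions == t->capacity) {
        size_t newcap = t->capacity ? t->capacity * 2 : 16;
        region_t *p = realloc(t->regions, newcap * sizeof(*p));
        if (p == NULL) {
            return -1;
        }
        t->regions = p;
        t->capacity = newcap;
    }
    t->regions[t->nr_regions++] = r;
    return 0;
}

/* Hot path: still a single array index, as with the fixed array. */
static region_t *table_get(region_table_t *t, size_t idx)
{
    return idx < t->nr_regions ? &t->regions[idx] : NULL;
}
```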
> It's an array because we really want O(1) lookup performance - the sg map/unmap APIs are hot path. And I'm kind of loath to switch to an amortized O(1) data structure just for vIOMMU.
I fully understand. The solution of making the size configurable is not great either, because we don't really know how much we're going to need; it depends on the device that's exposed and the workload/driver running in the guest.
This is really a tricky situation, and maybe the best solution would be to explicitly state in the specification that vfio-user is not fit for the vIOMMU use case.
We can definitely make clear that it's not currently supported well enough, certainly. I don't want to say we'll never have some form of support, though.
> We can definitely make clear that it's not currently supported well enough, certainly. I don't want to say we'll never have some form of support, though.
I think that would help as we can see there are several issues to tackle before being able to claim it is supported.
> And I'm kind of loath to switch to an amortized O(1) data structure just for vIOMMU.
We can make the DMA controller configurable to use one of two bookkeeping methods (static array vs. list or resizable array) to accommodate the vIOMMU case, and hide all this complexity behind functions so the DMA controller itself doesn't get too complicated. We could even make it a compile-time option.
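The "hide it behind functions" idea above could look something like an ops table the controller calls through, so the backend can be chosen at runtime rather than compile time. A minimal sketch; none of these names are real libvfio-user API:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pluggable bookkeeping interface. */
typedef struct {
    int   (*add)(void *state, uint64_t iova, uint64_t size);
    int   (*remove)(void *state, uint64_t iova);
    void *(*lookup)(void *state, uint64_t iova);   /* hot path */
} dma_bookkeeping_ops_t;

typedef struct {
    const dma_bookkeeping_ops_t *ops;   /* static array or resizable backend */
    void *state;
} dma_controller_t;

/* The controller stays simple: one indirect call per operation, so the
 * O(1) static-array backend keeps its hot-path cost. */
static void *dma_lookup(dma_controller_t *dma, uint64_t iova)
{
    return dma->ops->lookup(dma->state, iova);
}

/* Trivial demo backend: ignores the iova and returns its state,
 * just enough to exercise the indirection. */
static void *demo_lookup(void *state, uint64_t iova)
{
    (void)iova;
    return state;
}

static const dma_bookkeeping_ops_t demo_ops = {
    .add = NULL, .remove = NULL, .lookup = demo_lookup,
};
```

The cost over a direct call is one pointer indirection per lookup, which may or may not matter on the sg map/unmap hot path mentioned earlier.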
Compile time would be annoying; I do like the idea of switching bookkeeping types - but perhaps that means we need the SPDK guarantees to not use vfu_map* to also apply to DMA_MAP? So we can resize the array dynamically or whatever?
> but perhaps that means we need the SPDK guarantees to not use vfu_map* to also apply to DMA_MAP? So we can resize the array dynamically or whatever?
Sorry, I don't understand: which guarantees?
I'm referring to whether SPDK allows active vfu_map_sg()/vfu_unmap_sg() users in I/O threads when it gets an unmap request. They want to use those APIs locklessly, so they can't do an unmap at the same time.
We still need to think really carefully about that, though: once SPDK finishes its DMA unregister callback, we modify the structs. Do we need extra callbacks or something?
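One conceivable shape for that coordination - a per-region in-flight count, so the unmap path only touches the structs once lockless sg users have drained - sketched below. This is purely an illustration of the problem being discussed, not libvfio-user code, and all names are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate reconciling lockless sg users with unmap. */
typedef struct {
    atomic_uint inflight;   /* sg mappings currently using this region */
    atomic_bool removing;   /* unmap requested; refuse new mappings */
} region_gate_t;

/* I/O thread: take a reference before touching the region. */
static bool region_enter(region_gate_t *g)
{
    if (atomic_load(&g->removing)) {
        return false;
    }
    atomic_fetch_add(&g->inflight, 1);
    /* Re-check: an unmap may have raced in between. */
    if (atomic_load(&g->removing)) {
        atomic_fetch_sub(&g->inflight, 1);
        return false;
    }
    return true;
}

static void region_exit(region_gate_t *g)
{
    atomic_fetch_sub(&g->inflight, 1);
}

/* Unmap path: mark the region dying; structs are safe to modify only
 * once this returns true (in-flight users have drained). */
static bool region_try_remove(region_gate_t *g)
{
    atomic_store(&g->removing, true);
    return atomic_load(&g->inflight) == 0;
}
```

Whether the drain should block, retry, or be signalled via an extra callback is exactly the open question above.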
`MAX_DMA_REGIONS` is currently 16, which is way too small if we try to support the vIOMMU use case, as we might end up with way more than 16 active mappings (especially if the mappings are 4 KiB).