Open zjmletang opened 8 months ago
I have been exploring the details surrounding VIRTIO_NET_F_MRG_RXBUF and found myself a bit perplexed regarding the association between the merging of the Virtio header and the packet as mentioned in the issue, in relation to the activation of VIRTIO_NET_F_MRG_RXBUF. It appears to me that the merging of the Virtio Header with the packet is feasible even when VIRTIO_NET_F_MRG_RXBUF is not utilized. If it's not too much trouble, I would greatly appreciate your insights on how these two aspects are interconnected.
@zjmletang VIRTIO_NET_F_MRG_RXBUF is automatically enabled when the device indicates VIRTIO_F_VERSION_1, this is part of virtio spec (network part). Just indicating merged buffers does not make any effect. Existing driver in case when RSC is enabled allocates buffers of ~64K for each RX packet (this might be up to 17 pages), so the device does not have any motivation even to try using merged buffers. In case of no RSC - the driver uses one or more 4K buffers depending on Jumbo setting.
@ybendito thanks
Sure, I will submit the code as a patch in a few days, just for discussion purposes.
host features 0x000001035867ffe3, guest features 0x00000003186799a3
16
We are more concerned with how this change would enhance the overall PCIe bandwidth on the host side.
@ybendito Based on your suggestion, I've uploaded the code. This section of the code still requires optimization and now is solely for discussion purposes. please see https://github.com/virtio-win/kvm-guest-drivers-windows/pull/1089
@zjmletang
@zjmletang
- Applying your suggestion on upstream at the moment seems to me kind of problematic: the change is very sensitive, the existing driver supports not only V1 devices but also legacy devices that must have the header in separate descriptor (see virtio spec 1.3, 5.1.6.6) and so the driver must be tested well in several non-default device configurations. I'm not sure we have enough resources for such coverage at the moment
@ybendito , Indeed,it is very sensitive. Is it possible for us to make distinctions based on whether the VIRTIO_F_ANY_LAYOUT feature bit has been negotiated or not? If it has been successfully negotiated, we could consider merging; if not, we could maintain the separation into two segments.
cc @sb-ntnx fyi - this would be neat for us as well.
hi @ybendito I am writing to follow up on this issue, I proposed a solution to make distinctions based on whether the VIRTIO_F_ANY_LAYOUT feature bit has been negotiated(If this solution is incorrect, please let me know as well). This would help mitigate potential risks while addressing the needs for both V1 and legacy devices (virtio spec 1.3, section 5.1.6.6).
Could you please let me know if there are any plans for further modifications or additional concerns that need to be addressed? Your feedback will greatly help us decide our next steps, whether to await further changes or start making some adjustments based on the current state of the community's code.
Thank you very much for your time and assistance.
Best regards,
@zjmletang
- Applying your suggestion on upstream at the moment seems to me kind of problematic: the change is very sensitive, the existing driver supports not only V1 devices but also legacy devices that must have the header in separate descriptor (see virtio spec 1.3, 5.1.6.6) and so the driver must be tested well in several non-default device configurations. I'm not sure we have enough resources for such coverage at the moment
@ybendito , Indeed,it is very sensitive. Is it possible for us to make distinctions based on whether the VIRTIO_F_ANY_LAYOUT feature bit has been negotiated or not? If it has been successfully negotiated, we could consider merging; if not, we could maintain the separation into two segments.
Is your feature request related to a problem? Please describe. In the current reception logic of netkvm, due to virtio protocol headers and data packets being in two separate memory blocks. so at least two memory blocks are needed for one descriptor. From the host's perspective, this means that the network card (hardware implementation) requires two DMA operations to retrieve a single packet, thus consuming more PCIe bandwidth. When PCIe bandwidth resources are strained, this leads to a performance bottleneck as the CPU retrieves descriptors from the Windows virtio frontend driver more slowly.
Describe the solution you'd like I'd like to discuss with you whether this can be done: The virtio header and data packet can be combined into the same memory block, so that in most scenarios, a descriptor need only contain one memory block.
I conducted tests on our own cloud platform according to the improvement approach and the tests show that this method is fairly effective at increasing hardware bandwidth utilization, with the overall network card throughput increasing about 10%.