virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

Performance: Linux better vs. Windows for certain workloads #394

Closed rob-scheepens closed 3 years ago

rob-scheepens commented 5 years ago

During performance testing we noticed a significant difference (up to 25% delta) between Linux and Windows for sequential write workloads, where Linux is the faster one. Other workloads are also slower on Windows.

@vrozenfe : is this something you have seen in your testing as well?

vrozenfe commented 5 years ago

Normally Linux storage performs better than Windows, that's true. However, 25% is quite a big difference. We run performance testing very regularly to make sure we don't introduce significant performance regressions while working on new features or bug fixes. From my experience, the results can vary depending on several factors, including the backend, multiqueue, file system, block size and queue depth. Best regards, Vadim.

rob-scheepens commented 5 years ago

Hi @vrozenfe , finally got down to doing additional testing. One of our devs wrote a simple storport driver and when testing with that, I noticed a perf delta of up to 30% with single queue virtio, and some 10-15% with multiqueue vioscsi (for us bundled in Nutanix VirtIO 1.1.4, http://download.nutanix.com/mobility/1.1.4/Nutanix-VirtIO-1.1.4.iso).

SRB DataBufferLength (bytes):

vioscsi         sample storport driver
249856      262144
266240      262144
266240      262144
266240      262144
249856      262144
266240      262144
266240      262144
266240      262144
249856      262144
266240      262144

The buffer length values are always either 244 kB or 260 kB for vioscsi, but always 256 kB for the sample driver. Have you seen this before?

vrozenfe commented 5 years ago

Hi @rob-scheepens. I don't think I fully understand your question. We can definitely get somewhat higher IOPS by adjusting the PERF_CONFIGURATION_DATA flags and making other small changes in the driver code. The problem is that we need to balance IOPS against (v)CPU usage, balance single-queue against multi-queue performance (while still keeping (v)CPU usage as low as possible), and handle different types of backends (a file, a ram-disk, an ssd/nvme direct lun, etc.). Sorry if I didn't answer your question; I might be able to give you a better answer if you provide more technical details.

Best regards, Vadim.
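For context, the PERF_CONFIGURATION_DATA tuning mentioned above is done through Storport's StorPortInitializePerfOpts call. The following is a minimal, hedged sketch of the usual query-then-set pattern in a miniport (structure, function, and flag names are from storport.h; the exact flag set vioscsi uses is not shown here and may differ):

```c
/* Sketch only: typically called once from the miniport's passive-level
   initialization path, with DeviceExtension being the HW device extension. */
PERF_CONFIGURATION_DATA perfData = { 0 };
ULONG status;

perfData.Version = STOR_PERF_VERSION;
perfData.Size = sizeof(PERF_CONFIGURATION_DATA);

/* First query which optimizations the port driver supports. */
status = StorPortInitializePerfOpts(DeviceExtension, TRUE, &perfData);
if (status == STOR_STATUS_SUCCESS) {
    /* Keep only the flags we want to enable out of what is supported;
       this particular combination is illustrative, not vioscsi's choice. */
    perfData.Flags &= (STOR_PERF_DPC_REDIRECTION |
                       STOR_PERF_CONCURRENT_CHANNELS |
                       STOR_PERF_OPTIMIZE_FOR_COMPLETION_DURING_STARTIO);
    status = StorPortInitializePerfOpts(DeviceExtension, FALSE, &perfData);
}
```

The trade-off Vadim describes shows up exactly here: enabling more aggressive completion options can raise IOPS at the cost of higher (v)CPU usage.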

rob-scheepens commented 5 years ago

Hi @vrozenfe , the main question is: why is the DataBufferLength of the SRBs not always 256 kB, but varies, sometimes 244 kB, sometimes 260 kB? Especially 260 seems strange to me, since vioscsi defines MAX_PHYS_SEGMENTS as 64. But I did notice that VIRTIO_MAX_SG sets this to 64+3 in the SrbExtension, which could explain the larger-than-256 kB data buffer size. But how can the 244 kB be explained?

Again, with the sample driver I used (where MAX_PHYS_SEGMENTS is 64+2), I only see 256 kB data buffer lengths for the SRBs.

The reason I'm seeing a perf drop with virtio is that our IO process is optimized for alignment at 8 kB or 32 kB boundaries. Since neither 244 kB nor 260 kB is aligned, but 256 kB is, this explains the lower perf with vioscsi.

Does this clarify it a bit?

Regards, \Rob

JonKohler commented 4 years ago

bump^

vrozenfe commented 4 years ago

I filed a new bug at https://bugzilla.redhat.com/show_bug.cgi?id=1787022 to help track progress on this issue.

rob-scheepens commented 3 years ago

Hi @vrozenfe, this issue is resolved by commit a18fc89a8139884dc51add0f641c466f2d9826b2, thanks!

vrozenfe commented 3 years ago

I'm going to make it even simpler, more or less similar to what I've done for viostor in https://github.com/virtio-win/kvm-guest-drivers-windows/pull/487. It should help get a bit better performance. Best, Vadim.