virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

netkvm: Performance related question #167

Open vbusireddy opened 7 years ago

vbusireddy commented 7 years ago

I have two servers running Linux kernel 4.1.12-103.3.8.el7uek.x86_64 (Oracle Linux 7.4), interconnected with a 100Gb Ethernet link. When I run netperf between these systems, I get a throughput of >96% of the line rate.

I then created two Fedora 25 KVM guests, one on each server. The setup is: QEMU version 2.9.50, 16 vCPUs and 8GB RAM, and a virtio-net-pci device with 16 queues, 40 vectors, and mq enabled.
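In QEMU command-line terms, a multiqueue guest like this is typically expressed roughly as follows (a sketch only; the netdev id and the tap/vhost backend details are assumed, not taken from the description above):

-netdev tap,id=net0,vhost=on,queues=16 \
-device virtio-net-pci,netdev=net0,mq=on,vectors=40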

Running netperf between these guests also results in a throughput of >95% of the line rate.

However, if I replace the Fedora 25 guests with two Windows 2012 R2 guests, with the same QEMU options, I get a throughput of about 50-60% of the line rate!

Has anyone run benchmarks across 100Gb links with the Windows virtio drivers? And what kind of throughput is achieved? Isn't 60% of line rate a bit low? I was hoping to get the same throughput as the Fedora guests! Any tricks to improve the throughput with Windows 2012 guests?

Thanks in advance!

Venu

ladipro commented 7 years ago

I believe that @sameehj has recently run networking benchmarks and may be able to offer specific tips.

In general:

1) Make sure to configure QEMU to expose the recommended Hyper-V enlightenments per http://blog.wikichoon.com/2014/07/enabling-hyper-v-enlightenments-with-kvm.html

2) Use the latest NetKVM virtio-win driver. Or at least >= 126 so you get RSC support https://github.com/virtio-win/kvm-guest-drivers-windows/wiki/netkvm-RSC-(receive-segment-coalescing)-feature

sameehj commented 7 years ago

Hi @vbusireddy,

First of all, I'd try enabling the Hyper-V enlightenments and checking the driver's version, as @ladipro already suggested.

While testing, I have noticed that iperf2 gave me much better results than netperf on Windows. When you enable multiqueue, remember to add "-P #number_of_queues" to the iperf command line to enable multiple streams while testing. In general you should reach results similar to virtio-net on Linux, at least with large packet sizes (4k+).
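For example, with 16 queues the iperf2 run might look roughly like this (a sketch; the address 192.168.1.2 and the 60-second duration are placeholders):

iperf -s                            # on the receiving guest
iperf -c 192.168.1.2 -P 16 -t 60    # on the sending guest; -P matches the queue count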

Please do try these tips and update us on your test results once you have them =)

vbusireddy commented 7 years ago

@ladipro (bullet 1): I am using qemu 2.9.50 to create my Windows 2012 R2 guests. I couldn't see any of the "Hyper-V enlightenment" options (hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time) in qemu 2.9.50! Looked up the qemu source code too, and there is no hint of those options. What are the equivalent options in qemu?

@ladipro (bullet 2): I have used the driver version 0.1.141. That is the latest I see at https://fedorapeople.org/groups/virt/virtio-win/repo/stable/. I have also tried the drivers built from the latest upstream code (as of Aug 29th, commit ID 825e80c789e6b7ef267feef79d0454a32710a996, from Ladi), and that driver also produces the same results as 0.1.141. I did verify on my guest that RSC is enabled, and 'ethtool' on the host shows the correct configuration as suggested in the RSC link that you referred to.

@sameehj: Indeed, iperf2 produced better results than netperf. However, netperf produces much better results if I spawn multiple independent processes talking to the same remote guest and sum up the individual results. In my tests, iperf2 with "-P 16" yielded about 45-50Gbps, whereas 16 netperf instances yielded a throughput of 65-70Gbps (both tests run with 16 queues, 16 vcpus, and 9000 MTU).

ladipro commented 7 years ago

@ladipro (bullet 1): I am using qemu 2.9.50 to create my Windows 2012 R2 guests. I couldn't see any of the "Hyper-V enlightenment" options (hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time) in qemu 2.9.50! Looked up the qemu source code too, and there is no hint of those options. What are the equivalent options in qemu?

The QEMU source code has these options in hyphenated form (hv-relaxed etc.) but the command line takes either. Please try this:

-cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time

vbusireddy commented 7 years ago

I tried both syntaxes. The performance did not change; I am still getting about 65-70Gbps with both forms of the option.

Any other thoughts/suggestions?

That also brings me back to another question that I asked earlier. Has anyone run benchmarks across a 100Gbps link with the Windows drivers? And what were the results?

sameehj commented 7 years ago

I tried both syntaxes. The performance did not change; I am still getting about 65-70Gbps with both forms of the option. Any other thoughts/suggestions?

Can you perform the same test with a 100G card assigned to the Windows guest using VFIO assignment? What results do you get? This test could give us a good baseline for what can be achieved with Windows.
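For reference, such a VFIO assignment might be set up roughly as follows (a sketch; the PCI address 0000:81:00.0 is a placeholder for the actual 100G NIC):

modprobe vfio-pci
echo 0000:81:00.0 > /sys/bus/pci/devices/0000:81:00.0/driver/unbind    # detach from the host driver
echo vfio-pci > /sys/bus/pci/devices/0000:81:00.0/driver_override
echo 0000:81:00.0 > /sys/bus/pci/drivers/vfio-pci/bind                 # hand the NIC to vfio-pci
qemu-system-x86_64 ... -device vfio-pci,host=81:00.0                   # pass it through to the guest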

That also brings me back to another question that I asked earlier. Has anyone run benchmarks across a 100Gbps link with the Windows drivers? And what were the results?

No, we haven't done that yet.

vbusireddy commented 7 years ago

Hi! I ran the tests with VFIO assignment. Between host A running Linux, and a Windows guest on host B with VFIO assignment (and Mellanox driver on the guest), I am getting a throughput of about 90Gbps consistently. With the same host A, and a Windows guest on host B with virtio-net-pci device and virtio driver, the throughput drops to about 60Gbps. Therefore, it does appear that the virtio driver is the bottleneck. Where do we go from here? Thanks!

YanVugenfirer commented 7 years ago

Hi,

  1. We are currently looking at optimizing the TX/RX path.

  2. There is a known issue with packet distribution and RSS. The current implementation is aimed at making Windows certification happy; it is not necessarily the best implementation performance-wise.

sameehj commented 7 years ago

Hi @vbusireddy,

I have compiled and attached a version of NetKVM which should have better performance than the current one. Can you please test it and report the results?

Install_Debug.zip

vbusireddy commented 7 years ago

Hi! Thank you for working on this. I installed the new driver, but there is no noticeable difference in the performance. It is still in the range of 65-70Gbps. I have attached the screenshot of the driver details, to confirm that I am using the correct driver.

(screenshot of driver details attached)

YanVugenfirer commented 7 years ago

BTW: can you please share the performance numbers of Linux VMs on your setup?


vbusireddy commented 7 years ago

Oh, I posted those results in my first post. Between two Fedora 25 guests (one guest on each host), I get about 95Gbps.

sameehj commented 7 years ago

@vbusireddy Thanks for reporting the results,

I have added another possible optimization. Also, make sure you have vectors = 2 * #number_of_queues + 2. Please test it and report the results.
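To illustrate the formula with the 16 queues used here: vectors = 2 * 16 + 2 = 34, which on the QEMU command line would look something like this (a sketch; the netdev id is a placeholder):

-device virtio-net-pci,netdev=net0,mq=on,vectors=34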

Thanks =)

Install_Debug.zip

vbusireddy commented 7 years ago

Hi! I ran the tests with the new version (v2) of the driver. Also ran the same tests with the .141 version of the driver for the sake of comparison. In both cases, I ran the test 5 times, and shown below are three results (discarding the best and worst results). Of course, I followed your suggestion about MSI vectors. Since I have 16 queues, I used 40 vectors.

The performance is a little poorer with the new (v2) driver.

.141 driver: 68.1, 69.6, 70.2
v2 driver: 68.7, 67.6, 65.0

sameehj commented 7 years ago

@vbusireddy Hi,

The performance is a little poorer with the new (v2) driver.

I see, I think that the following patch should give a boost in performance. I've created two implementations of it. Please test these two versions and update me on your findings :)

Win8.1 array implementation.zip
Win8.1 list implementation.zip

vbusireddy commented 7 years ago

@sameehj: Hi, I repeated the tests with both these drivers. By the way, these tests are performed from local host (sender) to a remote Windows guest (receiver). I am doing this because I am seeing the performance degradation only on the receive path of the driver. I ran the test 5 times, and discarded the two extreme results. I am listing the results (all in Gbps) from all the tests for quick comparison.

.141 driver: 68.1, 69.6, 70.2
v2 driver: 68.7, 67.6, 65.0
v3 (array implementation): 64.3, 69.5, 68.1
v4 (list implementation): 65.8, 66.1, 67.8

For further comparison:

Local host to remote Fedora 25 guest: more than 96Gbps
Local host to remote host: more than 98Gbps

sameehj commented 7 years ago

Do you see high CPU usage on the transmit side compared to the receive side?

vbusireddy commented 6 years ago

Sorry I missed this! Didn't get an email notification!!

The transmit side is a 40-core system. I see each of the 16 netperf processes taking up around 7% of the CPU time (in the output of the 'top' command). On the receive side (the Windows guest), Task Manager shows a total CPU utilization of 69% across the 16 cores.

vbusireddy commented 6 years ago

Hi! Any further suggestions/thoughts on this?

YanVugenfirer commented 6 years ago

We are preparing several changes and would be grateful if you could test them as well when they are done:

  1. Moving to lockless queues on the data path

  2. Simplifying the transmit procedure

  3. Changes in RX/RSS DPC rescheduling

vbusireddy commented 6 years ago

I would be more than happy to test the changes. Please let me know when they are done.

sameehj commented 6 years ago

@vbusireddy Hi,

Can you please test the attached build and update me with the results? If you are using an MTU larger than 1500, please remember to set it both on the device (host_mtu) and in Windows (Advanced tab).
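For example, with a 9000-byte MTU the device line would carry host_mtu roughly as follows (a sketch; the options other than host_mtu are placeholders), in addition to setting the MTU/jumbo property on the adapter's Advanced tab in Windows:

-device virtio-net-pci,netdev=net0,mq=on,host_mtu=9000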

Win8.1.zip

sameehj commented 6 years ago

@vbusireddy Hi,

Did you try the build attached in the previous comment?

vbusireddy commented 6 years ago

Hi, I didn't receive an email notification for @sameehj's update (about the availability of a new version of the driver), but I did receive an email for yesterday's update! Does anyone happen to know why I don't receive emails for some updates? I had no idea that a new driver was available for testing until yesterday! Thanks, Venu

vbusireddy commented 6 years ago

 

Hi,

 

I was out of office on vacation for the last 6 days. Just got back. Will look at the driver today.

 

Regards,

 

Venu

sameehj commented 6 years ago

I didn't receive an email notification for @sameehj's update (about the availability of a new version of the driver), but I did receive an email for yesterday's update! Does anyone happen to know why I don't receive emails for some updates? I had no idea that a new driver was available for testing until yesterday!

That's weird, I have no idea why. I have tried to search for similar issues but couldn't find anything interesting.

I was out of office on vacation for the last 6 days. Just got back. Will look at the driver today.

Great please try it and update me.

vbusireddy commented 6 years ago

One of the two systems that have the 100Gb cards is fried. I am trying to get the motherboard replaced. Will update as soon as the system is up and I have run the tests.

sameehj commented 6 years ago

Hi @vbusireddy,

Any progress on the issue? :)

vbusireddy commented 6 years ago

Hi!

Unfortunately, one of the systems is still dead. And sadly, there is no ETA for the replacement motherboard!

vbusireddy commented 5 years ago

Hi @sameehj,

Sorry it took me so long to get back to you on this. I was busy working on another project.

We will have the 100Gbps hardware setup ready soon to test the performance. I was wondering if the changes you incorporated into https://github.com/virtio-win/kvm-guest-drivers-windows/files/1463911/Win8.1.zip have made it into the mainline code. Do you still want us to test with that last version you gave me, or is there a newer version you would like me to test?

Regards.

sameehj commented 5 years ago

Hi :)

Sorry it took me so long to get back to you on this. I was busy working on another project.

It's okay,

We will have the 100Gbps hardware setup ready soon to test the performance. I was wondering if the changes you incorporated into https://github.com/virtio-win/kvm-guest-drivers-windows/files/1463911/Win8.1.zip have made it into the mainline code. Do you still want us to test with that last version you gave me, or is there a newer version you would like me to test?

Yeah, that would be great, and yes, all of the changes have made it upstream. Do you want me to provide you with a build?

Regards.

vbusireddy commented 5 years ago

Hi, That would be great! Could you please attach the build? Thanks! Venu

sameehj commented 5 years ago

Install.zip

Please find Win8.1x64 and Win10x64 attached in the compressed file. How are you planning to test it?

vbusireddy commented 5 years ago

Thanks for the drivers! The test setup is as follows: two hosts, each with a 100Gbps network adapter, interconnected with the 100Gbps link. On each host, start a Windows guest with one virtio device with 8 queues. Set the MTU to 9000 on the guest (and the host). Run netperf with 8 threads on each VM to send/receive data across the guests. I ran my earlier tests on the same setup: when the guests were running Fedora, I was getting a throughput of 90-95% of the link speed between the guests; when the guests were running Windows, the throughput dropped to about 60% of the link speed. Do you have a different setup in mind? Is any fine-tuning required to improve the performance?
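Concretely, the 8-stream netperf run can be sketched like this (assuming the remote guest runs netserver and is reachable at the placeholder address 192.168.1.2):

netserver                                          # on the receiving guest
for i in $(seq 8); do
    netperf -H 192.168.1.2 -l 60 -t TCP_STREAM &   # one stream per queue
done
wait                                               # then sum the per-stream throughputs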

vbusireddy commented 5 years ago

Hi! Apologies for getting back late on this. With the holidays and most people being on vacation, getting the resources was difficult. Last month, I was able to install the drivers that you sent me on 11/19 on a 40Gbps setup and ran the measurements. Windows guests as well as Linux guests were yielding throughput close to the line rate, which was good and expected, considering the link speed. Today, I tried to install the drivers on a 100Gbps setup and ran into an issue: Windows kept complaining that there was a problem adding the driver. Looking a little closer, I noticed that the certificate used for signing the driver (from the .cat file) had an expiry date of 12/11/2018. Hence I am not able to install that driver. Could you please rebuild the drivers with a different certificate? Thanks!

vbusireddy commented 5 years ago

Hi Sameeh,

Please ignore the above update. After removing the device from the guest and rebooting the guest, reinstalling the driver worked. So there is no need to rebuild the driver.

When I run netperf between the hosts, I get a throughput over 98.9Gbps, which is great. Following are sample results with Linux and Windows guests (all tests run with 32 queues, 80 vectors, 9000 MTU, and 32 simultaneous netperf processes):

Host A to a Linux VM on Host B: 98.787, 98.907, 98.859 Gbps
Linux VM on Host B to Host A: 97.348, 98.194, 97.524 Gbps
Linux VM on Host A to Linux VM on Host B: 97.824, 97.787, 97.792 Gbps
Host A to Windows 10 VM on Host B: 77.207, 77.710, 76.215 Gbps
Windows 10 VM on Host B to Host A: 87.998, 86.798, 85.215 Gbps
Windows 10 VM on Host A to Windows 10 VM on Host B: 75.984, 73.332, 80.101 Gbps

As you can see, throughput between Linux VMs is very close to the line rate, but throughput between Windows 10 VMs is 20%-25% below the line rate. Also, the virtio driver's send path appears to be more efficient than its receive path. Is it possible to get the receive throughput to match at least the transmit throughput? Is there any parameter that can be tuned?

Thanks,

Venu

YanVugenfirer commented 5 years ago

Hi Venu,

Thanks for running the tests! It looks like we have improvements, but there is still room for additional work.

Regarding the receive-side performance being behind the send-side performance: our theory is that this happens because of inefficient RSS behavior. We implemented RSS so that it could pass WHQL, and therefore the driver completes the interrupts on a specific CPU based on the redirection table passed by the OS. Unfortunately, the "hardware" queues (virtio queues) don't act upon this table, so we are rescheduling DPCs to conform to WHQL requirements. Sameeh was working on a virtio spec addition to support passing the RSS redirection table to the host, and later on enabling completion of receive packets on the right CPU.

https://lists.oasis-open.org/archives/virtio-dev/201805/msg00024.html https://www.mail-archive.com/qemu-devel@nongnu.org/msg559452.html

Best regards, Yan.

JonKohler commented 5 years ago

Did those RSS redirection patches land into qemu yet?

vbusireddy commented 5 years ago

I am not sure I follow the question. These changes are in the Windows virtio drivers. Why would these changes be put into qemu? The Windows driver changes do appear to have been pushed upstream. I can build the virtio drivers from the git repo, and get the same results as I was getting with the patched binaries sent to me by @sameehj.

JonKohler commented 5 years ago

I'm talking about what Yan said on Jan 19 - the patch linked there is to qemu-devel - https://www.mail-archive.com/qemu-devel@nongnu.org/msg559452.html

Jon

YanVugenfirer commented 5 years ago

Not yet.


JonKohler commented 5 years ago

OK, great, Yan. Is there any sort of status/progress since then? Just curious; I am working through a few netkvm performance investigations on Windows right now, so I wanted to survey the known issues and keep them in mind.

YanVugenfirer commented 5 years ago

Hi Jon,

We agreed with MST that we are first going to develop a prototype on the host side that will steer the packets. For a while there was not much progress on it, but now we are back working on it.

In any case, even with the code in QEMU, some work will be needed on the host side to enable this feature properly, ideally with vDPA.

If you have additional issues with performance, let's gather a list. We can discuss it during the KVM Forum.

Also adding Yuri @ybendito to the discussion.

Best regards, Yan.