vbusireddy opened this issue 7 years ago
I believe that @sameehj has recently run networking benchmarks and may be able to offer specific tips.
In general:
1) Make sure to configure QEMU to expose the recommended Hyper-V enlightenments per http://blog.wikichoon.com/2014/07/enabling-hyper-v-enlightenments-with-kvm.html
2) Use the latest NetKVM virtio-win driver, or at least build 126 or newer, so you get RSC support: https://github.com/virtio-win/kvm-guest-drivers-windows/wiki/netkvm-RSC-(receive-segment-coalescing)-feature
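For reference, a minimal sketch of a QEMU invocation combining both tips. The disk image name, tap netdev, memory size, and queue/vector counts here are placeholders to adapt to your own setup, not values from this thread:

```shell
# Hypothetical example: Hyper-V enlightenments plus a multiqueue
# virtio-net device. Adjust names and sizes to your configuration.
qemu-system-x86_64 \
  -machine q35,accel=kvm \
  -cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time \
  -smp 16 -m 8G \
  -netdev tap,id=net0,vhost=on,queues=16 \
  -device virtio-net-pci,netdev=net0,mq=on,vectors=40 \
  -drive file=win2012r2.qcow2,if=virtio
```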
Hi @vbusireddy,
First of all I'd try to enable the Hyper-v enlightenments and check the driver's version as @ladipro already suggested.
While testing, I have noticed that iperf2 gave me much better results than netperf on Windows. When you enable multiqueue, remember to add "-P <number_of_queues>" to the iperf command line to enable multiple streams while testing. In general, you should reach results similar to virtio-net on Linux, at least at large packet sizes (4k+).
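As a concrete sketch of that iperf2 invocation (the guest IP, window size, and duration are placeholders; 16 streams assumes a 16-queue device):

```shell
# Inside the Windows guest (receiver):
iperf -s -w 1M

# On the sender, one stream per queue (guest IP is a placeholder):
iperf -c 192.168.1.10 -P 16 -w 1M -t 60
```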
Please do try these tips and update us on your test results once you have them =)
@ladipro (bullet 1): I am using qemu 2.9.50 to create my Windows 2012 R2 guests. I couldn't see any of the "Hyper-V enlightenment" options (hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time) in qemu 2.9.50! Looked up the qemu source code too, and there is no hint of those options. What are the equivalent options in qemu?
@ladipro (bullet 2): I have used the driver version 0.1.141. That is the latest I see at https://fedorapeople.org/groups/virt/virtio-win/repo/stable/. I have also tried with the drivers built using the latest upstream code (as of Aug 29th, commit ID 825e80c789e6b7ef267feef79d0454a32710a996, from Ladi), and that driver also produces same results as 0.1.141. And I did verify on my guest that RSC is enabled, and 'ethtool' on the host shows the correct configuration as suggested in the RSC link that you referred to.
@sameehj: Indeed, iperf2 produced better results than netperf. However, netperf produces much better results if I spawn multiple independent processes talking to the same remote guest and sum up the individual results. In my tests, iperf2 with "-P 16" yielded about 45-50Gbps, whereas 16 netperf instances yielded a throughput of 65-70Gbps (both tests run with 16 queues, 16 vcpus, and 9000 MTU).
> @ladipro (bullet 1): I am using qemu 2.9.50 to create my Windows 2012 R2 guests. I couldn't see any of the "Hyper-V enlightenment" options (hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time) in qemu 2.9.50! Looked up the qemu source code too, and there is no hint of those options. What are the equivalent options in qemu?
The QEMU source code has these options in hyphenated form (hv-relaxed etc.) but the command line takes either. Please try this:
-cpu host,hv_relaxed,hv_spinlocks=0x1fff,hv_vapic,hv_time
I tried both syntaxes. The performance did not change. I am still getting about 65-70Gbps, in both forms of usage.
Any other thoughts/suggestions?
That also brings me back to another question that I asked earlier. Has anyone run benchmarks across a 100Gbps link with the Windows drivers? And what were the results?
> I tried both syntaxes. The performance did not change. I am still getting about 65-70Gbps, in both forms of usage. Any other thoughts/suggestions?
Can you perform the same test with a 100G card assigned to the Windows guest using vfio assignment? What results do you get? This test could give us a good baseline for the limits achievable with Windows.
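For the vfio baseline, the assignment could look roughly like this. The PCI address 0000:81:00.0 and the mlx5_core driver name are placeholders for whatever `lspci -k` reports on your host:

```shell
# Detach the NIC from its host driver and bind it to vfio-pci
# (PCI address and driver name are placeholders).
echo 0000:81:00.0 > /sys/bus/pci/drivers/mlx5_core/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:81:00.0/driver_override
echo 0000:81:00.0 > /sys/bus/pci/drivers_probe

# Then hand the device to the guest on the QEMU command line:
#   -device vfio-pci,host=81:00.0
```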
> That also brings me back to another question that I asked earlier. Has anyone run benchmarks across a 100Gbps link with the Windows drivers? And what were the results?
No, we haven't done that yet.
Hi! I ran the tests with VFIO assignment. Between host A running Linux, and a Windows guest on host B with VFIO assignment (and Mellanox driver on the guest), I am getting a throughput of about 90Gbps consistently. With the same host A, and a Windows guest on host B with virtio-net-pci device and virtio driver, the throughput drops to about 60Gbps. Therefore, it does appear that the virtio driver is the bottleneck. Where do we go from here? Thanks!
Hi,
We are currently looking at optimizing the TX/RX path.
There is a known issue with packet distribution and RSS. The current implementation is aimed at making Windows certification (WHQL) happy; it is not necessarily the best implementation performance-wise.
Install_Debug.zip

Hi @vbusireddy,
I have compiled and attached a version of NetKVM which should have better performance than the one we currently have, can you please test it and report the results?
Hi! Thank you for working on this. I installed the new driver, but there is no noticeable difference in the performance. It is still in the range of 65-70Gbps. I have attached the screenshot of the driver details, to confirm that I am using the correct driver.
BTW: can you please share the performance numbers of Linux VMs on your setup?
Oh. I had posted those results on my first post. Between 2 fedora 25 guests (one guest on each host), I get about 95Gbps.
@vbusireddy Thanks for reporting the results,
I have added another possible optimization. Make sure you have vectors = 2 * number_of_queues + 2. Please test it and report the results,
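That vector sizing rule can be written out as a tiny illustrative helper: one MSI-X vector per TX queue, one per RX queue, plus one for the control queue and one for configuration changes. (Rounding up beyond this minimum, as the tests in this thread do, is also fine.)

```python
# Illustrative helper for the MSI-X vector rule mentioned above:
# 2 vectors per queue pair, plus one for the control queue and
# one for configuration changes -> 2 * queues + 2.
def min_msix_vectors(num_queues: int) -> int:
    """Minimum MSI-X vectors for a virtio-net device with num_queues queue pairs."""
    return 2 * num_queues + 2

for q in (8, 16, 32):
    print(f"{q} queues -> at least {min_msix_vectors(q)} vectors")
```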
Thanks =)
Hi! I ran the tests with the new version (v2) of the driver. Also ran the same tests with the .141 version of the driver for the sake of comparison. In both cases, I ran the test 5 times, and shown below are three results (discarding the best and worst results). Of course, I followed your suggestion about MSI vectors. Since I have 16 queues, I used 40 vectors.
The performance is a little poorer with the new (v2) driver.
141 driver: 68.1, 69.6, 70.2 Gbps
v2 driver: 68.7, 67.6, 65.0 Gbps
@vbusireddy Hi,
> The performance is a little poorer with the new (v2) driver.
I see. I think that the following patch should give a boost in performance. I've created two implementations of it. Please test these two versions and update me on your findings :)
Win8.1 array implementation.zip
Win8.1 list implementation.zip
@sameehj: Hi, I repeated the tests with both these drivers. By the way, these tests are performed from local host (sender) to a remote Windows guest (receiver). I am doing this because I am seeing the performance degradation only on the receive path of the driver. I ran the test 5 times, and discarded the two extreme results. I am listing the results (all in Gbps) from all the tests for quick comparison.
.141 driver: 68.1, 69.6, 70.2
v2 driver: 68.7, 67.6, 65.0
v3 (array implementation): 64.3, 69.5, 68.1
v4 (list implementation): 65.8, 66.1, 67.8
For further comparison:
Local host to remote Fedora 25 guest: more than 96Gbps
Local host to remote host: more than 98Gbps
Do you see high CPU usage on the transmit side compared to the receive side?
Sorry I missed this! Didn't get an email notification!!
The transmit side is a 40-core system. I see each of the 16 netperf processes taking up around 7% of the CPU time (in the output of 'top' command). On the receive side (Windows guest), task manager shows a total CPU utilization of 69% of the 16 cores.
Hi! Any further suggestions/thoughts on this?
We are preparing several changes and would be grateful if you could test them as well when they are done:
- Moving to lockless queues on the data path
- Simplifying the transmit procedure
- Changes in RX/RSS DPC rescheduling
I would be more than happy to test the changes. Please let me know when they are done.
@vbusireddy Hi,
Can you please test the attached build and update me with the results? If you are using an MTU larger than 1500, please remember to set it both on the device (host_mtu) and in Windows (Advanced tab).
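For example, the device side of that MTU advice might look like this (the netdev id, queue count, and vector count are placeholders):

```shell
# host_mtu on the device must match the MTU set in the Windows
# adapter's Advanced tab (9000 in this sketch).
-netdev tap,id=net0,vhost=on,queues=16 \
-device virtio-net-pci,netdev=net0,mq=on,vectors=40,host_mtu=9000
```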
@vbusireddy Hi,
Did you try the build attached in the previous comment?
Hi, I didn't receive an email for the earlier update from @sameehj (about the availability of a new version of the driver), but I did receive an email for yesterday's update! Does anyone happen to know why I don't receive emails for some updates? I had no idea that a new driver was available for testing until yesterday! Thanks, Venu
Hi,
I was out of office on vacation for the last 6 days. Just got back. Will look at the driver today.
Regards,
Venu
> Hi, I haven't received an email for the update from @sameehj (about the availability of a new version of the driver)! However, I received an email for the update yesterday! Does anyone happen to know why I do not receive an email for some updates? I had no idea that a new driver is available for testing until yesterday!
That's weird, I have no idea why. I have tried to search for similar issues but couldn't find anything interesting.
> I was out of office on vacation for the last 6 days. Just got back. Will look at the driver today.
Great please try it and update me.
One of the two systems that have the 100Gb cards is fried. Trying to get the motherboard replaced. Will update as soon as the system is up and I can run the tests.
Hi @vbusireddy,
Any progress on the issue? :)
Hi!
Unfortunately, one of the systems is still dead, and sadly there is no ETA for the replacement motherboard!
Hi @sameehj,
Sorry it took me so long to get back to you on this. I was busy working on another project.
We will have the hardware (100Gbps) set up and ready soon to test the performance. I was wondering if the changes you incorporated into https://github.com/virtio-win/kvm-guest-drivers-windows/files/1463911/Win8.1.zip have made it into the mainline code. Do you still want us to test with that last version you gave me, or is there a newer version you would like me to test?
Regards.
Hi :)
> Sorry it took me so long to get back to you on this. I was busy working on another project.
It's okay,
> We will have the hardware (100Gbps) set up and ready soon to test the performance. I was wondering if the changes you incorporated into https://github.com/virtio-win/kvm-guest-drivers-windows/files/1463911/Win8.1.zip have made it into the mainline code. Do you still want us to test with that last version you gave me, or is there a newer version you would like me to test?
Yeah, that would be great, and yes, all of the changes have made it upstream. Do you want me to provide you with a build?
Regards.
Hi, That would be great! Could you please attach the build? Thanks! Venu
Please find Win8.1x64 and Win10x64 attached, in the compressed file. How are you planning to test it?
Thanks for the drivers! The test setup is as follows: two hosts, each with a 100Gbps network adapter, interconnected by a 100Gbps link. On each host, start a Windows guest with one virtio device with 8 queues. Set the MTU to 9000 on the guest (and the host). Run netperf with 8 threads on each VM to send/receive data across the guests. I ran my earlier tests on the same setup: when the guests were running Fedora, I was getting a throughput of 90-95% of the link speed between the guests; when the guests were running Windows, the throughput dropped to about 60% of the link speed. Do you have a different setup in mind? Is any fine tuning required to improve the performance?
Hi! Apologies for getting back late on this. With holidays and most people being on vacation, getting the resources was difficult. Last month, I was able to install the drivers that you sent me on 11/19 on a 40Gbps setup, and ran the measurements. Windows guests as well as Linux guests were yielding throughput close to the line rate. That was good, and expected, considering the link speed. Today, I tried to install the drivers on a 100Gbps setup, and ran into an issue. Windows kept complaining that there is a problem adding the driver. Looked a little closer, and noticed that the certificate used for signing the driver (from the .cat file) had an expiry date of 12/11/2018. Hence I am not able to install that driver. Could you please rebuild the drivers with a different certificate? Thanks!
Hi Sameeh,
Please ignore the above update. After removing the device from the guest, rebooting the guest, and reinstalling the driver worked. So, there is no need to rebuild the driver.
When I run netperf between the hosts, I get a throughput over 98.9Gbps, which is great. Following are sample results with Linux and Windows guests (all tests run with 32 queues, 80 vectors, 9000 MTU, and 32 simultaneous netperf processes).
Host A to Linux VM on Host B: 98.787, 98.907, 98.859 Gbps
Linux VM on Host B to Host A: 97.348, 98.194, 97.524 Gbps
Linux VM on Host A to Linux VM on Host B: 97.824, 97.787, 97.792 Gbps
Host A to Windows 10 VM on Host B: 77.207, 77.710, 76.215 Gbps
Windows 10 VM on Host B to Host A: 87.998, 86.798, 85.215 Gbps
Windows 10 VM on Host A to Windows 10 VM on Host B: 75.984, 73.332, 80.101 Gbps
As you can see, throughput between Linux VMs is very close to the line rate, but throughput between Windows 10 VMs is 20%-25% below the line rate. Also, the virtio driver's send path appears to be more efficient than its receive path. Is it possible to get the receive throughput to match at least the transmit throughput? Is there any parameter that can be tuned?
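For quick comparison, here is simple arithmetic on the numbers above: average each scenario's three runs and express the result as a fraction of the 100 Gbps line rate.

```python
# Average the three runs per scenario (figures taken from the results
# reported above) and express them against the 100 Gbps line rate.
LINE_RATE_GBPS = 100.0

results = {
    "Host A -> Linux VM":   [98.787, 98.907, 98.859],
    "Linux VM -> Host A":   [97.348, 98.194, 97.524],
    "Linux VM -> Linux VM": [97.824, 97.787, 97.792],
    "Host A -> Win10 VM":   [77.207, 77.710, 76.215],
    "Win10 VM -> Host A":   [87.998, 86.798, 85.215],
    "Win10 VM -> Win10 VM": [75.984, 73.332, 80.101],
}

for name, runs in results.items():
    avg = sum(runs) / len(runs)
    print(f"{name}: {avg:.2f} Gbps ({avg / LINE_RATE_GBPS:.1%} of line rate)")
```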
Thanks,
Venu
Hi Venu,
Thanks for running the tests! It looks like we have improvements, but there is still room for additional work.
Regarding receive-side performance being behind send-side performance: our theory is that it happens because of inefficient RSS behavior. We implemented RSS so that it could pass WHQL, and therefore the driver completes the interrupts on a specific CPU based on the redirection table passed by the OS. Unfortunately, the "hardware" queues (virtio queues) don't act upon this table, so we reschedule DPCs to conform to the WHQL requirements. Sameeh was working on a virtio spec addition to support passing the RSS redirection table to the host, and later on enabling completion of receive packets on the right CPU.
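To illustrate the rescheduling cost described above, here is a simplified sketch (not the NetKVM driver's actual code): the OS supplies an indirection table mapping packet hashes to CPUs, and whenever a packet's interrupt lands on a different CPU than the table dictates, the driver must reschedule a DPC to the target CPU.

```python
# Simplified, illustrative model of RSS indirection. The function names
# and the 4-entry table are made up for this sketch.
def rss_target_cpu(indirection_table, packet_hash):
    """CPU the OS expects this packet to be completed on."""
    return indirection_table[packet_hash % len(indirection_table)]

def needs_reschedule(indirection_table, packet_hash, arrived_on_cpu):
    """True when the packet arrived on a CPU other than the RSS target."""
    return rss_target_cpu(indirection_table, packet_hash) != arrived_on_cpu

# Example: a 4-entry table spreading flows over CPUs 0-3.
table = [0, 1, 2, 3]
print(needs_reschedule(table, packet_hash=6, arrived_on_cpu=2))  # hash 6 maps to CPU 2
print(needs_reschedule(table, packet_hash=5, arrived_on_cpu=2))  # hash 5 maps to CPU 1
```

Because the virtio queues ignore the table, the second case (packet arriving on the "wrong" CPU) is common, and each occurrence costs a DPC reschedule.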
https://lists.oasis-open.org/archives/virtio-dev/201805/msg00024.html https://www.mail-archive.com/qemu-devel@nongnu.org/msg559452.html
Best regards, Yan.
Did those RSS redirection patches land into qemu yet?
I am not sure I follow the question. These changes are in the Windows virtio drivers. Why would these changes be put into qemu? The Windows driver changes do appear to have been pushed upstream. I can build the virtio drivers from the git repo, and get the same results as I was getting with the patched binaries sent to me by @sameehj.
I'm talking about what Yan said on Jan 19 - the patch linked there is to qemu-devel - https://www.mail-archive.com/qemu-devel@nongnu.org/msg559452.html
Jon
Not yet.
OK, great, Yan. Is there any status/progress since then? Just curious; I'm working through a few netkvm performance investigations on Windows right now, so I wanted to survey the known issues and keep them in mind.
Hi Jon,
We had an agreement with MST that we would first develop a prototype on the host side that steers the packets. For a while there was not much progress on it, but now we are back working on it.
In any case, even with the code in QEMU, some work will be needed on the host side to enable this feature properly, ideally with vDPA.
If you have additional performance issues, let's gather a list. We can discuss it during the KVM Forum.
Also adding Yuri @ybendito to the discussion.
Best regards, Yan.
I have two servers running Linux kernel 4.1.12-103.3.8.el7uek.x86_64 (Oracle Linux 7.4), interconnected with a 100Gb Ethernet link. When I run netperf between these systems, I get a throughput of >96% of the line rate.
I then created two Fedora 25 KVM guests, one on each server. The setup is: QEMU version 2.9.50, 16 vCPUs and 8GB RAM, and a virtio-net-pci device with 16 queues, 40 vectors, and mq enabled.
Running netperf between these guests also results in a throughput of >95% of the line rate.
However, if I replace the Fedora 25 guests with two Windows 2012 R2 guests, with the same QEMU options, I get a throughput of about 50-60% of the line rate!
Has anyone run benchmarks across 100Gb links with the Windows virtio drivers? And what kind of throughput is achieved? Isn't 60% of line rate a bit low? I was hoping to get the same throughput as with the Fedora guests! Any tricks to improve the throughput with Windows 2012 guests?
Thanks in advance!
Venu