luigirizzo / netmap

Automatically exported from code.google.com/p/netmap

Poor Performance using iperf test #269

Closed Ddnirvana closed 7 years ago

Ddnirvana commented 7 years ago

Hello, I am trying to test the performance of netmap. I have two machines, each with a 10G NIC (Intel x450), connected directly with a cable. Both run Ubuntu 16.04 with kernel 4.4.15. I configure netmap on both machines as follows:


$sudo insmod ./netmap.ko
$sudo insmod ./ixgbe/ixgbe.ko
$sudo ethtool -K enp1s0f0 tx off rx off gso off tso off gro off lro off
$sudo ethtool -s enp1s0f0 autoneg off 

First I test the performance of netmap using pkt-gen:


//tx side
sudo ./build-apps/pkt-gen/pkt-gen -i enp1s0f0 -f tx 
//rx side
sudo ./build-apps/pkt-gen/pkt-gen -i enp1s0f0 -f rx 

The performance is very good, about 14.503 Mpps. Then I test the performance using iperf. The ifconfig output on the two machines:


//server side
enp1s0f0  Link encap:Ethernet  HWaddr 00:1b:21:bb:07:60  
          inet addr:10.0.0.1  Bcast:10.0.0.15  Mask:255.255.255.240
          inet6 addr: fe80::21b:21ff:febb:760/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:146 errors:0 dropped:0 overruns:0 frame:0
          TX packets:268 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:29831 (29.8 KB)  TX bytes:42075 (42.0 KB)

//client side
enp1s0f0  Link encap:Ethernet  HWaddr 00:1b:21:bb:07:48  
          inet addr:10.0.0.2  Bcast:10.0.0.15  Mask:255.255.255.240
          inet6 addr: fe80::21b:21ff:febb:748/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:163 errors:0 dropped:0 overruns:0 frame:0
          TX packets:294 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:35102 (35.1 KB)  TX bytes:47053 (47.0 KB)

The test commands and results:


//server side
dd@pc1:~/test/$ iperf -s -u -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Receiving 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 33616
[ ID] Interval       Transfer     Bandwidth        Jitter   Lost/Total Datagrams
[  3]  0.0- 1.0 sec  90.2 MBytes   756 Mbits/sec   0.004 ms 4083/68406 (6%)
[  3]  0.0- 1.0 sec  19 datagrams received out-of-order
[  3]  1.0- 2.0 sec  93.8 MBytes   787 Mbits/sec   0.006 ms  497/67411 (0.74%)
[  3]  2.0- 3.0 sec  91.5 MBytes   768 Mbits/sec   0.005 ms 2246/67517 (3.3%)
[  3]  3.0- 4.0 sec  95.7 MBytes   803 Mbits/sec   0.004 ms  611/68888 (0.89%)
[  3]  4.0- 5.0 sec  95.1 MBytes   798 Mbits/sec   0.004 ms 1169/69029 (1.7%)
[  3]  5.0- 6.0 sec  94.8 MBytes   795 Mbits/sec   0.005 ms  872/68477 (1.3%)
[  3]  6.0- 7.0 sec  95.9 MBytes   804 Mbits/sec   0.005 ms   12/68404 (0.018%)
[  3]  7.0- 8.0 sec  96.3 MBytes   808 Mbits/sec   0.004 ms    0/68667 (0%)
[  3]  8.0- 9.0 sec  96.5 MBytes   809 Mbits/sec   0.006 ms    0/68813 (0%)
[  3]  0.0-10.0 sec   940 MBytes   789 Mbits/sec   0.006 ms 12336/682783 (1.8%)
[  3]  0.0-10.0 sec  20 datagrams received out-of-order

//client side
dd@ddPC2:~$ iperf -c 10.0.0.1 -u -i 1 -b 10000M
------------------------------------------------------------
Client connecting to 10.0.0.1, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 33616 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  95.9 MBytes   804 Mbits/sec
[  3]  1.0- 2.0 sec  94.3 MBytes   791 Mbits/sec
[  3]  2.0- 3.0 sec  94.6 MBytes   794 Mbits/sec
[  3]  3.0- 4.0 sec  96.8 MBytes   812 Mbits/sec
[  3]  4.0- 5.0 sec  96.9 MBytes   813 Mbits/sec
[  3]  5.0- 6.0 sec  95.9 MBytes   804 Mbits/sec
[  3]  6.0- 7.0 sec  95.9 MBytes   804 Mbits/sec
[  3]  7.0- 8.0 sec  96.0 MBytes   805 Mbits/sec
[  3]  8.0- 9.0 sec  96.5 MBytes   809 Mbits/sec
[  3]  9.0-10.0 sec  94.5 MBytes   792 Mbits/sec
[  3]  0.0-10.0 sec   957 MBytes   803 Mbits/sec
[  3] Sent 682784 datagrams
[  3] Server Report:
[  3]  0.0-10.0 sec   940 MBytes   789 Mbits/sec   0.005 ms 12336/682783 (1.8%)
[  3]  0.0-10.0 sec  20 datagrams received out-of-order

The result is even worse than without netmap... When I check the dmesg log, everything looks normal and nothing seems wrong. I have also checked previous issues and the common problems in the README, but I still cannot figure out the reason. Could you please help me? Any advice would be very much appreciated!

vmaffione commented 7 years ago

Hi, I may have misunderstood your report, but from what you describe, your iperf tests are not using netmap. You are testing the Linux TCP stack with offloadings disabled (that's why you are slower); netmap never comes into the picture.

Ddnirvana commented 7 years ago

(⊙o⊙)… oops, you're right... Actually I meant to test the performance of ptnetmap. I followed the instructions you proposed in issue #240: I started two virtual machines using ptnetmap and then tested the performance with pkt-gen and iperf. pkt-gen performs very well (about 20+ Mpps), but iperf only reaches about 800 Mbits/s (while in #240, lyric got 22.5 Gbps). That is why I then tested iperf on the two physical machines with netmap (obviously a pointless attempt). Could you please give me some advice about using iperf in virtual machines with ptnetmap? Thanks for your reply again!

vmaffione commented 7 years ago

Don't worry. To use ptnetmap you can follow the setup described in #240, and you don't have to disable any offloadings on the "ptnet" interfaces inside the VMs (the ones that ethtool -i ethX reports as "ptnetmap-guest-drivers"). The offloadings are necessary to get a decent TCP throughput, if you are using TCP. You can also use "netperf" to do TCP and UDP tests, if you wish.
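
For example, inside the VM you can check which driver backs an interface (assuming it is named eth0; only the relevant line of the output is shown here):

$ ethtool -i eth0
driver: ptnetmap-guest-drivers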

Disabling the offloadings is needed in general with netmap when you are using the host (sw) rings (typically when using the bridge example), but this is not your case.

Ddnirvana commented 7 years ago

Thanks for your advice! I found out that the problem is with the iperf UDP test itself. When I run the iperf UDP test on the original 10G network driver, the result is poor too. If I change the command from iperf -u -c 10.0.0.1 -i 1 -b 100G to iperf -u -c 10.0.0.1 -i 1 -b 100G -l 50k (setting the datagram length manually), the sender reaches nearly 10G bandwidth while the receiver's throughput is very low (270 Mbits/s). And when I use this command in the virtual machines (with ptnetmap), I get 30 Gbits/s on the sender side but again very low throughput on the receiver side.

So now I switch to the iperf TCP test, iperf -c 10.0.0.1 -i 1, which works fine on the native 10G driver. However, this command hangs in the virtual machines with ptnetmap:


$iperf -c 192.168.1.1 -i 1 -m
//hang.....

I have also tried iperf3 and netperf, and they hang too... Do you know what could cause this?

vmaffione commented 7 years ago

Disclaimer: I'm not familiar with iperf/iperf3, I use netperf.

If you are doing UDP tests, it's perfectly fine to have high throughput (and packet rate) on the sender and a very low throughput (and packet rate) on the receiver. This phenomenon is known as "receiver livelock" and it's a known scheduling problem in operating systems, so it's not a bug (and in fact you hit the problem whether you use netmap/ptnetmap or not). Basically, the receiver O.S. spends too much time in kernel-space processing, and the userspace application does not get enough CPU time to drain the socket receive buffer; most of the received packets are dropped because the socket receive buffer overflows.
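
For example, while the UDP test is running you can usually watch the overflow counters on the receiver (exact field names depend on the net-tools version):

$ netstat -su                    # look under "Udp:" for "packet receive errors"
$ cat /proc/net/snmp | grep Udp  # the RcvbufErrors column counts receive-buffer overflows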

The hang may be an issue. Are you able to terminate the hanging application with ctrl-c? Does the kernel crash (in the host? in the VM?) or report errors in the log? When the hang happens using netperf, what netperf command lines are you using on the client and the server? Is it the client or the server hanging, or both?

Ddnirvana commented 7 years ago

@vmaffione thanks for your explanation of "receiver livelock"! Yes, I can terminate the hanging application with ctrl-c; the kernel does not crash and there is nothing weird in the log. I use netperf -H 10.0.0.1 -l 60 -p 30001, and the client output shows establish control: are you sure there is a netserver listening on... establish_control could not establish the control connection from... So I think there may be something wrong with my network configuration. I will try again, and if I make progress I will let you know!

vmaffione commented 7 years ago

Ok, so the problem is your network configuration. Maybe it's easier if you don't specify the port with -p. Just use the default ports.

Server:

user@machine1 $ netserver  # run the server

Client:

user@machine2 $ netperf -H 10.0.0.$X -t TCP_STREAM
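
For a UDP test, something like the following should work as well (UDP_STREAM is netperf's standard UDP test; use the actual IP of the other machine):

user@machine2 $ netperf -H 10.0.0.$X -t UDP_STREAM
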
ArchyCho commented 7 years ago

Server-A ( TX ) Server-B ( Bridge ) Server-C ( RX )

When you use pkt-gen (part of the netmap code) at Server-A and Server-C, it puts the interface into netmap (ring) mode for packet processing (kernel bypass), so your kernel cannot see those packets (i.e. tcpdump cannot capture them unless you rebuild libpcap with netmap support). If you switch to iperf, the program connects through the host stack, and the netmap (ring) mode is released (back to normal).

You cannot access the netmap (ring) from iperf unless you modify the iperf code.

Netmap was designed to bypass the host stack, not to switch the host stack into a "performance mode".

So the problem does not lie with Server-B (the netmap bridge).

Ddnirvana commented 7 years ago

@vmaffione It still does not work... I am trying to read the source code. @ArchyCho thanks for your attention! But this is not about netmap on the host; it is about ptnetmap in guest virtual machines...

ArchyCho commented 7 years ago

@Ddnirvana

I think it does not matter whether it is a VM or a real host.

If iperf is unmodified, its code cannot access the netmap ring for packet generation, so the sender and receiver servers are not using netmap code to generate packets.

To share a bit more:

You should read pkt-gen -i eth0 as equivalent to pkt-gen -i netmap:eth0; iperf has no equivalent of opening netmap:eth0.

iperf cannot use the netmap ring for packet processing; it still goes through the Linux kernel host stack, so performance falls back to kernel mode.

vmaffione commented 7 years ago

Hi @ArchyCho , you are absolutely right, but you are referring to the first message of this conversation. If you look at the whole conversation afterwards, you'll see that now we are talking about ptnetmap, which is a way to accelerate traditional socket applications (so also iperf) running inside virtual machines. In fact this is basically about a netmap-powered paravirtualized network device (ptnet) for virtual machines.

@Ddnirvana Please try to start from a clean state. (1) restart the two virtual machines, (2) load the netmap module inside those, (3) give an IP address (on the same subnet) to the two ptnet interfaces, (4) bring the interfaces up; (5) from machine 1 try to ping the IP address of the ptnet interface on machine 2.
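
For example, something like this inside each VM (assuming the netmap module is in the current directory and the ptnet interface shows up as eth0; adjust names and addresses to your setup):

$ sudo insmod ./netmap.ko                # (2) load netmap inside the VM
$ sudo ip addr add 10.0.0.1/24 dev eth0  # (3) use 10.0.0.2/24 on the other VM
$ sudo ip link set eth0 up               # (4) bring the interface up
$ ping 10.0.0.2                          # (5) from machine 1, ping machine 2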

In a previous post you reported that in this two-VM ptnetmap setup you were able to get 30 Gbps UDP, so you already managed to get it working.

ArchyCho commented 7 years ago

@vmaffione Thanks for correcting my misunderstanding. I will run more tests on this.

@Ddnirvana Which driver did you load in the VM? i.e. e1000 or virtio_net?

ArchyCho commented 7 years ago

@Ddnirvana My testing setup:
CPU = E5-2640v1 x2
Host = CentOS 7, kernel 3.12.69 with all defaults from kernel.org
VMs = same (4 cores each, with HT)
Both the host server and the VMs compiled netmap with ptnetmap enabled.

pkt-gen reaches 19 Gbits with small packets.

iperf test, results for reference:
Server side cmd = iperf -s
Client side cmd = iperf -c 172.16.18.31 -t 60 -l 60: 611 Mbps, 1 CPU core at 100%
Client side cmd = iperf -c 172.16.18.31 -t 60: 13.9 Gbits, 1 CPU core at 100%

We may get some ideas from the following. When I start pkt-gen -i eth0 -f rx on the server side (VM), all connections through eth0 drop (e.g. SSH to the VM). In my tests, whether you use a real host or a VM, you lose host-stack connectivity on any interface put into netmap mode. When I start iperf -s on the server side (VM), the connection is not lost, which could be read as netmap not being enabled.

That is why I said before that, whether you use a VM or a real host, netmap is a different working mode from the traditional network stack; without code modifications, I think traditional software (e.g. iperf) will not use it.

Although ptnetmap can connect the VMs to each other in netmap mode, it still cannot change the network stack of the Linux kernel (the kernel inside the VM); that is what professor Luigi Rizzo said before (netmap will not accelerate your TCP).

If my point is wrong, please share your point of view. Thanks all.

Ddnirvana commented 7 years ago

@vmaffione, thanks for your instructions! I changed my VMs from Ubuntu to CentOS 7 (just like @ArchyCho's configuration), re-downloaded the ptnetmap-enabled QEMU (from your repo) and netmap (this repo), and rebuilt everything. However, when I test iperf in the VM, the client still hangs and finally prints unable to connect to server, Connection timed out... Also, I noticed I cannot ssh into the other VM even though I can ping it... It's so weird :(

@ArchyCho thanks for your help and your test results! I have changed my virtual machines to CentOS 7 (just like yours), but I still cannot run iperf and netperf successfully... About your ideas: I think ptnetmap in the VM is like a special virtual device (with its own device driver), so even though netmap cannot accelerate your TCP (that would require kernel changes), ptnetmap can accelerate TCP in your VM thanks to this special, fast virtual device.

ArchyCho commented 7 years ago

@Ddnirvana

unable to connect to server, Connection timed out...

Please confirm you have disabled iptables on the VMs.

If you have not disabled iptables, pkt-gen will work but iperf will not. That is another way to see that traditional software does not run in netmap mode under ptnetmap: pkt-gen works in netmap mode and bypasses the iptables kernel module, while iperf does not.
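
For example, on CentOS 7 you can rule out the firewall with the standard systemd/iptables commands (re-enable it later if you need it):

$ sudo systemctl stop firewalld     # CentOS 7 uses firewalld by default
$ sudo systemctl disable firewalld
$ sudo iptables -F                  # flush any remaining rules
$ sudo iptables -L -n               # verify the chains are empty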

vmaffione commented 7 years ago

Netmap does not accelerate the kernel TCP/IP stack, neither in a physical machine nor in a VM. But with ptnetmap, the ptnet driver (running inside the VM) is actually using the passed-through interface in netmap mode, even if the VM kernel stack does not know it. More info here: https://wiki.freebsd.org/DevSummit/201609?action=AttachFile&do=view&target=20160923-freebsd-summit-ptnet.pdf

@Ddnirvana, what is the command line that you use to run your VMs?

ArchyCho commented 7 years ago

@vmaffione Thanks for sharing your point of view. But it leaves me confused about whether I am misunderstanding or not.

The slides at https://wiki.freebsd.org/DevSummit/201609?action=AttachFile&do=view&target=20160923-freebsd-summit-ptnet.pdf say "Up to 20 Mpps between different VMs", and that is when using the netmap API.

Programs inside VMs that are not using the netmap API (e.g. iperf does not open the interface as netmap:eth0) should not get the netmap advantages (i.e. kernel bypass).

My point of view is: ptnetmap bypasses the host kernel (i.e. the host network bridge), but applications inside the VMs still go through the guest kernel, which causes poor performance.

Am I misunderstanding, or can ptnetmap make all applications inside the VMs use the netmap ring instead of the host stack?

And one more thing: I post my setup for reference.

Inside the VM:

[root@localhost ~]# dmesg | grep netmap
[ 1.084413] 314.796501 [3385] netmap_init run mknod /dev/netmap c 10 59 # error 0
[ 1.084738] netmap: loaded module
[ 1.112803] ptnetmap-guest-drivers 0000:00:03.0: irq 40 for MSI/MSI-X
[ 1.112824] ptnetmap-guest-drivers 0000:00:03.0: irq 41 for MSI/MSI-X
[ 1.113338] 314.825426 [1955] netmap_mem_pt_guest_ifp_add added (ifp=ffff880075bd2000,nifp_offset=0)
[ 1.113618] net eth0: netmap queues/slots: TX 1/1024, RX 1/1024
[ 3.340849] 317.052937 [ 756] netmap_update_config configuration changed (but fine)
[ 3.341482] 317.053571 [2104] netmap_mem_pt_guest_finalize allocating lut

[root@localhost ~]# dmesg | grep ptnet
[ 1.112803] ptnetmap-guest-drivers 0000:00:03.0: irq 40 for MSI/MSI-X
[ 1.112824] ptnetmap-guest-drivers 0000:00:03.0: irq 41 for MSI/MSI-X
[ 1.117018] ptnet_probe: ffff880075bd2840
[ 3.340564] 317.052650 [ 850] ptnet_open ptnet_open: netif_running 1
[ 3.358863] ptnet_open: min_tx_slots = 34
[ 3.358865] ptnet_open: ffff880075bd2840
[ 3.358867] 317.070956 [ 916] ptnet_open Schedule NAPI to flush RX ring #0

Host server startup script:

/usr/local/qemu-netmap/bin/qemu-system-x86_64 /home/images/Router-T1.qcow2 -enable-kvm -smp 2 -m 2G -vga std -device ptnet-pci,netdev=data1,mac=00:AA:BB:CC:01:01 -netdev netmap,ifname=vale2:3,id=data1,passthrough=on -vnc :10 &
/usr/local/qemu-netmap/bin/qemu-system-x86_64 /home/images/Router-T2.qcow2 -enable-kvm -smp 2 -m 2G -vga std -device ptnet-pci,netdev=data2,mac=00:AA:BB:CC:01:02 -netdev netmap,ifname=vale2:4,id=data2,passthrough=on -vnc :11 &

vmaffione commented 7 years ago

Well, it's not a point of view, I'm saying exactly how we implemented it.

With netmap passthrough (ptnetmap), there are two kinds of interfaces, as you can see from the figures in the presentation above: the first is the host interface, in your case a VALE port (e.g. vale2:3); then, this host interface is passed through inside the VM, where it shows up to the O.S. as a "ptnet" interface, i.e. a regular network interface that uses the ptnetmap-guest-drivers. Now, the applications inside the VM can use the ptnet interface in netmap mode (e.g. using pkt-gen), or just use it through regular sockets (e.g. iperf, so that the ptnet interface is NOT in netmap mode).

However, and this is the important point, the ptnet interface is just an "indirection" for the host interface (e.g. the VALE port), which is ALWAYS opened in netmap mode. So even if from the point of view of the VM O.S. the ptnet interface may not be in netmap mode (e.g. when using iperf/netperf), the corresponding host interface is always in netmap mode. This is why ptnetmap indirectly accelerates TCP/UDP for applications running inside VMs, even if the VM kernel stack is unmodified: under the hood, you are using the host interface in netmap mode.

Your command line and log look ok; you have configured ptnetmap correctly.

ArchyCho commented 7 years ago

@vmaffione Thanks for sharing; that makes ptnetmap much clearer. Actually this is the first time I have tried ptnetmap :)

My understanding (for a 10G network card, ixgbe):
A VM host without ptnetmap uses the host stack bridge (kernel limitation, ~1 Mpps shared among all VMs).
A VM host with ptnetmap uses VALE ports to the VMs (line rate shared among all VMs).
A VM guest with ptnetmap can open the interface in netmap mode and reach ~20 Mpps to the VM host or to other VM guests.
A VM guest with ptnetmap can reach only up to ~1 Mpps (kernel limitation) with traditional software.
A VM host with 14 VM guests could process ~14 Mpps, with every VM guest using ~1 Mpps for traditional software.
A single VM guest cannot use traditional software to generate ~20 Mpps.

If I have misunderstood anything, please clarify. Thanks.

vmaffione commented 7 years ago

Your terminology is a bit unclear, but yes, you are correct. Note that when you use ptnetmap, you are using it both in the host and in the guest, meaning that host and guest are cooperating to make ptnetmap work. It does not make sense to say that you are using ptnetmap in the guest and not in the host, nor the other way around.

It's clear from your writing that you are interested in middlebox applications, that may have to deal with huge packet rates. In this case you don't care about using traditional socket applications inside the VM, and that's ok.

There are however other cases where your VM runs a TCP/UDP endpoint (e.g. a web server, which is not a middlebox), and in that case you care about socket applications in VMs --> this is where ptnetmap can help accelerate your TCP.

Ddnirvana commented 7 years ago

@ArchyCho It's not an iptables problem; I have disabled firewalld (which replaces iptables on CentOS 7). @ArchyCho @vmaffione Thanks to both of you! I have found the problem! Previously I started the two virtual machines with the MAC addresses 00:AA:BB:CC:01:01 and 11:AA:BB:CC:01:01. From Archy's earlier comment I noticed that the MAC addresses were the only difference between my VM command line and his, so I changed them to 00:AA:BB:CC:01:01 and 00:AA:BB:CC:01:02, and everything is fine now! I can get 26 Gbits/sec between the two virtual machines using iperf!
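
For reference, the relevant part of the QEMU command line is the mac= option on each ptnet-pci device, in the same form as @ArchyCho's script above. Every VM needs its own MAC, and an address with an odd first octet (like 11:AA:...) is a multicast MAC that an interface should not use:

-device ptnet-pci,netdev=data1,mac=00:AA:BB:CC:01:01 -netdev netmap,ifname=vale2:3,id=data1,passthrough=on
-device ptnet-pci,netdev=data2,mac=00:AA:BB:CC:01:02 -netdev netmap,ifname=vale2:4,id=data2,passthrough=on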

ArchyCho commented 7 years ago

@Ddnirvana could you show me your iperf commands on both sides of the VMs?

And I don't think it was related to the MAC address of the virtual interface. Anyway, congratulations on fixing the problem and testing successfully.

Ddnirvana commented 7 years ago

@ArchyCho On the server side iperf3 -s, and on the client side iperf3 -c 172.16.18.31 -n 100G

ArchyCho commented 7 years ago

@Ddnirvana So your test was done with large packets, not small packets; sorry for my misunderstanding.

vmaffione commented 7 years ago

See, it was just a misconfiguration: using the same MAC address for different interfaces on the same ethernet network is wrong. Yeah, 26 Gbps is a good number, in line with the expected performance.

vmaffione commented 6 years ago

Hi, ptnetmap instructions are now available at https://github.com/luigirizzo/netmap/blob/master/README.ptnetmap