yasukata / tinyhttpd-lwip-dpdk

A tiny HTTP server built on lwIP and DPDK
Apache License 2.0

Linux TCP is faster than lwIP on DPDK #7

Open 98hq opened 5 days ago

98hq commented 5 days ago

Hello,

First of all, thank you for making this project available; it is very interesting. I followed the steps in the README to reproduce the experiment, but on my system Linux TCP is sometimes faster than lwIP on DPDK. I used the same lwIP and DPDK versions as you. My system is Ubuntu 22.04, and the kernel information is:

Linux hq-virtual-machine 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux.

The command I used to create huge pages is:

sudo ./dpdk/dpdk-22.03/usertools/dpdk-hugepages.py -p 2M -r 64M

My memory information is

free -m
               total        used        free      shared  buff/cache   available
Mem:            7896         985        5802          42        1108        6614
Swap:           3897           0        3897

The command line I used to start the app is:

sudo LD_LIBRARY_PATH=./dpdk/install/lib/x86_64-linux-gnu ./app -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_tap001,iface=tap001 --no-pci -- -a 10.0.0.2 -g 10.0.0.1 -m 255.255.255.0 -l 1 -p 10000

After starting, I used wrk to test in another terminal. The test command and output are as follows:

$ wrk http://10.0.0.2:10000/ -d 10 -t 1 -c 1 -L
Running 10s test @ http://10.0.0.2:10000/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   540.90us  691.38us  12.14ms   94.68%
    Req/Sec     2.22k   533.24     3.67k    69.00%
  Latency Distribution
     50%  319.00us
     75%  647.00us
     90%    1.05ms
     99%    3.72ms
  22072 requests in 10.01s, 1.33MB read
Requests/sec:   2205.95
Transfer/sec:    135.72KB

$ wrk http://10.0.0.2:10000/ -d 10 -t 2 -c 2 -L
Running 10s test @ http://10.0.0.2:10000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   329.88us  510.08us  11.30ms   93.68%
    Req/Sec     3.82k     1.05k    5.39k    77.61%
  Latency Distribution
     50%  168.00us
     75%  338.00us
     90%  603.00us
     99%    2.46ms
  76450 requests in 10.10s, 4.59MB read
Requests/sec:   7571.48
Transfer/sec:    465.82KB

$ wrk http://10.0.0.2:10000/ -d 10 -t 4 -c 4 -L
Running 10s test @ http://10.0.0.2:10000/
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   592.37us    0.98ms  15.39ms   94.40%
    Req/Sec     2.40k   426.68     3.53k    73.75%
  Latency Distribution
     50%  333.00us
     75%  505.00us
     90%    0.96ms
     99%    5.56ms
  95446 requests in 10.03s, 5.73MB read
Requests/sec:   9519.57
Transfer/sec:    585.68KB

For the Linux TCP case, I used the server code you provided, started with the following parameters:

./server -p 5555 -l 1

The results of testing with wrk in another terminal are as follows:

$ wrk http://localhost:5555/ -d 10 -t 1 -c 1 -L
Running 10s test @ http://localhost:5555/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   599.80us    1.91ms  29.92ms   96.63%
    Req/Sec     3.20k     2.47k   13.74k    87.00%
  Latency Distribution
     50%  299.00us
     75%  485.00us
     90%  664.00us
     99%    9.66ms
  31846 requests in 10.03s, 1.91MB read
Requests/sec:   3173.97
Transfer/sec:    195.27KB

$ wrk http://localhost:5555/ -d 10 -t 2 -c 2 -L
Running 10s test @ http://localhost:5555/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.23ms    6.09ms  68.07ms   91.26%
    Req/Sec     3.03k     2.01k    8.84k    66.84%
  Latency Distribution
     50%  259.00us
     75%  609.00us
     90%    6.63ms
     99%   30.45ms
  59652 requests in 10.03s, 3.58MB read
Requests/sec:   5949.70
Transfer/sec:    366.05KB

$ wrk http://localhost:5555/ -d 10 -t 4 -c 4 -L
Running 10s test @ http://localhost:5555/
  4 threads and 4 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   590.52us    1.69ms  34.60ms   97.20%
    Req/Sec     2.67k   514.67     3.57k    67.25%
  Latency Distribution
     50%  270.00us
     75%  469.00us
     90%  814.00us
     99%    8.35ms
  106260 requests in 10.03s, 6.38MB read
Requests/sec:  10595.40
Transfer/sec:    651.87KB
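
To compare the runs without copying numbers by hand, the Requests/sec summary line can be pulled out of the pasted wrk output. This is only a minimal sketch: the function name `requests_per_sec` and the `sample` string are illustrative, and the pattern assumes the summary line looks exactly like the ones shown above.

```python
import re

# Matches the summary line wrk prints at the end of a run,
# e.g. "Requests/sec:   2205.95" (format taken from the output above).
REQ_PER_SEC = re.compile(r"^Requests/sec:\s+([0-9.]+)\s*$", re.MULTILINE)


def requests_per_sec(wrk_output):
    """Return every Requests/sec value found in the text, in order."""
    return [float(value) for value in REQ_PER_SEC.findall(wrk_output)]


# The last three lines of the first lwIP-on-DPDK run above.
sample = """\
  22072 requests in 10.01s, 1.33MB read
Requests/sec:   2205.95
Transfer/sec:    135.72KB
"""

print(requests_per_sec(sample))
```

Feeding it the output of several runs concatenated together returns one value per run, in the order the runs appear.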

The relevant parameters of the device are:

./dpdk/dpdk-22.03/usertools/dpdk-devbind.py -s

Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=ens33 drv=e1000 unused= *Active*

From the comparison of the above two results, we can see that when Concurrent connections are 1 and 4, the speed of Linux TCP is higher than lwIP on DPDK. Only with 2 concurrent connections is lwIP on DPDK faster than Linux TCP (7571.48 vs. 5949.70 requests/sec). I don't understand why this is the case. Did I miss an option in the steps?
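
The comparison can be made explicit with a few lines of Python. This is just a sketch over the Requests/sec values reported above; the names `LWIP_DPDK`, `LINUX_TCP`, and `compare` are made up for illustration, and a single 10-second run per configuration is not enough to say how stable these ratios are.

```python
# Requests/sec reported by wrk in this issue, keyed by concurrent connections.
LWIP_DPDK = {1: 2205.95, 2: 7571.48, 4: 9519.57}
LINUX_TCP = {1: 3173.97, 2: 5949.70, 4: 10595.40}


def compare(lwip, linux):
    """Map each connection count to (lwIP/Linux throughput ratio, faster stack)."""
    result = {}
    for conns in sorted(lwip):
        ratio = lwip[conns] / linux[conns]
        faster = "lwIP on DPDK" if ratio > 1.0 else "Linux TCP"
        result[conns] = (ratio, faster)
    return result


for conns, (ratio, faster) in compare(LWIP_DPDK, LINUX_TCP).items():
    print(f"{conns} connection(s): lwIP/Linux = {ratio:.2f}, faster: {faster}")
```

A ratio below 1 means Linux TCP handled more requests per second in that run.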

yasukata commented 3 days ago

Thank you for your message and detailed explanation of your setup.

> From the comparison of the above two results, we can see that when Concurrent connections are 1 and 4, the speed of Linux TCP is higher than lwIP on DPDK.

I guess this is because, in the setup you described, lwIP on DPDK runs over a tap device (specified by --vdev=net_tap001,iface=tap001 in your command), which generally adds networking overhead, whereas in the Linux TCP case the benchmark server and client processes communicate without going through a tap device.

To show the performance advantage of lwIP on DPDK, I think we would need to connect the benchmark server and client processes through physical NICs that can achieve tens of Gbps of throughput.

Thank you very much for your interest.