renzibei / flashws

A high-performance WebSocket library optimized for low latency and high throughput.

F-Stack ws_client appears to be slower than the legacy one in both latency & throughput #2

Closed · kiruchon closed this 5 months ago

kiruchon commented 5 months ago

I ran the test on a fresh Ubuntu 22.04 VM on AWS with 2 physical cores (HT off) and 8 GB of DDR5 RAM. My goal is to use the fastest possible WebSocket client to process a large number of messages, so I specifically tested ws_client from flashws against a non-DPDK ws_server.

To be honest, I'm relatively new to DPDK, but since it runs I think I've configured it correctly according to the official docs. I also tried different tso and hz options in config.ini, and tried running on both core 1 and core 2.
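
For context, the two binaries below were built roughly as follows; this is a sketch, and only the FWS_ENABLE_FSTACK flag is taken from the headings (the exact CMake invocation may differ):

```sh
# Kernel-stack client (legacy path):
cmake -B build-kernel -DFWS_ENABLE_FSTACK=OFF
cmake --build build-kernel

# F-Stack/DPDK client:
cmake -B build-fstack -DFWS_ENABLE_FSTACK=ON
cmake --build build-fstack
```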

test_ws_client -DFWS_ENABLE_FSTACK=OFF

./test_ws_client 
Prepare to init fws env
Set host: 10.0.30.176, port: 58600, msg_size: 512, msg_cnt_per_client: 300000,data file path: ./log_data.csv
CpuTimer overhead cycles: 62 cycles, tick per ns: 2.399993
data hash: 12542905821436587693
start to run loop
Avg round trip latency: 28.879 us, throughput rx + tx: 282.66 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.890 us, throughput rx + tx: 282.54 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.114 us, throughput rx + tx: 290.31 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 27.945 us, throughput rx + tx: 292.06 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.176 us, throughput rx + tx: 289.68 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.315 us, throughput rx + tx: 288.26 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.429 us, throughput rx + tx: 287.11 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.502 us, throughput rx + tx: 286.38 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.551 us, throughput rx + tx: 285.88 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.592 us, throughput rx + tx: 285.48 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.624 us, throughput rx + tx: 285.16 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.673 us, throughput rx + tx: 284.68 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.704 us, throughput rx + tx: 284.37 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.719 us, throughput rx + tx: 284.22 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.738 us, throughput rx + tx: 284.03 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.758 us, throughput rx + tx: 283.84 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.767 us, throughput rx + tx: 283.75 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 28.782 us, throughput rx + tx: 283.60 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
loop cnt reach TOTAL_MSG_CNT, prepare to end
INFO! write read finish! per msg len = 512 times=300000 sendsum=153600000 recvsum=153600000 cost=8670.297 ms
INFO! round trip latency histogram (ns)
Value,Percentile,TotalCount,1/(1-Percentile)
13311.00,0.000000,2,1.00
28031.00,0.250000,84016,1.33
28415.00,0.500000,152661,2.00
28671.00,0.625000,189793,2.67
29055.00,0.750000,226410,4.00
29439.00,0.812500,246088,5.33
30335.00,0.875000,263699,8.00
31743.00,0.906250,272076,10.67
33279.00,0.937500,281897,16.00
34047.00,0.953125,287027,21.33
35071.00,0.968750,291138,32.00
35839.00,0.976562,293376,42.67
36863.00,0.984375,295431,64.00
38143.00,0.988281,296594,85.33
40191.00,0.992188,297686,128.00
41471.00,0.994141,298261,170.67
43263.00,0.996094,298902,256.00
44287.00,0.997070,299165,341.33
45823.00,0.998047,299421,512.00
47359.00,0.998535,299582,682.67
48895.00,0.999023,299716,1024.00
49919.00,0.999268,299787,1365.33
51455.00,0.999512,299858,2048.00
54271.00,0.999634,299896,2730.67
57855.00,0.999756,299927,4096.00
62719.00,0.999817,299946,5461.33
71679.00,0.999878,299964,8192.00
82943.00,0.999908,299973,10922.67
93695.00,0.999939,299982,16384.00
103935.00,0.999954,299987,21845.33
128511.00,0.999969,299991,32768.00
160767.00,0.999977,299994,43690.67
268287.00,0.999985,299996,65536.00
458751.00,0.999989,299997,87381.33
557055.00,0.999992,299998,131072.00
770047.00,0.999994,299999,174762.67
770047.00,0.999996,299999,262144.00
1228799.00,0.999997,300000,349525.33
1228799.00,1.000000,300000,inf
Msg size(bytes): 512
avg (rx+tx) goodput: 283.4505 Mbps
Latency (us): min: 13.248, P50: 28.415, P99: 38.911, P999: 48.895, P100: 1228.799
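
The histogram blocks in these logs look like HdrHistogram-style percentile output: each row is the latency value in ns, the cumulative percentile, the cumulative sample count, and 1/(1-percentile). If you save one of those blocks to a file, a one-liner can pull out an approximate percentile (the file name here is just an example):

```sh
# Print the first recorded value at or above P99. Rows are emitted at fixed
# percentile steps, so this is an upper bound on the true P99 rather than an
# exact interpolation.
awk -F, '$2+0 >= 0.99 { print $1 " ns"; exit }' histogram.csv
```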

test_ws_client -DFWS_ENABLE_FSTACK=ON

sudo ./test_ws_client 
Prepare to init fws env
invalid proc_id:-1, use default 0
[dpdk]: lcore_mask=2
[dpdk]: channel=1
[dpdk]: promiscuous=1
[dpdk]: numa_on=1
[dpdk]: tx_csum_offoad_skip=0
[dpdk]: tso=0
[dpdk]: vlan_strip=1
[dpdk]: idle_sleep=0
[dpdk]: pkt_tx_delay=0
[dpdk]: symmetric_rss=0
[dpdk]: port_list=0
[dpdk]: nb_vdev=0
[dpdk]: nb_bond=0
[pcap]: enable=0
[pcap]: snaplen=256
[pcap]: savelen=33554432
[pcap]: savepath=.
[port0]: addr=10.0.26.70
[port0]: netmask=255.255.240.0
[port0]: broadcast=10.0.31.255
[port0]: gateway=10.0.16.1
[port0]: if_name=enp40s0
[freebsd.boot]: hz=100
[freebsd.boot]: fd_reserve=1024
[freebsd.boot]: kern.ipc.maxsockets=262144
[freebsd.boot]: net.inet.tcp.syncache.hashsize=4096
[freebsd.boot]: net.inet.tcp.syncache.bucketlimit=100
[freebsd.boot]: net.inet.tcp.tcbhashsize=65536
[freebsd.boot]: kern.ncallout=262144
[freebsd.boot]: kern.features.inet6=1
[freebsd.boot]: net.inet6.ip6.auto_linklocal=1
[freebsd.boot]: net.inet6.ip6.accept_rtadv=2
[freebsd.boot]: net.inet6.icmp6.rediraccept=1
[freebsd.boot]: net.inet6.ip6.forwarding=0
[freebsd.sysctl]: kern.ipc.somaxconn=32768
[freebsd.sysctl]: kern.ipc.maxsockbuf=16777216
[freebsd.sysctl]: net.link.ether.inet.maxhold=5
[freebsd.sysctl]: net.inet.tcp.fast_finwait2_recycle=1
[freebsd.sysctl]: net.inet.tcp.sendspace=16384
[freebsd.sysctl]: net.inet.tcp.recvspace=8192
[freebsd.sysctl]: net.inet.tcp.cc.algorithm=cubic
[freebsd.sysctl]: net.inet.tcp.sendbuf_max=16777216
[freebsd.sysctl]: net.inet.tcp.recvbuf_max=16777216
[freebsd.sysctl]: net.inet.tcp.sendbuf_auto=1
[freebsd.sysctl]: net.inet.tcp.recvbuf_auto=1
[freebsd.sysctl]: net.inet.tcp.sendbuf_inc=16384
[freebsd.sysctl]: net.inet.tcp.sack.enable=1
[freebsd.sysctl]: net.inet.tcp.blackhole=1
[freebsd.sysctl]: net.inet.tcp.msl=2000
[freebsd.sysctl]: net.inet.tcp.delayed_ack=1
[freebsd.sysctl]: net.inet.tcp.rfc1323=1
[freebsd.sysctl]: net.inet.udp.blackhole=1
[freebsd.sysctl]: net.inet.ip.redirect=0
[freebsd.sysctl]: net.inet.ip.forwarding=0
[freebsd.sysctl]: net.inet.tcp.functions_default=freebsd
[freebsd.sysctl]: net.inet.tcp.hpts.skip_swi=1
f-stack -c2 -n1 --proc-type=auto 
EAL: Detected CPU lcores: 2
EAL: Detected NUMA nodes: 1
EAL: Auto-detected process type: PRIMARY
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:28:00.0 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
lcore: 1, port: 0, queue: 0
create mbuf pool on socket 0
create ring:dispatch_ring_p0_q0 success, 2047 ring entries are now free!
Port 0 MAC:06:F3:7B:8A:9E:EF
Port 0 modified RSS hash function based on hardware support,requested:0x2003ffffc configured:0xc30
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
TSO is disabled
port[0]: rss table size: 128
ena_rss_hash_set(): Setting RSS hash fields is not supported. Using default values: 0xc30
set port 0 to promiscuous mode error

Checking link statusdone
Port 0 Link Up - speed 0 Mbps - full-duplex
link_elf_lookup_symbol: missing symbol hash table
link_elf_lookup_symbol: missing symbol hash table
Timecounters tick every 10.000 msec
WARNING: Adding ifaddrs to all fibs has been turned off by default. Consider tuning net.add_addr_allfibs if needed
Attempting to load tcp_bbr
tcp_bbr is now available
TCP Hpts created 1 swi interrupt threads and bound 0 to cpus
Timecounter "ff_clock" frequency 100 Hz quality 1
TCP_ratelimit: Is now initialized
enp40s0: No addr6 config found.
enp40s0: Ethernet address: 06:f3:7b:8a:9e:ef
Set host: 10.0.30.176, port: 58600, msg_size: 512, msg_cnt_per_client: 300000,data file path: ./log_data.csv
CpuTimer overhead cycles: 64 cycles, tick per ns: 2.399993
data hash: 12542905821436587693
start to run loop
Avg round trip latency: 38.345 us, throughput rx + tx: 213.12 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.384 us, throughput rx + tx: 212.90 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.400 us, throughput rx + tx: 212.81 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.427 us, throughput rx + tx: 212.66 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.392 us, throughput rx + tx: 212.85 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.389 us, throughput rx + tx: 212.87 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.364 us, throughput rx + tx: 213.01 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.371 us, throughput rx + tx: 212.97 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.379 us, throughput rx + tx: 212.93 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.387 us, throughput rx + tx: 212.88 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.397 us, throughput rx + tx: 212.83 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.405 us, throughput rx + tx: 212.78 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.407 us, throughput rx + tx: 212.77 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.411 us, throughput rx + tx: 212.75 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.410 us, throughput rx + tx: 212.76 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.409 us, throughput rx + tx: 212.76 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.411 us, throughput rx + tx: 212.75 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
Avg round trip latency: 38.413 us, throughput rx + tx: 212.74 Mbit/s, hash value: 12542905821436587693, active fd cnt: 1
loop cnt reach TOTAL_MSG_CNT, prepare to end
INFO! write read finish! per msg len = 512 times=300000 sendsum=153600000 recvsum=153600000 cost=11555.008 ms
INFO! round trip latency histogram (ns)
Value,Percentile,TotalCount,1/(1-Percentile)
30847.00,0.000000,1,1.00
36863.00,0.250000,86930,1.33
37887.00,0.500000,160427,2.00
38399.00,0.625000,188267,2.67
39423.00,0.750000,232997,4.00
39935.00,0.812500,250564,5.33
40447.00,0.875000,263139,8.00
41215.00,0.906250,274857,10.67
41983.00,0.937500,281423,16.00
43007.00,0.953125,286388,21.33
44799.00,0.968750,290975,32.00
46079.00,0.976562,293152,42.67
48127.00,0.984375,295533,64.00
49663.00,0.988281,296578,85.33
51967.00,0.992188,297679,128.00
54271.00,0.994141,298293,170.67
59647.00,0.996094,298832,256.00
68607.00,0.997070,299139,341.33
74751.00,0.998047,299419,512.00
77823.00,0.998535,299577,682.67
80895.00,0.999023,299715,1024.00
82431.00,0.999268,299781,1365.33
84991.00,0.999512,299859,2048.00
86015.00,0.999634,299896,2730.67
88063.00,0.999756,299928,4096.00
90623.00,0.999817,299947,5461.33
98815.00,0.999878,299964,8192.00
102911.00,0.999908,299973,10922.67
107519.00,0.999939,299982,16384.00
111103.00,0.999954,299988,21845.33
115199.00,0.999969,299991,32768.00
121343.00,0.999977,299994,43690.67
162815.00,0.999985,299996,65536.00
405503.00,0.999989,299997,87381.33
448511.00,0.999992,299998,131072.00
544767.00,0.999994,299999,174762.67
544767.00,0.999996,299999,262144.00
700415.00,0.999997,300000,349525.33
700415.00,1.000000,300000,inf
Msg size(bytes): 512
avg (rx+tx) goodput: 212.6870 Mbps
Latency (us): min: 30.720, P50: 37.887, P99: 50.431, P999: 80.895, P100: 700.415
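
Worth noting when comparing the two runs: with a single connection and one message in flight, the reported throughput is fully determined by the round trip time, so the ~9.5 us latency regression and the ~70 Mbit/s throughput drop are the same observation. A quick arithmetic check against the average RTTs above:

```sh
# 2 directions * 512 bytes * 8 bits of payload move per round trip;
# divide by the round trip time to get rx+tx throughput.
awk 'BEGIN {
  bits = 2 * 512 * 8
  printf "kernel stack: %.2f Mbit/s (RTT 28.879 us)\n", bits / 28.879e-6 / 1e6
  printf "F-Stack:      %.2f Mbit/s (RTT 38.400 us)\n", bits / 38.400e-6 / 1e6
}'
# Prints ~283.67 and ~213.33 Mbit/s, closely matching the goodput figures above.
```
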
renzibei commented 5 months ago

Hi, can you share more configuration details for the test? Are you using two VM instances or one? Is TLS enabled or not? Can you share your constants.h from the test directory? Did you use the igb_uio driver with wc_activate enabled? Also, did you enable Release mode when building? In my experience, TSO can improve latency.
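
For concreteness, a sketch of those knobs; the PCI address is taken from the EAL probe line in your log, while the module and script paths depend on the DPDK install and are assumptions:

```sh
# 1. Build in Release mode (assumed CMake convention):
cmake -B build -DFWS_ENABLE_FSTACK=ON -DCMAKE_BUILD_TYPE=Release

# 2. Enable TSO: set tso=1 in the [dpdk] section of F-Stack's config.ini
#    (your run above shows tso=0).

# 3. Bind the NIC with igb_uio (from dpdk-kmods) with write combining on:
sudo modprobe uio
sudo insmod ./igb_uio.ko wc_activate=1
sudo dpdk-devbind.py --bind=igb_uio 0000:28:00.0   # PCI address from your log
```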