mtcp-stack / mtcp

mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems

mTCP slower than Linux stack for short messages? #228

Closed (tbarbette closed this issue 5 years ago)

tbarbette commented 5 years ago

Hi,

I'm using wrk to simulate 128 parallel clients that generate HTTP requests for an 8K file in a loop against an HTTP server, without keep-alive (so each request goes over a new connection). wrk uses TCP_NODELAY.
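For reference, this is roughly the kind of wrk invocation being described (the exact command appears later in the thread; the Connection: close header shown here is just one common way to disable keep-alive in wrk and is an assumption, not taken from the report):

wrk -c 128 -t 16 -d 10s -H "Connection: close" http://10.221.0.5:80/bin-8K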

With epserver+DPDK on the server side I get around 100 MB/s, with both clients and server far from CPU-bound. With nginx I get around 1 GB/s, and there the limit is CPU (which is my reason for moving to mTCP). This is with 100G NICs connected back to back, so the link is not the bottleneck either. Hence, this looks like a timing/pure protocol problem.

Here is a Wireshark comparison: nginx on top, mTCP epserver below.

Screenshot from 2019-03-25 13-30-42

Two possible reasons come to mind:

Can I tune either of those? And do you think one of them is the problem at hand?

Thanks

tbarbette commented 5 years ago

I was able to make some progress on the problem. The initial window is correct (10 packets), but the initial cwnd is somehow ignored. Hence, when epserver tries to send 8K, it only sends 2K. In https://github.com/mtcp-stack/mtcp/blob/75a9e93343cc198040cfa9fda191e343c5411e48/mtcp/src/tcp_out.c#L517 the length is reduced to 2920 even though the initial window and the send buffer are bigger.
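For readers following along, here is a minimal sketch (not mTCP's actual code; an MSS of 1460 bytes is assumed) of the kind of clamping that produces the 2920-byte limit: the amount sent per round is bounded by the effective window, so a congestion window stuck at 2 * MSS = 2920 bytes truncates an 8K response on the first transmission round.

#include <stdint.h>

/* Sketch only: a send path typically transmits no more than
 * min(bytes buffered, congestion window, peer receive window). */
static inline uint32_t clamp_send_len(uint32_t buffered,
                                      uint32_t cwnd,
                                      uint32_t peer_wnd)
{
        uint32_t window = (cwnd < peer_wnd) ? cwnd : peer_wnd;
        return (buffered < window) ? buffered : window;
}

/* With cwnd = 2 * 1460 = 2920 and an 8192-byte response buffered,
 * clamp_send_len(8192, 2920, 65535) == 2920. */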

ajamshed commented 5 years ago

Can you please share the following:

1. The command line arguments of epserver.
2. The contents of the epserver.conf file.

tbarbette commented 5 years ago

epserver command line:

mkdir /tmp/nginx
dd if=/dev/urandom of=/tmp/nginx/bin-8K bs=1K count=8
sudo ./epserver -f epserver.conf -p /tmp/nginx -N 8

Note that I intend to use -N 18 since the CPU has 18 cores, but I wanted to rule out core count as a factor first.

epserver.conf (uncommented lines)

num_mem_ch = 6
port = dpdk1
rcvbuf = 65536
sndbuf = 65536
tcp_timeout = 15
tcp_timewait = 0
stat_print = dpdk1

If you want to try yourself:

wrk -c 128 -t 16 -d10s http://10.221.0.5:80/bin-8K

Thanks!

eunyoung14 commented 5 years ago

Hi,

The issues with the congestion window were fixed by PR #209 (the CCP integration), which is available in the devel branch. But it's still strange, because we could normally saturate a 10G link with 8 cores when requesting 8K files, even without NIC segmentation offload. Could you try a smaller buffer size, like 8192 or 16384, rather than 65536?
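Concretely, that suggestion corresponds to changing just these two lines in epserver.conf (the values are the ones proposed above):

rcvbuf = 16384
sndbuf = 16384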

tbarbette commented 5 years ago

I'm not sure I understand everything about the CCP integration, but I'm already on the devel branch (because of the other issue). Should I do something to get my default cwnd to 10 packets? I'll try 16384 (8192 will not fit 8K plus headers) and report back. Thanks! Tom

tbarbette commented 5 years ago

Changing the buffer size did not change anything (the server has 256 GB of RAM). Regarding the overall throughput, I had a 5 ms tc delay left over from a previous experiment. But even without it, mTCP is still slower than Linux for a "small" number of concurrent connections because of this congestion window problem: it takes more (now smaller) RTTs to complete a transaction.
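To make the RTT cost concrete, here is a small back-of-the-envelope model (not mTCP code; it assumes classic slow start with window doubling and an MSS of 1460 bytes): with an initial window of 2 segments an 8K response needs 2 round trips of data, while an initial window of 10 segments fits it in one.

#include <stdio.h>

/* Rough model: round trips needed to push `total` bytes, starting from
 * an initial window of `iw` segments of `mss` bytes, with the window
 * doubling each round (classic slow start, no losses). */
static int rtts_needed(int total, int mss, int iw)
{
        int sent = 0, win = iw * mss, rtts = 0;
        while (sent < total) {
                sent += win;
                win *= 2;
                rtts++;
        }
        return rtts;
}

int main(void)
{
        printf("IW=2:  %d RTTs\n", rtts_needed(8192, 1460, 2));  /* 2 */
        printf("IW=10: %d RTTs\n", rtts_needed(8192, 1460, 10)); /* 1 */
        return 0;
}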

eunyoung14 commented 5 years ago

Thanks for the update. It looks like it's related to the following part of the code.

https://github.com/mtcp-stack/mtcp/blob/cc7f751656560e6c922595c9e430f26be4252081/mtcp/src/tcp_in.c#L847

How does it work if that part is replaced so that it starts from (sndvar->mss * TCP_INIT_CWND) instead of just sndvar->mss?
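As far as the thread shows, the point of that change is to start the connection with a full initial window rather than one or two segments. A self-contained sketch of the idea (not the actual mTCP patch; the struct, function, and values here are illustrative assumptions, with only sndvar->mss, sndvar->cwnd, and TCP_INIT_CWND taken from the thread):

#include <stdint.h>

#define TCP_INIT_CWND 10        /* initial window in segments (assumed value) */

struct tcp_send_vars {
        uint32_t mss;           /* e.g. 1460 bytes */
        uint32_t cwnd;          /* congestion window in bytes */
};

static void init_cwnd(struct tcp_send_vars *sndvar)
{
        /* Start from TCP_INIT_CWND full segments (e.g. 10 * 1460 = 14600
         * bytes) rather than one or two segments, so an 8K response can
         * go out in the first round trip. */
        sndvar->cwnd = sndvar->mss * TCP_INIT_CWND;
}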

tbarbette commented 5 years ago

Yes, that solves the window problem, and I now get much better efficiency than Linux ;)

eunyoung14 commented 5 years ago

Great! I'll make a quick patch for this issue. Thanks!