zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.83k stars 6.6k forks

Poor TCP performance #23302

Closed xhpohanka closed 2 years ago

xhpohanka commented 4 years ago

I continue playing with the Zephyr network stack and STM32, and unfortunately I found another issue. With the nucleo_f429zi board and the big_http_download sample I got a very slow download speed. This pushed me to check network performance with zperf.

For UDP transfers I got around 10Mbps, but for TCP the result was only 10kbps, which is really bad.

I checked whether some older versions of Zephyr behave better; fortunately, v2.0.0 also gave me around 10Mbps for TCP in zperf. Bisecting showed that this issue starts with d88f25bd763e2dfa70873b3c2321f2f8677d643d.

I hoped that reverting it would also fix the slow big_http_download, but surprisingly the download speed is still suspiciously low. I will continue to investigate this tomorrow.

I do not know whether these issues are specific to the STM32 platform. I have tested the nucleo_f429zi mentioned above and a custom board with an STM32F750, which has a slightly different Ethernet peripheral; my driver for it is also written using the HAL. Both behave the same way.

The issues I have met so far with the Zephyr networking stack make me question whether it is mature enough for production.

jukkar commented 4 years ago

Indeed, there is a regression in TCP throughput. The problem with TCP has been that we currently have no proper tests for TCP that would catch these kinds of regressions. This has been one of the reasons we have been building a new version of the TCP stack that supports proper testing which could be integrated into sanitycheck. TCP2 has been cooking for quite a long time and we are slowly getting to the point where it is useful, but we are not there yet. Obviously 10kb/sec is not good and this needs to be fixed.

jukkar commented 4 years ago

I just compiled samples/net/sockets/dumb_http_server_mt for the Atmel sam-e70 board and got the following numbers:

ab -n100 http://192.0.2.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.0.2.1 (be patient).....done

Server Software:        
Server Hostname:        192.0.2.1
Server Port:            8080

Document Path:          /
Document Length:        2084 bytes

Concurrency Level:      1
Time taken for tests:   0.381 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      214000 bytes
HTML transferred:       208400 bytes
Requests per second:    262.67 [#/sec] (mean)
Time per request:       3.807 [ms] (mean)
Time per request:       3.807 [ms] (mean, across all concurrent requests)
Transfer rate:          548.95 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1    3  20.7      1     208
Waiting:        1    3  20.7      1     208
Total:          1    4  20.6      2     208

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      3
  99%    208
 100%    208 (longest request)

So definitely more than 10kb/sec. Then, using wrk:

./wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.74ms   62.79ms 616.12ms   97.41%
    Req/Sec     1.74      3.83    20.00     89.80%
  Latency Distribution
     50%    1.29ms
     75%    1.35ms
     90%    4.46ms
     99%  207.79ms
  116 requests in 20.10s, 242.42KB read
  Socket errors: connect 0, read 134, write 0, timeout 0
Requests/sec:      5.77
Transfer/sec:     12.06KB

Now the transfer speed is very poor.

I then increased CONFIG_NET_TCP_BACKLOG_SIZE to 2 and got considerably better results:

./wrk -d 10 -t 2 -c 100 --latency http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.61ms  421.92us   9.90ms   95.03%
    Req/Sec   375.32    153.56   610.00     72.80%
  Latency Distribution
     50%    1.41ms
     75%    1.88ms
     90%    1.98ms
     99%    2.17ms
  4768 requests in 10.01s, 9.73MB read
  Socket errors: connect 0, read 4774, write 0, timeout 0
Requests/sec:    476.10
Transfer/sec:      0.97MB

xhpohanka commented 4 years ago

I see this behavior on the STM32F429ZI:

CONFIG_NET_TCP_BACKLOG_SIZE=1

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.51ms    1.79ms   8.77ms   89.86%
    Req/Sec     1.86      3.43    10.00     86.96%
  Latency Distribution
     50%    1.93ms
     75%    1.95ms
     90%    6.70ms
     99%    8.77ms
  69 requests in 20.10s, 144.20KB read
  Socket errors: connect 0, read 69, write 0, timeout 0
Requests/sec:      3.43
Transfer/sec:      7.17KB

CONFIG_NET_TCP_BACKLOG_SIZE=2

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://192.0.2.1:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   196.18ms   36.93ms 407.71ms   96.32%
    Req/Sec     8.92      4.18    35.00     85.63%
  Latency Distribution
     50%  202.17ms
     75%  202.23ms
     90%  202.32ms
     99%  209.31ms
  896 requests in 20.09s, 1.84MB read
  Socket errors: connect 0, read 898, write 0, timeout 0
Requests/sec:     44.60
Transfer/sec:     93.92KB

I do not understand why the latency is so high.

The previous result was for v2.2.0-rc3; for v2.1.0-rc3 (again with CONFIG_NET_TCP_BACKLOG_SIZE=2) it is even worse:

$ wrk -d 20 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 20s test @ http://10.42.0.192:8080
  24 threads and 500 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   168.84ms   74.40ms 220.12ms   82.07%
    Req/Sec     5.78     10.02   100.00     98.40%
  Latency Distribution
     50%  202.34ms
     75%  202.38ms
     90%  202.58ms
     99%  218.44ms
  145 requests in 20.03s, 305.86KB read
  Socket errors: connect 0, read 789, write 0, timeout 0
Requests/sec:      7.24
Transfer/sec:     15.27KB

xhpohanka commented 4 years ago

It also seems to me that the issue with the big_http_download sample is a bit different. The download is very slow for me regardless of CONFIG_NET_TCP_BACKLOG_SIZE or reverting commit d88f25bd763e2dfa70873b3c2321f2f8677d643d. Downloading a 52kB file takes 8.5 seconds. You can check my wireshark log https://drive.google.com/open?id=163-v3MlK3Hgc4F47X05GSSJRo3EFpX-s; it is full of [TCP Window Full] packets.

I also checked zperf again on v2.2.0-rc3 and this is what I get with upstream code

$ iperf -c 192.0.2.1
------------------------------------------------------------
Client connecting to 192.0.2.1, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 192.0.2.2 port 51802 connected with 192.0.2.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.2 sec  13.6 KBytes  10.9 Kbits/sec

and this with d88f25bd763e2dfa70873b3c2321f2f8677d643d reverted (the backlog size has no impact here):

$ iperf -c 192.0.2.1
------------------------------------------------------------
Client connecting to 192.0.2.1, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 192.0.2.2 port 51778 connected with 192.0.2.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  16.1 MBytes  13.4 Mbits/sec

I'm very interested in getting this to work better. I can do more tests and even development if I get some guidance...

jukkar commented 4 years ago

Some fixes for the memory leaks are in #23334. #23246 is related to this issue. The low performance numbers are probably caused by lots of packets being dropped. Anyway, the memory leaks need to be fixed first, and then we can see how the performance numbers behave.

pfalcon commented 4 years ago

For UDP transfers I got around 10Mbps but for TCP the result was only 10kbps which is really bad.

Indeed, there is a regression with the TCP throughput.

I personally never saw other TCP speed figures from Zephyr with frdm_k64f or any of the qemu networking drivers. (Well, perhaps I saw 15KBytes/s, but you get the point.) I actually wanted to add dumping of the download speed to big_http_download, but decided against that ;-).

pfalcon commented 4 years ago

I'm currently working on switching big_http_download over to localhost downloads (yep, to put it into CI and not rely on external hosts, which would add extra disturbance to testing).

And I decided to share what I see with just an asciinema recording: https://asciinema.org/a/BXbcuYTQsrsPxDmdCnbfUz7Qz (as I'm not sure how soon they garbage-collect uploads, also attaching here big_http_download-localhost.zip ).

So, as you can see, it starts slow. Then at ~70KB (~00:19 on the cast) it finally starts to work "as it should", then at ~140KB (~00:21) it breaks again. You've already read my explanation of what happens (based on hours of peering into wireshark output, though months ago): it starts mis-synchronized with Linux, TCP sends rexmits and bumps into a large backoff delay, then manages to synchronize with Linux (this is literally the first time I see it), then loses the sync again and goes back to the swamp for the rest of the download.

pfalcon commented 4 years ago

(Which makes me think I should maybe add a calculation of "momentary" speed over e.g. the last second or two, and print each value on a separate line, not with "\r".)
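
Such a momentary-speed readout could be sketched roughly like this (hypothetical helper code, not part of the sample; the class and method names are illustrative):

```python
# Hypothetical sketch of a "momentary" download-speed readout:
# keep (timestamp, nbytes) samples and average over a short window.
import collections
import time

class MomentarySpeed:
    def __init__(self, window=2.0):
        self.window = window                # seconds to average over
        self.samples = collections.deque()  # (timestamp, nbytes) pairs

    def add(self, nbytes, now=None):
        """Record nbytes received at time `now` (monotonic seconds)."""
        now = time.monotonic() if now is None else now
        self.samples.append((now, nbytes))
        # Drop samples that fell out of the averaging window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def bytes_per_sec(self, now=None):
        """Average byte rate over the last `window` seconds."""
        now = time.monotonic() if now is None else now
        recent = sum(n for t, n in self.samples if now - t <= self.window)
        return recent / self.window
```

Each progress update would then print `bytes_per_sec()` on its own line, so stalls show up as visible dips instead of being hidden by a cumulative average.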

jukkar commented 4 years ago

I tested this with nucleo_f767zi and see wildly different numbers depending on the tool used. ApacheBench gave around 470kb/sec, which is quite reasonable. The wrk tool, on the other hand, pushes so much traffic to Zephyr that it easily runs out of memory and starts to drop packets. This then affects the performance numbers a lot. I am lowering the priority of this one until we figure out what the correct numbers are.

github-actions[bot] commented 4 years ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

pfalcon commented 4 years ago

Recently, Zephyr switched to the newer "TCP2" implementation by default. I'm doing its triaging for the 2.4 release (#27876, #27982), and I would also like to record some quick performance data:

dumb_http_server sample, qemu_x86, standard SLIP-based QEMU networking setup, ab -n1000 http://192.0.2.1:8080/: 60.78 req/s.

For reference, with TCP1 (CONFIG_NET_TCP1=y), it's 4.18 req/s.

So, there's a noticeable improvement in connection latency.

pfalcon commented 4 years ago

But it's not so bright with data transfer throughput. The sample is big_http_download; the rest is as above.

First, a reference with CONFIG_NET_TCP1=y: 3108 b/s. That's pretty slow. I remember speeds around 10KB/s. Dunno what's up now; my internet connection is not ideal either.

So, with the default TCP2, it's 6154 b/s (mind #27982: the transfer doesn't complete successfully, so an implied end time was used to calculate the speed).

pfalcon commented 4 years ago

For reference, trying @nzmichaelh's hack from https://github.com/zephyrproject-rtos/zephyr/issues/26330#issuecomment-656194045 (setting NET_TCP_BUF_MAX_LEN to (4*1280)) didn't make a statistical difference for me (on qemu_x86 with SLIP, again).

vandita5 commented 4 years ago

Has anyone looked at the CPU usage when pushing the UDP throughput to 10Mbps? I am able to go up to 8.5 Mbps with a modified echo_server sample but see ~70% CPU usage. Any tips on how to reduce that? I've modified the echo server to receive a message and then send a dummy packet of ~1200 bytes.

pfalcon commented 4 years ago

11075 b/s with big_http_download on frdm_k64f.

dleach02 commented 4 years ago

@pfalcon, first, thank you for reporting these metrics. For reference though, when you report measured results, do you have an expected value in mind that we can compare against?

carlescufi commented 4 years ago

@jukkar could you please settle on the benchmarks that we use to measure progress in this issue?

github-actions[bot] commented 3 years ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

kevin0402 commented 3 years ago

@jukkar I am wondering whether this issue will be investigated further. In my test, TCP2 performance is still poor compared to FreeRTOS on the same hardware platform.

@jhqian my board is the i.MX RT1060 EVB; do you have any suggestions? Have you done any TCP performance testing before?

jukkar commented 3 years ago

I am wondering whether this issue will be investigated further. In my test, TCP2 performance is still poor compared to FreeRTOS on the same hardware platform.

No major investigation of the performance issue is going on atm. One thing we are missing is a consistent and documented test environment, i.e. how to test TCP performance, what tools to use, etc. It is difficult to get consistent results as people are using different boards and software.

jukkar commented 3 years ago

One thing that certainly affects TCP performance in Zephyr is that the Nagle algorithm and delayed ACKs are currently not implemented. There is certainly room for improvement here.
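
For reference, on the Linux peer Nagle can be disabled per-socket, which is one way to probe whether the Nagle/delayed-ACK interaction on the host side contributes to the stalls seen against Zephyr. A minimal host-side sketch (not Zephyr code):

```python
# Host-side (Linux peer) illustration: disable Nagle's algorithm on a
# socket with TCP_NODELAY, so small segments are sent immediately
# instead of being coalesced while an ACK is outstanding.
import socket

def nodelay_socket():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Comparing a throughput run with and without TCP_NODELAY on the peer can help separate peer-side pacing effects from drops inside the Zephyr stack.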

kevin0402 commented 3 years ago

One thing that certainly affects TCP performance in Zephyr is that the Nagle algorithm and delayed ACKs are currently not implemented. There is certainly room for improvement here.

Yes, I believe those would be some of the key factors. Is there any plan to do this in the next release, v2.5?

jukkar commented 3 years ago

Is there any plan to do this in the next release, v2.5?

I have not seen any plans for enhancements in this area. Patches are welcome if you want to improve the code.

jukkar commented 3 years ago

Just did a small test, and the TCP performance was really bad. I only got 2.5kb/sec with qemu and echo-server/client. After a quick investigation with wireshark, there is a 200ms timeout on the Linux side. I have not yet investigated why the timeout is there. This used to work much better earlier.

pfalcon commented 3 years ago

This used to work much better earlier.

For me, it always worked like that (with occasional very short streaks of non-slow performance). The timeout is due to the rexmit timeout, i.e. there's some weird mis-synchronization between Zephyr and Linux, where the gradual rexmit fallback always ends up on the "pit" side of it. (Writing from memory; I haven't looked into it for a while. Actually, heck, I last looked with TCP1.)

jukkar commented 3 years ago

The timeout is due to re-xmit

Yes, I know. For some reason Zephyr drops a packet, which leads to this 200ms timeout (this is probably Nagle kicking in on the Linux side). TCP2 was working much better last time I looked; dunno what has changed now.

jukkar commented 3 years ago

For my quick qemu test, a packet from Linux is not received by Zephyr, which leads to the 200ms timeout:

[00:05:06.800,000] <dbg> eth_e1000.e1000_tx: iow32 TDT 0x00000001
[00:05:06.800,000] <dbg> eth_e1000.e1000_tx: tx.sta: 0x01
[00:05:06.800,000] <dbg> eth_e1000.e1000_isr: ior32 ICR 0x00000043
[00:05:06.800,000] <dbg> eth_e1000.e1000_rx: rx.sta: 0x07
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): ACK Seq=2221158581 Ack=1023855135 Len=0 [ESTA~
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64203 to 2730
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): conn: 0x15eb90 len_acked=26
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): conn: 0x15eb90 total=0, unacked_len=0, send_win=2730, mss=1460

we should have received an ACK here

[00:05:06.800,000] <dbg> eth_e1000.e1000_isr: ior32 ICR 0x00000040
[00:05:06.800,000] <dbg> eth_e1000.e1000_rx: rx.sta: 0x07
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): conn: 0x15eb90 send_data_timer=1, send_data_retries=0
[00:05:06.800,000] <dbg> net_tcp.tcp_unsent_len: (rx_q[0]): unsent_len=0
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): ACK,PSH Seq=2221159117 Ack=1023855135 Len=536~
[00:05:06.800,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64203 to 2730

We drop this ACK,PSH because the seq does not match, and currently Zephyr does not support packet reordering.

The next ACK msg should have been received earlier but disappeared, so Linux resends it after 200ms:

[00:05:07.030,000] <dbg> eth_e1000.e1000_isr: ior32 ICR 0x00000040
[00:05:07.030,000] <dbg> eth_e1000.e1000_rx: rx.sta: 0x07
[00:05:07.030,000] <dbg> net_tcp.tcp_in: (rx_q[0]): ACK Seq=2221158581 Ack=1023855135 Len=536 [ES~
[00:05:07.030,000] <dbg> net_tcp.tcp_in: (rx_q[0]): Lowering send window from 64203 to 2730
[00:05:07.030,000] <dbg> net_tcp.tcp_out_ext: (rx_q[0]): ACK Seq=1023855135 Ack=2221159117 Len=0
[00:05:07.030,000] <dbg> net_tcp.tcp_send_process_no_lock: (rx_q[0]): ACK Seq=1023855135 Ack=2221159117 Len=0 
[00:05:07.030,000] <dbg> net_tcp.tcp_send: (rx_q[0]): ACK Seq=1023855135 Ack=2221159117 Len=0
[00:05:07.030,000] <dbg> eth_e1000.e1000_tx: iow32 TDT 0x00000001
[00:05:07.030,000] <dbg> eth_e1000.e1000_tx: tx.sta: 0x01
[00:05:07.030,000] <dbg> eth_e1000.e1000_isr: ior32 ICR 0x00000043
[00:05:07.030,000] <dbg> eth_e1000.e1000_rx: rx.sta: 0x07

So the system worked OK, but the performance was very poor because of the lost packets. I will try the same thing with real HW to see if the situation changes.

jukkar commented 3 years ago

With the sam-e70 board, I got 75kb/sec, which is not great either.

github-actions[bot] commented 3 years ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

mdkf commented 2 years ago

I'm seeing poor performance with TCP and UDP on both the K64F and STM32. Any plans on addressing this issue?

carlescufi commented 2 years ago

I'm seeing poor performance with TCP and UDP on both the K64F and STM32. Any plans on addressing this issue?

Can you please provide more info as to how you are testing and measuring, so we can reproduce?

mdkf commented 2 years ago

I'm seeing poor performance with TCP and UDP on both the K64F and STM32. Any plans on addressing this issue?

Can you please provide more info as to how you are testing and measuring, so we can reproduce?

Hi Carles, here's the code I tested with: https://github.com/mdkf/ZephyrTCPSlow. I get 2Mbps with the K64F and a hang on STM32F4 and STM32F7. I have attached a Wireshark capture of the hang: Zephyr STM32.pcapng.gz

carlescufi commented 2 years ago

I get 2Mbps with the K64F and a hang on STM32F4 and STM32F7. I have attached a Wireshark capture of the hang.

Thanks!

Can you please open a separate issue for the "hang"? Regarding the throughput on K64F, I would ask @dleach02 or @danieldegrasse to take a look first and see if they can identify a potential issue in the ethernet driver itself.

Also, our TCP implementation currently lacks RX window handling, which makes throughput suffer.

dleach02 commented 2 years ago

I think the open question is: what is the expected throughput?

As @carlescufi comments, we appear to have some limitations in the TCP implementation. Is this captured anywhere in a ticket? I removed the stale label because I do think it is important for Zephyr to ensure that our TCP stack is robust and performant compared to the alternative solutions.

rlubos commented 2 years ago

Yes, there is an enhancement issue to implement Receive Window handling (https://github.com/zephyrproject-rtos/zephyr/issues/39132) - had to reopen it as it got closed by a wrongly linked PR.

mdkf commented 2 years ago

Yes, there is an enhancement issue to implement Receive Window handling (#39132) - had to reopen it as it got closed by a wrongly linked PR.

That would help, but I don't think it explains the issue I am seeing.

chyu313 commented 2 years ago

Yes, there is an enhancement issue to implement Receive Window handling (#39132) - had to reopen it as it got closed by a wrongly linked PR.

@jukkar had a pull request for receive window update https://github.com/zephyrproject-rtos/zephyr/pull/32076

rlubos commented 2 years ago

Yes, there is an enhancement issue to implement Receive Window handling (#39132) - had to reopen it as it got closed by a wrongly linked PR.

That would help, but I don't think it explains the issue I am seeing.

Unlimited traffic from the peer can end up with packet drops at the driver level, which leads to TCP retransmissions, effectively reducing the throughput. But that's TCP only; if you observe the impact on UDP as well, it could be a driver thing.

carlescufi commented 2 years ago

Hi Carles, here's the code I tested with: https://github.com/mdkf/ZephyrTCPSlow. I get 2Mbps with the K64F and a hang on STM32F4 and STM32F7. I have attached a Wireshark capture of the hang. Zephyr STM32.pcapng.gz

@mdkf Thanks, just discussed with @rlubos and he will take a look. Could you specify which exact boards you are testing this on please?

carlescufi commented 2 years ago

Yes, there is an enhancement issue to implement Receive Window handling (#39132) - had to reopen it as it got closed by a wrongly linked PR.

@jukkar had a pull request for receive window update #32076

@rlubos has now completed the implementation of RX window support and it has been merged: #43018

dleach02 commented 2 years ago

@carlescufi, @rlubos do you have any testing data that shows before/after performance with the RX window support?

rlubos commented 2 years ago

@dleach02 I'm going to order a few Ethernet boards to perform tests with zperf/iperf on actual hardware. In the meantime, I've tested how the receive window affects the traffic on the native_posix platform, with the following results (the zperf sample acting as a TCP server, flooding data with iperf from the host machine):

1. Running the sample in the default configuration gives the following result (not great):

    $ iperf -l 1K -V -c 2001:db8::1 -p 5001
    ------------------------------------------------------------
    Client connecting to 2001:db8::1, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 2001:db8::2 port 39930 connected with 2001:db8::1 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.1 sec   117 KBytes  94.5 Kbits/sec

2. I've noticed however that the `eth_native_posix` driver introduces an artificial delay on the RX path (https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/ethernet/eth_native_posix.c#L388); reducing the delay already improves the throughput without changes in the sample configuration. That's why I'm eager to test on actual HW, where there should be no such delays, as the data reception should be interrupt driven:

    $ iperf -l 1K -V -c 2001:db8::1 -p 5001
    ------------------------------------------------------------
    Client connecting to 2001:db8::1, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 2001:db8::2 port 39940 connected with 2001:db8::1 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.2 sec   315 KBytes   253 Kbits/sec

3. Next, I've tested with the receive window size set to the maximum possible value (note that this configuration also requires a significant increase in the available net buffers to prevent packet drops at the driver level). This gives a significant boost in throughput:

    $ iperf -l 1K -V -c 2001:db8::1 -p 5001
    ------------------------------------------------------------
    Client connecting to 2001:db8::1, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 2001:db8::2 port 39944 connected with 2001:db8::1 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.1 sec  30.8 MBytes  25.7 Mbits/sec

4. Finally, I've additionally tested with `CONFIG_NET_TC_RX_COUNT=0`; this configuration skips the RX queue between the driver and the net stack, effectively reducing the amount of thread rescheduling in the system. Again, this gives a significant boost on the native_posix platform:

    $ iperf -l 1K -V -c 2001:db8::1 -p 5001
    ------------------------------------------------------------
    Client connecting to 2001:db8::1, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 2001:db8::2 port 39946 connected with 2001:db8::1 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  1.04 GBytes   896 Mbits/sec

Again, I'm eager to test on actual hardware, but I believe the above results give hope that decent numbers can be expected.
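
For reference, the configuration knobs exercised in steps 3 and 4 could be collected into a zperf prj.conf fragment like the sketch below; the option names appear in this thread, but the buffer counts are illustrative assumptions rather than the exact values used in the tests above:

```
# Sketch only: counts are illustrative and depend on available RAM.
CONFIG_NET_TCP_MAX_RECV_WINDOW_SIZE=65535
# Extra packet/buffer headroom so the large window does not cause
# driver-level drops:
CONFIG_NET_PKT_RX_COUNT=64
CONFIG_NET_BUF_RX_COUNT=64
# Skip the RX traffic-class queue between the driver and the net stack:
CONFIG_NET_TC_RX_COUNT=0
```
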
dleach02 commented 2 years ago

I would compare before and after adding this feature. We have started looking at this by testing 3.0 vs the tip of main, and then we started tweaking the parameters. In general, we are not seeing good performance results, so we are trying to figure out how to triage this some more.

ping @hakehuang

mdkf commented 2 years ago

As the UDP performance is also poor, I think the issue lies beyond the TCP window size. On the NXP i.MX RT1050, sending UDP packets as fast as possible got me 11.9 Mbps on Zephyr, while it reached 78.5 Mbps on Mbed.

dleach02 commented 2 years ago

I've noticed however that the `eth_native_posix` driver introduces an artificial delay on the RX path (main/drivers/ethernet/eth_native_posix.c#L388); reducing the delay already improves the throughput without changes in the sample configuration. That's why I'm eager to test on actual HW, where there should be no such delays, as the data reception should be interrupt driven:

It is weird to have a delay like that, or one that big.

dleach02 commented 2 years ago

As the UDP performance is also poor, I think the issue lies beyond the TCP window size. On the NXP i.MX RT1050, sending UDP packets as fast as possible got me 11.9 Mbps on Zephyr, while it reached 78.5 Mbps on Mbed.

@mdkf what test do you use to gather these numbers?

mdkf commented 2 years ago

@mdkf what test do you use to gather these numbers?

I uploaded my test code here: https://github.com/mdkf/ZephyrTCPSlow. Swap lines 107 and 108 to test with UDP instead.

hakehuang commented 2 years ago

TCP performance (Kbps) | v3.0.0 | v3.0.0-post
-- | -- | --
default config | 1570 | 11.3

TCP performance on v3.0.0-post (Kbps) | default | window updated to 5120 | window updated to 65535
-- | -- | -- | --
TCP performance data | 11.3 | 20.4 | 42.8

hakehuang commented 2 years ago

Iperf test suite | LwIP + FreeRTOS (rt1060) | Zephyr (rt1060) | Zephyr (qemu_x86)
-- | -- | -- | --
TCP v3.0.0 without TCP window change | 113 MBytes, 94.8 Mbits/sec (client side; 94.5 Mbits/sec seen on the server) | 8.39 MBytes, 7.04 Mbits/sec | 4.27 MBytes, 1.57 Mbits/sec
TCP latest with TCP window change | | 26.8 KBytes in 95.5 sec, 2.30 Kbits/sec | mainline default: 28.0 KBytes, 11.3 Kbits/sec; with CONFIG_NET_TCP_MAX_RECV_WINDOW_SIZE=5120, CONFIG_NET_PKT_TX_COUNT=6 and CONFIG_NET_BUF_RX_COUNT=64: 51.4 KBytes, 20.4 Kbits/sec; with CONFIG_NET_TCP_MAX_RECV_WINDOW_SIZE=65535: 107 KBytes, 42.8 Kbits/sec

All numbers are from `iperf -l 1K -c <server> -p 5001` runs on an Ubuntu host.

rlubos commented 2 years ago

@mdkf what test do you use to gather these numbers?

I uploaded my test code here: https://github.com/mdkf/ZephyrTCPSlow Swap lines 107 and 108 to test with UDP instead.

Why is there a sleep within the send loop (https://github.com/mdkf/ZephyrTCPSlow/blob/main/src/socketTest.c#L159)? A 2 ms sleep between each datagram sent doesn't sound like the best idea for a throughput-measuring test.

Additionally, a 90-byte payload per datagram seems pretty small for throughput measurement; did you send such small packets with Mbed as well?
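
A host-side sender without per-packet pacing might look like the sketch below (hypothetical code, not from @mdkf's repository; the address, payload size, and duration are illustrative):

```python
# Hypothetical host-side UDP sender for throughput probing: sends
# back-to-back datagrams with no inter-packet sleep and reports the
# achieved send rate in bits per second.
import socket
import time

def udp_flood(addr, payload_len=1452, seconds=2.0):
    """Send datagrams to addr as fast as possible for `seconds`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_len
    sent_bytes = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sent_bytes += sock.sendto(payload, addr)
    sock.close()
    return sent_bytes * 8 / seconds  # bits per second
```

Note this only measures what the sender can push; the receiver's counters (e.g. zperf's own statistics) are still needed to see how much actually arrived.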