Tried this with nucleo_f767zi and I could reproduce the issue with wrk. I did not see a similar issue with the Apache Bench (ab) tool; wrk seems to stress the target more.
In fact I was able to reproduce it with Chrome too, but it was quite difficult to press F5 fast enough :)
I suspect that when the number of incoming connections increases, we eventually run out of network buffers in some part of the code. When that happens, the code notices it but does not release some already-allocated buffers, which leads to a buffer leak. I will be investigating this.
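For illustration, here is a minimal sketch of the error-path pattern such a leak typically hides in. The send_reply function and its flow are hypothetical, not actual stack code, but the net_pkt calls mirror the Zephyr API:

```c
#include <zephyr.h>
#include <errno.h>
#include <net/net_pkt.h>
#include <net/net_context.h>

/* Hypothetical helper, not actual Zephyr stack code: it illustrates the
 * suspected bug class, where a failure path after a successful net_pkt
 * allocation forgets to unref the packet, leaking its data buffers. */
static int send_reply(struct net_context *ctx, const uint8_t *data, size_t len)
{
	struct net_pkt *pkt = net_pkt_alloc_with_buffer(
		net_context_get_iface(ctx), len,
		AF_INET, IPPROTO_TCP, K_MSEC(100));

	if (!pkt) {
		/* Allocation itself failed under load; nothing leaks here. */
		return -ENOMEM;
	}

	if (net_pkt_write(pkt, data, len) < 0) {
		/* The suspected class of bug: returning here WITHOUT this
		 * unref would silently leak the packet and its buffers. */
		net_pkt_unref(pkt);
		return -ENOBUFS;
	}

	/* ... hand pkt to the TCP layer for transmission ... */
	return 0;
}
```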
@xhpohanka I hopefully managed to fix the memory leak. Can you try #23334 and report whether it works for you?
Hello @jukkar. Thank you for the support. With your patches and dumb_http_server_mt on nucleo-f429zi with default settings, everything seems to run correctly. I'm not able to break it with wrk.
I also tried to increase the value of NET_TCP_BACKLOG_SIZE and it also works.
However, the next test with NET_TCP_BACKLOG_SIZE=10 and NET_SAMPLE_NUM_HANDLERS=8 (the number of threads serving client requests) shows that there still might be some leak. With these settings there are some unsuccessful data buffer allocations, and net mem shows a leak.
wrk -d 10 -t 24 -c 200 http://192.0.2.1:8080
...
[00:00:28.762,000] <dbg> net_dumb_http_srv_mt_sample.process_tcp: [9] Connection #482 from 192.0.2.10
[00:00:28.762,000] <err> net_dumb_http_srv_mt_sample: Cannot accept more connections
[00:00:28.763,000] <err> net_dumb_http_srv_mt_sample: Cannot accept more connections
[00:00:28.764,000] <err> net_dumb_http_srv_mt_sample: Cannot accept more connections
[00:00:29.036,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:29.172,000] <err> net_pkt: Data buffer (1500) allocation failed.
[00:00:29.537,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:29.674,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:29.678,000] <err> net_pkt: Data buffer (1500) allocation failed.
[00:00:30.038,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:30.180,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:30.539,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:30.579,000] <err> net_pkt: Data buffer (1500) allocation failed.
[00:00:30.668,000] <err> eth_stm32_hal: Failed to obtain RX buffer
[00:00:30.668,000] <err> eth_stm32_hal: Failed to obtain RX buffer
[00:00:30.668,000] <err> eth_stm32_hal: Failed to obtain RX buffer
...
[00:00:30.980,000] <err> eth_stm32_hal: Failed to obtain RX buffer
[00:00:31.040,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:31.081,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:31.182,000] <err> net_pkt: Data buffer (1500) allocation failed.
uart:~$ net mem
Fragment length 128 bytes
Network buffer pools:
Address Total Avail Name
0x2001bb48 96 96 RX
0x2001bb64 96 94 TX
0x2001bc34 128 128 RX DATA (rx_bufs)
0x2001bc5c 128 126 TX DATA (tx_bufs)
No external memory pools found.
Tried the PR with this config on an Atmel SAM-E70 (300 MHz, 2 MB flash, 384 KB SRAM):
cmake -B build \
-DBOARD=sam_e70_xplained . \
-DCONFIG_NET_DEBUG_NET_PKT_ALLOC=y \
-DCONFIG_NET_BUF_POOL_USAGE=y \
-DCONFIG_NET_TCP_BACKLOG_SIZE=10 \
-DCONFIG_KERNEL_SHELL=y \
-DCONFIG_THREAD_MONITOR=y \
-DCONFIG_THREAD_NAME=y \
-DCONFIG_THREAD_STACK_INFO=y \
-DCONFIG_INIT_STACKS=y \
-DCONFIG_NET_PKT_RX_COUNT=200 \
-DCONFIG_NET_PKT_TX_COUNT=200 \
-DCONFIG_NET_BUF_RX_COUNT=300 \
-DCONFIG_NET_BUF_TX_COUNT=300 \
-DCONFIG_LOG=n \
-DCONFIG_NET_LOG=n
./wrk -d 10 -t 24 -c 200 --latency http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
24 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.14ms 16.38ms 671.85ms 99.00%
Req/Sec 49.78 25.60 170.00 73.60%
Latency Distribution
50% 13.44ms
75% 14.69ms
90% 15.65ms
99% 30.36ms
3605 requests in 10.10s, 7.36MB read
Socket errors: connect 0, read 4023, write 0, timeout 0
Requests/sec: 356.94
Transfer/sec: 746.51KB
Tried this multiple times and the results were consistent; I did not see any memory leaks.
Please try to increase CONFIG_NET_SAMPLE_NUM_HANDLERS.
For example, my config (I had to decrease the buffer counts due to the SRAM limitation):
cmake -B build \
-DBOARD=nucleo_f429zi . \
-DCONFIG_NET_DEBUG_NET_PKT_ALLOC=y \
-DCONFIG_NET_BUF_POOL_USAGE=y \
-DCONFIG_NET_TCP_BACKLOG_SIZE=10 \
-DCONFIG_KERNEL_SHELL=y \
-DCONFIG_THREAD_MONITOR=y \
-DCONFIG_THREAD_NAME=y \
-DCONFIG_THREAD_STACK_INFO=y \
-DCONFIG_INIT_STACKS=y \
-DCONFIG_NET_PKT_RX_COUNT=100 \
-DCONFIG_NET_PKT_TX_COUNT=100 \
-DCONFIG_NET_BUF_RX_COUNT=150 \
-DCONFIG_NET_BUF_TX_COUNT=150 \
-DCONFIG_LOG=n \
-DCONFIG_NET_LOG=n \
-DCONFIG_NET_SAMPLE_NUM_HANDLERS=8
results in
$ wrk -d 10 -t 24 -c 1000 http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
24 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.06s 0.00us 1.06s 100.00%
Req/Sec 0.00 0.00 0.00 100.00%
1 requests in 10.07s, 16.12KB read
Socket errors: connect 0, read 7, write 0, timeout 0
Requests/sec: 0.10
Transfer/sec: 1.60KB
$ curl http://192.0.2.1:8080
curl: (52) Empty reply from server
uart:~$ *** Booting Zephyr OS build zephyr-v2.2.0-29-g67549d8d6002 ***
uart:~$ net mem
Fragment length 128 bytes
Network buffer pools:
Address Total Avail Name
0x200200d0 100 100 RX
0x200200ec 100 100 TX
0x2002017c 150 150 RX DATA (rx_bufs)
0x200201a4 150 150 TX DATA (tx_bufs)
No external memory pools found.
uart:~$
uart:~$ net mem
Fragment length 128 bytes
Network buffer pools:
Address Total Avail Name
0x200200d0 100 94 RX
0x200200ec 100 69 TX
0x2002017c 150 144 RX DATA (rx_bufs)
0x200201a4 150 0 TX DATA (tx_bufs)
No external memory pools found.
Or is increasing CONFIG_NET_SAMPLE_NUM_HANDLERS a wrong configuration?
Increasing the number of handlers to 8 gave slightly better results, but not much. The best result out of 5 runs was this one:
$ ./wrk -d 10 -t 24 -c 200 --latency http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
24 threads and 200 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 17.61ms 18.91ms 686.52ms 99.21%
Req/Sec 50.11 34.28 250.00 82.86%
Latency Distribution
50% 17.72ms
75% 18.70ms
90% 19.94ms
99% 32.29ms
3949 requests in 10.02s, 8.07MB read
Socket errors: connect 0, read 3949, write 0, timeout 0
Requests/sec: 394.25
Transfer/sec: 824.75KB
No leaks were seen. I will try with Nucleo F767zi next.
There is something fishy with my nucleo board; the transfer rates are not that great.
./wrk -d 10 -t 24 -c 1000 http://192.0.2.1:8080
Running 10s test @ http://192.0.2.1:8080
24 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 172.95ms 340.65ms 1.87s 88.22%
Req/Sec 11.91 14.37 60.00 83.53%
333 requests in 10.10s, 695.92KB read
Socket errors: connect 0, read 375, write 0, timeout 2
Requests/sec: 32.97
Transfer/sec: 68.90KB
Slightly better with a smaller number of connections:
./wrk -d 60 -t 24 -c 500 --latency http://192.0.2.1:8080
Running 1m test @ http://192.0.2.1:8080
24 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 126.09ms 244.86ms 1.72s 86.81%
Req/Sec 13.09 13.65 60.00 84.89%
Latency Distribution
50% 33.29ms
75% 50.91ms
90% 690.30ms
99% 765.88ms
2403 requests in 1.00m, 4.90MB read
Socket errors: connect 0, read 2672, write 0, timeout 0
Requests/sec: 39.99
Transfer/sec: 83.57KB
I did not see any memory leaks. I wonder why the numbers are so much smaller with this stm board.
It looks like there may be some additional issue on the stm32 platform. Unfortunately I do not have any other board to test with right now. I checked the stm32 Ethernet driver (or rather the glue layer between the HAL and Zephyr) but I do not see anything suspicious there. Also, as far as I know, stm32 works well with lwIP, so I do not think there is a bug inside the HAL driver...
I did some testing and the stm numbers are not good. I used identical configuration options on all the boards.
Board | Details | KB/sec |
---|---|---|
NXP FRDM-K64F | 120 MHz, 1 MB flash, 256 KB RAM | 262.52 |
Atmel SAM-E70 | 300 MHz, 2 MB flash, 384 KB RAM | 751.66 |
STM Nucleo F767zi | 216 MHz, 2 MB flash, 384 KB RAM | 95.70 |
Config in samples/net/sockets/dumb_http_server_mt:
cmake -B build \
-DBOARD=<select-the-board> . \
-DCONFIG_NET_DEBUG_NET_PKT_ALLOC=y \
-DCONFIG_NET_BUF_POOL_USAGE=y \
-DCONFIG_NET_TCP_BACKLOG_SIZE=8 \
-DCONFIG_KERNEL_SHELL=y \
-DCONFIG_THREAD_MONITOR=y \
-DCONFIG_THREAD_NAME=y \
-DCONFIG_THREAD_STACK_INFO=y \
-DCONFIG_INIT_STACKS=y \
-DCONFIG_NET_PKT_RX_COUNT=200 \
-DCONFIG_NET_PKT_TX_COUNT=200 \
-DCONFIG_NET_BUF_RX_COUNT=300 \
-DCONFIG_NET_BUF_TX_COUNT=300 \
-DCONFIG_LOG=n \
-DCONFIG_NET_LOG=n
Host side:
$ ./wrk -d 60 -t 24 -c 500 --latency http://192.0.2.1:8080
Just FYI, I could replicate the memory leak on stm32f767zi by setting the net buf/pkt counts low.
Hopefully I managed to nail the nasty net buf leak. Basically, if the device is flooded with incoming packets, the TX timer in TCP could trigger and we then tried to send the packet a second time even though the previous message was still pending in the TX queue. This caused a memory leak which was only seen when the device is very busy. @xhpohanka, can you try the latest version of the PR and report the status?
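To make the described race concrete, here is a tiny self-contained model of it (hypothetical types and names, not the actual PR code): the buggy flow lacked the tx_pending guard, so the timer took a second reference to a packet still sitting in the TX queue, and that reference was never dropped.

```c
#include <stdbool.h>

/* Hypothetical model of the retransmit race described above; these types
 * and names are illustrative, not the actual Zephyr TCP or PR code. */
struct pkt {
	int refcount;		/* packet freed when this drops to zero */
};

struct tcp_conn {
	struct pkt *unacked;	/* last sent, not yet ACKed packet */
	bool tx_pending;	/* still queued in the driver's TX queue? */
};

static void pkt_ref(struct pkt *p)   { p->refcount++; }
static void pkt_unref(struct pkt *p) { p->refcount--; }

/* Called by the driver once the packet has actually left the TX queue. */
static void tx_done(struct tcp_conn *c)
{
	c->tx_pending = false;
	pkt_unref(c->unacked);		/* drop the reference of that send */
}

/* Retransmission timer callback. The buggy flow skipped the tx_pending
 * check: when the device was flooded and the driver fell behind, the
 * timer fired while the first send was still queued, took a second
 * reference, and that reference was never released, a leak visible
 * only under heavy load. */
static void resend_timer_expired(struct tcp_conn *c)
{
	if (c->tx_pending) {
		return;			/* previous send still in flight */
	}

	pkt_ref(c->unacked);		/* reference for this send attempt */
	c->tx_pending = true;
	/* ... hand c->unacked to the driver's TX queue ... */
}
```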
I have tried all the previous tests and they work correctly with the latest PR. I was able to stress my targets so that there were no free buffers
...
[00:00:30.653,000] <err> eth_stm32_hal: Failed to obtain RX buffer
[00:00:30.778,000] <err> net_pkt: Data buffer (48) allocation failed.
...
and the board recovers from such a state without any problem.
...
[00:00:31.973,000] <err> eth_stm32_hal: Failed to obtain RX buffer
[00:00:32.281,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:32.455,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:32.782,000] <err> net_pkt: Data buffer (48) allocation failed.
$ net mem
Fragment length 128 bytes
Network buffer pools:
Address Total Avail Name
0x2001bb48 96 0 RX
0x2001bb64 96 74 TX
0x2001bc34 128 32 RX DATA (rx_bufs)
0x2001bc5c 128 0 TX DATA (tx_bufs)
No external memory pools found.
[00:00:32.956,000] <err> net_pkt: Data buffer (48) allocation failed.
[00:00:33.283,000] <err> net_pkt: Data buffer (48) allocation failed.
...
$ net mem
Fragment length 128 bytes
Network buffer pools:
Address Total Avail Name
0x2001bb48 96 96 RX
0x2001bb64 96 96 TX
0x2001bc34 128 128 RX DATA (rx_bufs)
0x2001bc5c 128 128 TX DATA (tx_bufs)
No external memory pools found.
Let's hope that leaks are gone :)
The very bad performance could be explained (at least partially) by the small number of network buffers available. If the device just cannot handle the request, the data is dropped. The wrk tool really stress-tests the device, as it floods the system with a constant stream of HTTP requests. I got somewhat better performance values with the Apache Bench tool; I am not sure which tool is right and how the values are calculated in each of them. Anyway, if we really got all the leaks gone, then we can probably start to investigate why the perf numbers look like that.
> The very bad performance could be explained (at least partially) by the small amount of network buffers available.
I wouldn't agree that the number of network buffers we have by default (now) is small. To remind: I advocated for another approach, to test (and get working) stuff with as small a number of buffers as possible. That would expose many of the deadlocks/softlocks we have. Then we indeed can throw in more buffers, and everything would "just work" (more performantly, apparently).
> I got a bit better performance values with apache bench tool, not sure which tool is right and how the values are calculated in each of them.
I proposed to standardize on Apache Bench because it's a well-known tool, readily available in any distro (so reports can be reproduced between you <-> other people). So far I have found ab to be enough to show issues in the stack, and didn't have a need to move beyond it (and there're many tools like wrk around; how to choose the "right" one?).
> why the perf numbers look like that.
Mind that there're many "moving parts":
- So far I mostly tested Zephyr as a client downloading data (the big_http_download sample), whereas you talk about using ab, i.e. Zephyr as a server.
- The tools differ too (ab exposed enough probs from my PoV).
- Testing on a local network may differ from testing against an Internet server (as big_http_download does). Like, there shouldn't be a difference, but come on, we know how IT works ;-). If you want a rational explanation of how it may differ, consider for example that the Internet server big_http_download talks to may have TCP params tuned, making it more susceptible to any missteps in TCP it sees from Zephyr.

So, as usual, to investigate something, we need to describe/standardize the setup to test against, and that has always been a problem.
Just another note on speed. Even if the big_http_download sample is still slow for me, the last patches fixed a very slow upload in my application: a case where a binary file is sent to the Zephyr board through an HTTP POST request (firmware update).
One config option that might help (not tested) is to lower CONFIG_NET_TCP_INIT_RETRANSMISSION_TIMEOUT. The default value is 200 ms. In Wireshark I noticed, when running some tests against a Linux host, that it re-sends a packet just before our 200 ms timer expires. This leads to unnecessary network traffic in this case. So setting CONFIG_NET_TCP_INIT_RETRANSMISSION_TIMEOUT=150 might improve things; I have not tested this yet.
In my investigations (dated by now), I came to the conclusion that any retransmissions are controlled by the Linux side, and we indeed interact badly with the Linux TCP stack, causing these retransmissions to start and then grow in a positive feedback loop. I believe where I stopped was trying to change some rexmit timeout params on the Linux side to test this hypothesis, and IIRC found out that these params are effectively hardcoded (or can't be controlled precisely). Again, this is all from somewhat vague memory. Definitely +1 for trying to reproduce these matters, just a word of notice that it may not be that easy to see any changes.
Hi, using nucleo_f767zi and Zephyr 2.3.0, with just net/dhcpv4 and MQTT, I still see "Failed to obtain RX buffer".
My prj.conf:
CONFIG_CPLUSPLUS=y
CONFIG_STD_CPP17=y
CONFIG_NEWLIB_LIBC=y
CONFIG_LIB_CPLUSPLUS=y
CONFIG_NET_CONFIG_AUTO_INIT=y
CONFIG_NETWORKING=y
CONFIG_NET_TCP=y
CONFIG_NET_LLDP=y
CONFIG_NET_L2_ETHERNET=y
CONFIG_NET_L2_ETHERNET_MGMT=y
CONFIG_NET_IPV4=y
CONFIG_NET_CONFIG_NEED_IPV4=y
CONFIG_NET_IPV6=n
CONFIG_NET_DHCPV4=y
CONFIG_NET_CONNECTION_MANAGER=y
CONFIG_NET_NATIVE=y
CONFIG_NET_MGMT=y
CONFIG_NET_MGMT_EVENT=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_SOCKETS_POSIX_NAMES=y
CONFIG_LOG=y
CONFIG_LOG_RUNTIME_FILTERING=y
CONFIG_LOG_PRINTK=y
CONFIG_COVERAGE=n
CONFIG_LOG_IMMEDIATE=y
CONFIG_ENTROPY_GENERATOR=y
CONFIG_TEST_RANDOM_GENERATOR=y
CONFIG_DNS_RESOLVER=y
CONFIG_DNS_SERVER_IP_ADDRESSES=y
CONFIG_MQTT_LIB=y
LOG:
Booting Zephyr OS build zephyr-v2.3.0
Hello new board definition on nucleo_f767zi !
[log entries garbled in transfer: only timestamps and ANSI color-code remnants survive; entries repeat roughly every 0.4 s from 00:00:01.510 to 00:00:29.353, with a few error-colored (red) entries around 00:00:26.904-00:00:27.818]
Any idea? I checked and the above commits are in the Zephyr tree.
I'm playing with the samples/net/sockets/dumb_http_server_mt sample on a nucleo-f429zi board. I have tried several network stack configurations and found out that when I increase NET_TCP_BACKLOG_SIZE (the number of simultaneous incoming TCP connections) to something higher than 6 and try to flood my board with many requests, then some tx buffers are lost and never freed.
For network testing I use the wrk benchmarking tool with the command:
wrk -d 20 -t 24 -c 500 --latency http://192.1.1.145:8080
With NET_TCP_BACKLOG_SIZE=10 I have the following log.
When I run out of buffers completely I get a Zephyr usage fault:
Is there any reasonable limit for backlog size? Is it dependent on other settings?
My current environment is v2.1.0-rc3, Linux, zephyr-sdk. My .config is the following: