zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.9k stars 6.64k forks source link

CONFIG_USB_DEVICE_NETWORK_RNDIS Memory and Stack Overflow Issues on nrf52840 and nrf5340 #79232

Open PureEngineering opened 1 month ago

PureEngineering commented 1 month ago

Background

The CONFIG_USB_DEVICE_NETWORK_RNDIS configuration option enables the USB Remote Network Driver Interface Specification (RNDIS) in Zephyr. This allows a device to appear as a network adapter when connected to a host computer via USB. This functionality is crucial for devices that need to transfer data over a network connection using USB.

This task focuses on resolving memory and stack overflow issues that occur when CONFIG_USB_DEVICE_NETWORK_RNDIS is enabled and the device is subjected to some network traffic. Specifically, we're seeing these issues arise during a 15-minute ping test with 1400-byte packets (ping IP -s 1400).

Problem

When the CONFIG_USB_DEVICE_NETWORK_RNDIS option is enabled and the device is under a light network load (e.g., a long ping test with large packets), it experiences memory and/or stack overflow issues. This can lead to:

This problem impacts users who rely on the RNDIS functionality for stable and reliable network communication over USB.

Goals

  1. Identify and resolve the root cause of the memory and stack overflow issues. This may involve analyzing memory usage, examining stack traces, and reviewing the RNDIS implementation in Zephyr.
  2. Ensure the system remains stable and responsive under sustained network load. The device should be able to handle a 15-minute ping test with 1400-byte packets without crashing or experiencing significant performance degradation.
  3. Optimize memory and stack usage within the RNDIS implementation. This may involve reducing buffer sizes, improving memory management, or refactoring code to minimize stack usage.

Tests

Notes and Links

[00:07:47.717,651] <wrn> usb_rndis: Transmit response queue is not empty
[00:07:50.224,884] <wrn> usb_rndis: Transmit response queue is not empty
[00:07:50.244,781] <err> os: ***** USAGE FAULT *****
[00:07:50.252,563] <err> os:   Stack overflow (context area not valid)
[00:07:50.261,871] <err> os: r0/a1:  0x00000002  r1/a2:  0x0003db6d  r2/a3:  0x00000002
[00:07:50.272,674] <err> os: r3/a4:  0x0003d44d r12/ip:  0x00000002 r14/lr:  0x0003d08f
[00:07:50.283,447] <err> os:  xpsr:  0x0003e800
[00:07:50.290,802] <err> os: Faulting instruction address (r15/pc): 0x00000000
[00:07:50.300,811] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:07:50.310,821] <err> os: Current thread: 0x200039c8 (usbd_workq)
[00:07:50.319,946] <err> os: Halting system
[00:41:54.614,227] <wrn> usb_rndis: Transmit response queue is not empty
[00:41:54.624,420] <wrn> usb_rndis: Unknown USB packet req 0x0 type 0xa1
[00:41:54.637,878] <inf> usb_device: Clear halt ep 0x82
[00:41:54.655,853] <inf> usb_device: Clear halt ep 0x01
[00:41:54.673,736] <inf> usb_device: Clear halt ep 0x81
[00:41:54.884,002] <wrn> usb_rndis: Transmit response queue is not empty
[00:41:54.896,881] <wrn> usb_rndis: Transmit response queue is not empty
[00:41:54.919,921] <err> usb_rndis: Error assembling packet, drop and start over
[00:41:54.930,267] <err> usb_rndis: Wrong data packet type 0x3039313a
[00:41:54.939,483] <err> usb_rndis: Error parsing RNDIS header
[00:41:54.948,242] <err> usb_rndis: Too small packet len 33
[00:41:54.956,634] <err> usb_rndis: Error parsing RNDIS header
[00:41:57.121,917] <wrn> usb_rndis: Transmit response queue is not empty
[00:41:59.631,164] <wrn> usb_rndis: Transmit response queue is not empty
[00:42:02.136,077] <wrn> usb_rndis: Transmit response queue is not empty
[00:42:04.645,507] <wrn> usb_rndis: Transmit response queue is not empty
...
[00:46:04.051,940] <err> usb_rndis: Not enough pkt buffers, len 205, skip 141
[00:46:04.062,011] <wrn> usb_rndis: Skip 64 bytes out of remaining 141 bytes
[00:46:04.071,990] <wrn> usb_rndis: Skip 64 bytes out of remaining 77 bytes
[00:46:04.081,848] <wrn> usb_rndis: Skip 13 bytes out of remaining 13 bytes
[00:46:04.091,735] <err> usb_rndis: Not enough pkt buffers, len 104, skip 40
[00:46:04.101,715] <wrn> usb_rndis: Skip 40 bytes out of remaining 40 bytes
[00:46:04.111,602] <err> usb_rndis: Not enough pkt buffers, len 104, skip 40
[00:46:04.121,551] <wrn> usb_rndis: Skip 40 bytes out of remaining 40 bytes
[00:46:05.559,326] <err> usb_rndis: Cannot get free buffer
[00:46:08.062,683] <err> usb_rndis: Cannot get free buffer
[00:46:10.576,660] <err> usb_rndis: Cannot get free buffer
[00:46:13.086,944] <err> usb_rndis: Cannot get free buffer
[00:46:15.599,304] <err> usb_rndis: Cannot get free buffer
uart:~$ > net mem
net mem
Fragment length 128 bytes
Network buffer pools:
Address         Total   Avail   Name
0x2000203c      16      0       RX
0x2000207c      16      13      TX
0x20002340      64      36      RX DATA (rx_bufs)
0x20002380      64      62      TX DATA (tx_b
github-actions[bot] commented 1 month ago

Hi @PureEngineering! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

nordic-piks commented 1 month ago

Triaged internally

PureEngineering commented 2 weeks ago

I found this old issue. but it has the same errors that I am seeing. I am wondering if the same fix needs to be ported to the nrf usb drivers. https://github.com/zephyrproject-rtos/zephyr/issues/19038

PureEngineering commented 2 weeks ago
[00:02:14.849,334] <dbg> usb_rndis: rndis_status_cb: USB unknown state 9
[00:02:14.861,755] <inf> usb_device: Clear halt ep 0x81
[00:02:14.863,342] <dbg> usb_rndis: rndis_status_cb: USB unknown state 9
[00:02:14.881,622] <dbg> usb_rndis: rndis_status_cb: USB unhandled state: 1
[00:02:14.939,941] <dbg> usb_rndis: rndis_status_cb: USB unhandled state: 1
[00:02:15.060,089] <dbg> usb_rndis: rndis_status_cb: USB device configured
[00:02:15.061,950] <dbg> usb_net: netusb_enable: 
[00:02:15.063,415] <dbg> usb_net: netusb_connect_media: 
[00:02:15.071,685] <dbg> usb_rndis: rndis_class_handler: len 24 req_type 0x21 req 0x0 enabled 1
[00:02:15.073,944] <dbg> usb_rndis: queue_encapsulated_cmd: queued buf 0x2001e758
[00:02:15.075,958] <dbg> usb_rndis: cmd_thread: got buf 0x2001e758
[00:02:15.077,728] <dbg> usb_rndis: handle_encapsulated_cmd: RNDIS type 0x2 len 24 total len 24
[00:02:15.079,986] <dbg> usb_rndis: rndis_init_handle: req_id 0x1
[00:02:15.081,726] <dbg> usb_rndis: rndis_queue_rsp: Queued response pkt 0x2001f188
[00:02:15.083,740] <dbg> usb_rndis: rndis_notify_rsp: count 0
[00:02:15.087,524] <dbg> usb_rndis: rndis_notify_cb: ep 81 size 8
[00:02:15.089,355] <dbg> usb_rndis: rndis_class_handler: len 0 req_type 0xa1 req 0x1 enabled 1
[00:02:15.091,552] <dbg> usb_rndis: handle_encapsulated_rsp: 
[00:02:15.093,688] <dbg> usb_rndis: rndis_class_handler: len 32 req_type 0x21 req 0x0 enabled 1
[00:02:15.095,947] <dbg> usb_rndis: queue_encapsulated_cmd: queued buf 0x2001e758
[00:02:15.097,961] <dbg> usb_rndis: cmd_thread: got buf 0x2001e758
[00:02:15.099,700] <dbg> usb_rndis: handle_encapsulated_cmd: RNDIS type 0x5 len 32 total len 32
[00:02:15.101,959] <dbg> usb_rndis: rndis_set_handle: req_id 0x1 Object ID 0x1010e buf_len 4 buf_offset 20
[00:02:15.104,370] <dbg> usb_rndis: rndis_set_handle: RNDIS_OBJECT_ID_GEN_PKT_FILTER 0xf
[00:02:15.106,475] <dbg> usb_rndis: rndis_queue_rsp: Queued response pkt 0x2001f188
[00:02:15.108,489] <dbg> usb_rndis: rndis_notify_rsp: count 0
[00:02:15.111,511] <dbg> usb_rndis: rndis_notify_cb: ep 81 size 8
[00:02:15.113,342] <dbg> usb_rndis: rndis_class_handler: len 0 req_type 0xa1 req 0x1 enabled 1
[00:02:15.115,570] <dbg> usb_rndis: handle_encapsulated_rsp: 
[00:02:15.118,408] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.120,361] <dbg> usb_rndis: rndis_bulk_out: To assemble 666 bytes, reading 64 bytes
[00:02:15.122,650] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.124,603] <dbg> usb_rndis: rndis_bulk_out: To assemble 602 bytes, reading 64 bytes
[00:02:15.126,861] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 19
[00:02:15.128,784] <dbg> usb_rndis: rndis_bulk_out: To assemble 538 bytes, reading 19 bytes
[00:02:15.131,103] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.133,056] <dbg> usb_rndis: rndis_bulk_out: To assemble 519 bytes, reading 64 bytes
[00:02:15.135,345] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.137,298] <dbg> usb_rndis: rndis_bulk_out: To assemble 455 bytes, reading 64 bytes
[00:02:15.139,556] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 39
[00:02:15.141,510] <dbg> usb_rndis: rndis_bulk_out: To assemble 391 bytes, reading 39 bytes
[00:02:15.143,798] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.145,782] <dbg> usb_rndis: rndis_bulk_out: To assemble 352 bytes, reading 64 bytes
[00:02:15.148,071] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.150,024] <dbg> usb_rndis: rndis_bulk_out: To assemble 288 bytes, reading 64 bytes
[00:02:15.152,282] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 16
[00:02:15.154,235] <dbg> usb_rndis: rndis_bulk_out: To assemble 224 bytes, reading 16 bytes
[00:02:15.156,555] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.158,508] <dbg> usb_rndis: rndis_bulk_out: To assemble 208 bytes, reading 64 bytes
[00:02:15.160,797] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.162,719] <dbg> usb_rndis: rndis_bulk_out: To assemble 144 bytes, reading 64 bytes
[00:02:15.165,008] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 36
[00:02:15.166,961] <dbg> usb_rndis: rndis_bulk_out: To assemble 80 bytes, reading 36 bytes
[00:02:15.169,219] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.171,173] <dbg> usb_rndis: rndis_bulk_out: To assemble 44 bytes, reading 64 bytes
[00:02:15.173,339] <err> usb_rndis: Error assembling packet, drop and start over     <<<<<<<<<<<<<<<<<<<<<<<
[00:02:15.175,354] <dbg> usb_rndis: rndis_clean: 
[00:02:15.176,971] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.178,894] <err> usb_rndis: Wrong data packet type 0x11010000
[00:02:15.180,755] <err> usb_rndis: Error parsing RNDIS header
[00:02:15.182,525] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 3
[00:02:15.184,448] <err> usb_rndis: Too small packet len 3
[00:02:15.186,096] <err> usb_rndis: Error parsing RNDIS header
[00:02:15.187,927] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.189,880] <dbg> usb_rndis: parse_rndis_header: Parsing packet: len 151 payload offset 36 payload len 107
[00:02:15.192,443] <dbg> usb_rndis: rndis_bulk_out: To assemble 151 bytes, reading 64 bytes
[00:02:15.194,732] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.196,685] <dbg> usb_rndis: rndis_bulk_out: To assemble 87 bytes, reading 64 bytes
[00:02:15.198,913] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 23
[00:02:15.200,866] <dbg> usb_rndis: rndis_bulk_out: To assemble 23 bytes, reading 23 bytes
[00:02:15.203,033] <dbg> usb_rndis: rndis_bulk_out: Assembled full RNDIS packet
[00:02:15.204,986] <dbg> usb_net: netusb_recv: Recv pkt, len 107
[00:02:15.206,909] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.208,862] <dbg> usb_rndis: parse_rndis_header: Parsing packet: len 147 payload offset 36 payload len 103
[00:02:15.211,425] <dbg> usb_rndis: rndis_bulk_out: To assemble 147 bytes, reading 64 bytes
[00:02:15.213,745] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64
[00:02:15.215,667] <dbg> usb_rndis: rndis_bulk_out: To assemble 83 bytes, reading 64 bytes
[00:02:15.217,926] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 19
[00:02:15.219,848] <dbg> usb_rndis: rndis_bulk_out: To assemble 19 bytes, reading 19 bytes
[00:02:15.222,015] <dbg> usb_rndis: rndis_bulk_out: Assembled full RNDIS packet
[00:02:15.223,937] <dbg> usb_net: netusb_recv: Recv pkt, len 103
[00:02:15.225,921] <dbg> usb_rndis: rndis_bulk_out: EP 0x1 status 1 len 64

Diving deeper into the issue. after turning on CONFIG_USB_DEVICE_NETWORK_LOG_LEVEL you can see in the log the exact time the bug is exposed. after this error is shown is the exact time that ping no longer works. no idea for the fix, but this hopefully helps track down the issue easier. Also note, these could be a result of a messed up stack at this point.

jfischer-no commented 2 weeks ago

An application that uses USB and the network stack can be quite complicated. You have to narrow it down by trying to reproduce it with samples in the tree, such as samples/net/zperf. I cannot reproduce it with the zperf sample built with -DEXTRA_CONF_FILE="overlay-netusb.conf" -DCONFIG_USB_DEVICE_NETWORK_ECM=n -DCONFIG_USB_DEVICE_NETWORK_RNDIS=y, using ping IP -s 1400. Also, what version of Zephyr RTOS are you using?