zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.48k stars 6.41k forks source link

Bluetooth: IPSP Sample Crash on nrf52840dk_nrf52840 #53871

Closed seekaddo closed 9 months ago

seekaddo commented 1 year ago

Hi All, I am new to the zephyr OS and currently using it for an IoT project. I am trying to get the IPSP example working so that i can start from there. The example keeps crashing on connect (echo "connect DC:F2:F1:E7:96:E9 2" > /sys/kernel/debug/bluetooth/6lowpan_control) and I don't know why. Can anyone help? I am working with the latest branch.

Git log on branch

commit 80f87b94806bef4f893b18f1dc8cea070382dd6b (HEAD -> main, origin/main, origin/HEAD) Author: Mariusz Skamra mariusz.skamra@codecoup.pl Date: Thu Jan 12 10:51:19 2023 +0100

To Reproduce

west build -p always -b nrf52840dk_nrf52840 samples/bluetooth/ipsp -DCONF_FILE=prj_dbg.conf
west flash

on Raspi Pi 3B Using this USB BLE dongle: https://www.2direct.de/media/pdf/fb/99/d0/bt0015IJ77DTdbLHOls.pdf

modprobe bluetooth_6lowpan
 echo 1 > /sys/kernel/debug/bluetooth/6lowpan_enable
 echo "connect  <nRF MAC> <type>" > /sys/kernel/debug/bluetooth/6lowpan_control

Logs and console output

[00:00:00.251,953] <dbg> net_core: net_init: (0x20002598): Priority 90
[00:00:00.252,105] <dbg> net_route: net_route_init: (0x20002598): Allocated 8 routing entries (576 bytes)
[00:00:00.252,136] <dbg> net_route: net_route_init: (0x20002598): Allocated 8 nexthop entries (224 bytes)
[00:00:00.252,136] <dbg> net_core: l3_init: (0x20002598): Network L3 init done
[00:00:00.252,258] <dbg> net_if: net_if_init: (0x20002598): 
[00:00:00.252,288] <dbg> net_if: init_iface: (0x20002598): On iface 0x20001354
[00:00:00.252,410] <dbg> net_if: update_operational_state: (0x20002598): iface 0x20001354, oper state DOWN admin DOWN carrier ON dormant ON
[00:00:00.252,471] <dbg> net_if: net_if_ipv6_calc_reachable_time: (0x20002598): min_reachable:15000 max_reachable:45000
[00:00:00.255,096] <dbg> net_if: net_if_post_init: (0x20002598): 
[00:00:00.255,096] <dbg> net_if: net_if_up: (0x20002598): iface 0x20001354
[00:00:00.255,249] <dbg> net_if: update_operational_state: (0x20002598): iface 0x20001354, oper state DORMANT admin UP carrier ON dormant ON
*** Booting Zephyr OS build zephyr-v3.2.0-3560-g80f87b94806b ***
[00:00:00.256,988] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.257,019] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.257,080] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.2 Build 99
[00:00:00.258,026] <inf> bt_hci_core: bt_dev_show_info: Identity: DC:F2:F1:E7:96:E9 (random)
[00:00:00.258,056] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.258,087] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
[00:00:00.258,636] <inf> net_config: net_config_init_by_iface: Initializing network
[00:00:00.258,666] <inf> net_config: check_interface: Waiting interface 1 (0x20001354) to be up...
[00:00:30.259,307] <err> net_config: net_config_init_by_iface: Timeout while waiting network interface
[00:00:30.259,338] <err> net_config: net_config_init_app: Network initialization failed (-115)
[00:00:30.259,368] <inf> ipsp: init_app: Run IPSP sample
[00:00:30.259,490] <dbg> net_if: net_if_ipv6_addr_add: (0x20002598): [0] interface 0x20001354 address 2001:db8::1 type MANUAL added
[00:00:30.259,582] <dbg> net_if: net_if_ipv6_maddr_add: (0x20002598): [0] interface 0x20001354 address ff02::1 added
[00:00:30.259,796] <dbg> net_if: net_if_tx: (0x20002598): Processing (pkt 0x2000db1c, prio 1) network packet iface 0x20001354/1
[00:00:30.259,796] <dbg> net_6lo: compress_da_mcast: (0x20002598): M_1 dst is mcast
[00:00:30.259,826] <dbg> net_6lo: compress_da_mcast: (0x20002598): DAM_11 dst maddr 8 bit compressible
[00:00:30.259,826] <dbg> net_6lo: compress_IPHC_header: (0x20002598): SAM_00, SAC_1 unspecified src address
[00:00:30.259,857] <dbg> net_6lo: compress_hoplimit: (0x20002598): HLIM compressed (1)
[00:00:30.259,857] <dbg> net_6lo: compress_tfl: (0x20002598): Traffic class and Flow label elided
[00:00:30.259,887] <err> net_bt: net_bt_send: Unable to send packet: -128
[00:00:30.260,009] <dbg> net_if: net_if_ipv6_maddr_add: (0x20002598): [1] interface 0x20001354 address ff02::1:ff00:1 added
[00:00:30.260,192] <dbg> net_if: net_if_tx: (0x20002598): Processing (pkt 0x2000db1c, prio 1) network packet iface 0x20001354/1
[00:00:30.260,223] <dbg> net_6lo: compress_da_mcast: (0x20002598): M_1 dst is mcast
[00:00:30.260,223] <dbg> net_6lo: compress_da_mcast: (0x20002598): DAM_11 dst maddr 8 bit compressible
[00:00:30.260,253] <dbg> net_6lo: compress_IPHC_header: (0x20002598): SAM_00, SAC_1 unspecified src address
[00:00:30.260,253] <dbg> net_6lo: compress_hoplimit: (0x20002598): HLIM compressed (1)
[00:00:30.260,284] <dbg> net_6lo: compress_tfl: (0x20002598): Traffic class and Flow label elided
[00:00:30.260,314] <err> net_bt: net_bt_send: Unable to send packet: -128
[00:00:30.260,375] <dbg> net_if: net_if_ipv6_start_dad: (0x20002598): Interface 0x20001354 is down, starting DAD for 2001:db8::1 later.
[00:00:30.260,803] <inf> ipsp: listen: Starting to wait
[00:01:08.527,404] <dbg> net_if: net_if_tx: (0x20001f18): Processing (pkt 0x2000dad8, prio 1) network packet iface 0x20001354/1
--- 16 messages dropped ---
[00:01:08.527,435] <dbg> net_6lo: compress_da_mcast: (0x20001f18): M_1 dst is mcast
[00:01:08.527,465] <dbg> net_6lo: compress_da_mcast: (0x20001f18): DAM_01 6 bytes: 2nd byte + last five bytes
[00:01:08.527,465] <dbg> net_6lo: compress_IPHC_header: (0x20001f18): SAM_00, SAC_1 unspecified src address
[00:01:08.527,496] <dbg> net_6lo: compress_hoplimit: (0x20001f18): HLIM compressed (255)
[00:01:08.527,526] <dbg> net_6lo: compress_tfl: (0x20001f18): Traffic class and Flow label elided
[00:01:08.527,679] <dbg> net_if: net_if_ipv6_start_dad: (0x20001f18): Interface 0x20001354 ll addr DC:F2:F1:E7:96:E9 tentative IPv6 addr 2001:db8::1
[00:01:08.527,862] <dbg> net_ipv6: nbr_new: (0x20001f18): nbr 0x20000900 iface 0x20001354/1 state 0 IPv6 2001:db8::1
[00:01:08.528,015] <dbg> net_ipv6: net_ipv6_send_ns: (0x20001f18): Sent Neighbor Solicitation from :: to ff02::1:ff00:1, target 2001:db8::1 iface 0x20001354/1
[00:01:08.528,076] <dbg> net_if: net_if_tx: (0x20001f18): Processing (pkt 0x2000dad8, prio 1) network packet iface 0x20001354/1
[00:01:08.528,076] <dbg> net_6lo: compress_da_mcast: (0x20001f18): M_1 dst is mcast
[00:01:08.528,106] <dbg> net_6lo: compress_da_mcast: (0x20001f18): DAM_01 6 bytes: 2nd byte + last five bytes
[00:01:08.528,137] <dbg> net_6lo: compress_IPHC_header: (0x20001f18): SAM_00, SAC_1 unspecified src [00:01:08.528,167] <dbg> net_6lo: compress_hoplimit: (0x20001f18): HLIM compressed (255)
[00:01:08.528,167] <dbg> net_6lo: compress_tfl: (0x20001f18): Traffic class and Flow label elided
[00:01:08.528,228] <dbg> net_if: net_if_start_rs: (0x20001f18): Starting ND/RS for iface 0x20001354
[00:01:08.528,442] <dbg> net_ipv6: net_ipv6_send_rs: (0x20001f18): Sent Router Solicitation from :: to ff02::2 iface 0x20001354/1
[00:01:08.528,503] <dbg> net_if: net_if_tx: (0x20001f18): Processing (pkt 0x2000dad8, prio 1) network packet iface 0x20001354/1
[00:01:08.528,503] <dbg> net_6lo: compress_da_mcast: (0x20001f18): M_1 dst is mcast
[00:01:08.528,533] <dbg> net_6lo: compress_da_mcast: (0x20001f18): DAM_11 dst maddr 8 bit compressible
[00:01:08.528,564] <dbg> net_6lo: compress_IPHC_header: (0x20001f18): SAM_00, SAC_1 unspecified src address
[00:01:08.528,564] <dbg> net_6lo: compress_hoplimit: (0x20001f18): HLIM compressed (255)
[00:01:08.528,594] <dbg> net_6lo: compress_tfl: (0x20001f18): Traffic class and Flow label elided
[00:01:08.627,685] <dbg> net_if: dad_timeout: (0x20002618): DAD succeeded for fe80::dcf2:f1ff:fee7:96e9
[00:01:08.627,716] <dbg> net_ipv6: nbr_free: (0x20002618): nbr 0x200008ac
[00:01:08.627,746] <dbg> net_ipv6: net_neighbor_data_remove: (0x20002618): Neighbor 0x200008ac removed
[00:01:08.628,875] <dbg> net_if: dad_timeout: (0x20002618): DAD succeeded for 2001:db8::1
[00:01:08.628,906] <dbg> net_ipv6: nbr_free: (0x20002618): nbr 0x20000900
[00:01:08.628,906] <dbg> net_ipv6: net_neighbor_data_remove: (0x20002618): Neighbor 0x20000900 removed
ASSERTION FAIL [buf] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/conn.c:873
[00:01:08.675,903] <err> os: esf_dump: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000011
[00:01:08.675,903] <err> os: esf_dump: r3/a4:  0x200014b0 r12/ip:  0x00000014 r14/lr:  0x00009d81
[00:01:08.675,933] <err> os: esf_dump:  xpsr:  0x61000000
[00:01:08.675,933] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00009d8c
[00:01:08.675,994] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:01:08.676,025] <err> os: z_fatal_error: Current thread: 0x20001a10 (unknown)
[00:01:08.908,813] <err> os: k_sys_fatal_error_handler: Halting system

Environment:

seekaddo commented 1 year ago

Hi Guys, I managed to get the IPSP sample working on this commit: 2677520f19698c6a3cf9cbb5dae61b4371e171cf, with a few fix But I cannot send more than 104 without any issues.

The device can receive up to 1222 bytes of ipv6 pkt but it cannot reply with the same packet size. Test: 1 received 104 but timeout when reply:

[00:00:27.807,342] <inf> ipsp: -------------Incoming UDP
[00:00:27.807,403] <inf> ipsp: -------------Build UDP reply
[00:00:27.807,434] <inf> ipsp: UDP IPv6 received 104 bytes
[00:00:27.807,769] <wrn> bt_l2cap: No credits to transmit packet
[00:00:27.807,861] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 57591
[00:00:58.160,003] <inf> ipsp: -------------Incoming UDP
[00:00:58.160,064] <inf> ipsp: -------------Build UDP reply
[00:00:58.160,095] <inf> ipsp: UDP IPv6 received 104 bytes
[00:00:58.160,552] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 59594
[00:01:28.609,252] <inf> ipsp: -------------Incoming UDP
[00:01:28.609,313] <inf> ipsp: -------------Build UDP reply
[00:01:28.609,344] <inf> ipsp: UDP IPv6 received 104 bytes
[00:01:28.609,802] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 35600
[00:01:58.859,222] <inf> ipsp: -------------Incoming UDP
[00:01:58.859,252] <inf> ipsp: -------------Build UDP reply
[00:01:58.859,283] <inf> ipsp: UDP IPv6 received 104 bytes
[00:01:58.859,741] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 53332
[00:02:29.109,161] <inf> ipsp: -------------Incoming UDP
[00:02:29.109,222] <inf> ipsp: -------------Build UDP reply
[00:02:29.109,252] <inf> ipsp: UDP IPv6 received 104 bytes
[00:02:29.109,710] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 58153
[00:02:59.409,118] <inf> ipsp: -------------Incoming UDP
[00:02:59.409,179] <inf> ipsp: -------------Build UDP reply
[00:02:59.409,210] <inf> ipsp: UDP IPv6 received 104 bytes
[00:02:59.459,197] <inf> ipsp: -------------sent UDP ret: 104 dest: fd00:b::2 port: 38855
[00:08:49.606,384] <err> bt_l2cap: err -12
[00:08:51.606,536] <err> bt_l2cap: chan 0x20000170 timeout

Test: 2 Receive 1222 bytes but cannot respond

*** Booting Zephyr OS build zephyr-v3.2.0-55-g08e81230753e  ***
[00:00:00.263,031] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.263,061] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.263,092] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.2 Build 0
[00:00:00.264,038] <inf> bt_hci_core: Identity: DC:F2:F1:E7:96:E9 (random)
[00:00:00.264,068] <inf> bt_hci_core: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.264,099] <inf> bt_hci_core: LMP: version 5.3 (0x0c) subver 0xffff
[00:00:08.311,828] <wrn> bt_l2cap: No credits to transmit packet
[00:00:08.311,920] <inf> ipsp: Run IPSP sample
[00:00:08.312,042] <inf> ipsp: -----------Device Running on IPv6 addr: fd00:a::1 port: 4242
[00:00:09.313,110] <wrn> ipsp: Retry ICMP Echo Request

[00:00:09.313,507] <wrn> bt_l2cap: No credits to transmit packet
[00:00:10.313,568] <wrn> ipsp: Retry ICMP Echo Request

[00:00:10.313,964] <wrn> bt_l2cap: No credits to transmit packet
[00:00:10.413,604] <wrn> bt_l2cap: No credits to transmit packet
[00:00:11.314,025] <wrn> ipsp: Retry ICMP Echo Request

[00:00:11.511,108] <inf> ipsp: ICMP IPV6 recv from 2a02:8388:7c7:5400:4194:609e:1042:442a to fd00:a::dcf2:f1ff:fee7:96e9
[00:00:11.511,138] <inf> ipsp: Received Echo Reply

[00:00:12.314,331] <wrn> ipsp: Retry ICMP Echo Request

[00:00:12.314,361] <inf> ipsp: Peer is active resuming normal Udp ipv6 operations

[00:00:35.669,860] <inf> ipsp: -------------Incoming UDP
[00:00:35.669,921] <inf> ipsp: -------------Build UDP reply
[00:00:35.669,952] <inf> ipsp: UDP IPv6 received 1222 bytes
[00:00:35.670,867] <wrn> bt_l2cap: No credits to transmit packet
[00:00:35.670,959] <inf> ipsp: -------------sent UDP ret: 1222 dest: fd00:b::2 port: 51083
[00:00:35.759,826] <wrn> bt_l2cap: No credits to transmit packet
[00:00:40.661,285] <wrn> bt_l2cap: No credits to transmit packet
[00:00:41.711,303] <wrn> bt_l2cap: No credits to transmit packet
[00:00:42.761,322] <wrn> bt_l2cap: No credits to transmit packet
[00:03:53.361,145] <wrn> bt_l2cap: No credits to transmit packet
[00:03:54.410,491] <wrn> bt_l2cap: No credits to transmit packet
[00:03:55.460,449] <wrn> bt_l2cap: No credits to transmit packet
[00:05:06.060,363] <wrn> bt_l2cap: No credits to transmit packet
[00:05:07.060,333] <wrn> bt_l2cap: No credits to transmit packet
[00:05:08.110,351] <wrn> bt_l2cap: No credits to transmit packet
[00:05:43.660,247] <err> os: ***** USAGE FAULT *****
[00:05:43.660,278] <err> os:   Unaligned memory access
[00:05:43.660,278] <err> os: r0/a1:  0x00005199  r1/a2:  0x00000000  r2/a3:  0xffff8726
[00:05:43.660,308] <err> os: r3/a4:  0x00000020 r12/ip:  0x2002817c r14/lr:  0x00030eb5
[00:05:43.660,308] <err> os:  xpsr:  0x81000000
[00:05:43.660,339] <err> os: Faulting instruction address (r15/pc): 0x00030e9e
[00:05:43.660,369] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:05:43.660,400] <err> os: Current thread: 0x20020288 (unknown)
[00:05:43.759,643] <err> os: Halting system

after that the device hang there no other operation is possible and it crashes. to prevent the application from crash i increased the work_queue size:CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096 and in the file l2cap.c the function l2cap_chan_sdu_sent remove the void *cb_user_data = data->user_data; and passdata->user_data directing to the cbfunction.

prj.conf

CONFIG_MAIN_STACK_SIZE=4096
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
CONFIG_BT_BUF_ACL_RX_SIZE=255
CONFIG_BT_BUF_ACL_TX_SIZE=251
CONFIG_BT_BUF_CMD_TX_SIZE=255

CONFIG_BT_L2CAP_TX_MTU=247

CONFIG_BT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_SMP=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y
CONFIG_BT_DEVICE_NAME="Test IPSP node"
CONFIG_NETWORKING=y
CONFIG_NET_IPV6=y
CONFIG_NET_IPV4=n
CONFIG_NET_UDP=y
CONFIG_NET_TCP=y
CONFIG_TEST_RANDOM_GENERATOR=y
CONFIG_NET_L2_BT=y
CONFIG_INIT_STACKS=y
CONFIG_NET_PKT_RX_COUNT=10
CONFIG_NET_PKT_TX_COUNT=10
CONFIG_NET_BUF_RX_COUNT=20
CONFIG_NET_BUF_TX_COUNT=20
CONFIG_NET_IF_UNICAST_IPV6_ADDR_COUNT=3
CONFIG_NET_IF_MCAST_IPV6_ADDR_COUNT=4
CONFIG_NET_MAX_CONTEXTS=6

CONFIG_NET_CONFIG_AUTO_INIT=y
CONFIG_NET_CONFIG_SETTINGS=y
#CONFIG_NET_CONFIG_MY_IPV6_ADDR="2001:db8::1"
CONFIG_NET_CONFIG_MY_IPV6_ADDR="fd00:a::1"
CONFIG_NET_CONFIG_PEER_IPV6_ADDR="fd00:a::2"
CONFIG_NET_CONFIG_BT_NODE=y

Am I missing something here or there is a bug in the l2cap.c stack? @rlubos @jori-nordic @jmcruzal @nordicjm Any Tipps?

I think if the device can receive 1222 bytes ipv6 pkt it should be able to send 1222 bytes.

kruithofa commented 1 year ago

@jori-nordic please see if you can reproduce this

jori-nordic commented 1 year ago

image

can't reproduce it on https://github.com/jori-nordic/zephyr/commit/b72d11c613e15cc9424a7b4b8277f1fa8bae30e1

@rlubos it does seem like the interface never comes up though (the sample never prints its startup text).

The device attached to the linux PC is a bog standard LE adapter dongle from Laird (probably doesn't run zephyr).

rlubos commented 1 year ago

@jori-nordic You seem to be running a bit outdated Zephyr, missing fixes from https://github.com/zephyrproject-rtos/zephyr/pull/52934 - please try with the current main.

jori-nordic commented 1 year ago

@rlubos I used the same rev as @seekaddo in the hope that I'd get the same errors. I tried with the current zephyr, it's even worse, there's a nasty assert in conn.c, maybe a controller bug. edit: here's the same branch rebased on main image

jori-nordic commented 1 year ago

so I added more debug logging, and it seems it was the netbuf that was passed from the net stack that did not have enough room in its user_data. Adding CONFIG_NET_PKT_BUF_USER_DATA_SIZE=8 seems to get the connection going.

No way to get a ping through though, I'll try to follow-up on monday. image

jori-nordic commented 1 year ago

okay so, barring the misconfiguration I talked about in my previous comment, this seems to be a BlueZ/linux ipsp bug:

Here's the rev I used: https://github.com/jori-nordic/zephyr/commit/b01c54c3324e86b74ee91ed202267b584e8421ff Here's the commands I ran in succession to get there: Once:

  1. sudo btattach -B /dev/ttyACM14 -S 1000000 where ttyACM14 is the Laird BLE USB dongle
  2. echo 1 | sudo tee /sys/kernel/debug/bluetooth/6lowpan_enable

And then for every test cycle:

  1. echo "connect D7:99:F8:2D:DD:50 2" | sudo tee /sys/kernel/debug/bluetooth/6lowpan_control
  2. sudo ip address add 2001:db8::2/64 dev bt0
  3. ping6 2001:db8::1

ipsp.log ipsp.zip

edit: I should also mention my env:

jon@jori-pc:~/repos/zephyrproject/zephyr$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"
jon@jori-pc:~/repos/zephyrproject/zephyr$ uname -rv
5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023
jon@jori-pc:~/repos/zephyrproject/zephyr$ 
bamkrs commented 1 year ago

Just chiming in, having a similar problem. I used @jori-nordic rev. All of sudden everything works fine.

My host is an RPi3 as well and I'm using an nRF52840DK with the usb_hci sample as bluetooth dongle:

root@raspberrypi:~# uname -a
Linux raspberrypi 5.15.61-v8+ #1579 SMP PREEMPT Fri Aug 26 11:16:44 BST 2022 aarch64 GNU/Linux

This is my log:

*** Booting Zephyr OS build v3.3.0-rc3-53-gb01c54c3324e ***                                                                                                                                                                                                                                                    
[00:00:00.258,087] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)                                                                                                                                                                                                                  
[00:00:00.258,117] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)                                                                                                                                                                                                                                 
[00:00:00.258,148] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.3 Build 0                                                                                                                                                                                          
[00:00:00.259,521] <inf> bt_hci_core: bt_dev_show_info: Identity: CA:CB:3D:0B:6C:F1 (random)                                                                                                                                                                                                                   
[00:00:00.259,552] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1                                                                                                                                                                                           
[00:00:00.259,582] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff                                                                                                                                                                                                                  
[00:00:00.260,314] <inf> net_config: net_config_init_by_iface: Initializing network                                                                                                                                                                                                                            
[00:00:00.260,314] <inf> net_config: check_interface: Waiting interface 1 (0x20001900) to be up...                                                                                                                                                                                                             
[00:00:10.209,075] <dbg> bt_l2cap: l2cap_accept: conn 0x20003fd8 handle 0                                                                                                                                                                                                                                      
[00:00:10.209,106] <dbg> bt_l2cap: bt_l2cap_chan_add: conn 0x20003fd8 chan 0x200040a0                                                                                                                                                                                                                          
[00:00:10.209,136] <dbg> bt_l2cap: l2cap_connected: ch 0x200040a0 cid 0x0005                                                                                                                                                                                                                                   
[00:00:10.209,167] <dbg> bt_l2cap: bt_l2cap_chan_add: conn 0x20003fd8 chan 0x20004348                                                                                                                                                                                                                          
[00:00:10.209,259] <dbg> bt_l2cap: bt_l2cap_chan_add: conn 0x20003fd8 chan 0x2000e8c8                                                                                                                                                                                                                          
[00:00:10.510,253] <dbg> bt_l2cap: bt_l2cap_recv: Packet for CID 5 len 14                                                                                                                                                                                                                                      
[00:00:10.510,284] <dbg> bt_l2cap: l2cap_chan_recv: chan 0x200040a0 len 14                                                                                                                                                                                                                                     
[00:00:10.510,314] <dbg> bt_l2cap: l2cap_recv: Signaling code 0x14 ident 1 len 10                                                                                                                                                                                                                              
[00:00:10.510,314] <dbg> bt_l2cap: le_conn_req: psm 0x23 scid 0x0040 mtu 1280 mps 23 credits 56                                                                                                                                                                                                                
[00:00:10.510,375] <dbg> bt_l2cap: l2cap_chan_accept: conn 0x20003fd8 scid 0x0040 chan 0x2000ae08                                                                                                                                                                                                              
[00:00:10.510,375] <dbg> bt_l2cap: bt_l2cap_chan_add: conn 0x20003fd8 chan 0x200002a0                                                                                                                                                                                                                          
[00:00:10.510,437] <dbg> bt_l2cap: l2cap_chan_tx_init: chan 0x200002a0                                                                                                                                                                                                                                         
[00:00:10.510,467] <err> bt_l2cap: l2cap_chan_tx_give_credits: chan 0x200002a0 got TX credits 56                                                                                                                                                                                                               
[00:00:10.510,498] <dbg> bt_l2cap: l2cap_chan_rx_init: chan 0x200002a0                                                                                                                                                                                                                                         
[00:00:10.510,498] <dbg> bt_l2cap: l2cap_chan_rx_give_credits: chan 0x200002a0 credits 6                                                                                                                                                                                                                       
[00:00:10.510,925] <err> net_bt: ipsp_connected: Channel 0x200002a0 Source CA:CB:3D:0B:6C:F1 (random) connected to Destination F6:4F:16:EC:77:1E (random)                                                                                                                                                      
[00:00:10.513,366] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 92                                                                                                                                                                                                                           
[00:00:10.513,397] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x2001577c len 56                                                                                                                                                                                                                
[00:00:10.513,427] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x2001579c len 23                                                                                                                                                                                                                  
[00:00:10.513,458] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 55                                                                                                                                                                                                              
[00:00:10.513,488] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.513,549] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x200157bc len 23                                                                                                                                                                                                                  
[00:00:10.513,549] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 54                                                                                                                                                                                                              
[00:00:10.513,580] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.513,610] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 12 credits 53                                                                                                                                                                                                              
[00:00:10.513,641] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 12                                                                                                                                                                                                                             
[00:00:10.513,671] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 56 total_len 56                                                                                                                                                                                                       
[00:00:10.513,946] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 92                                                                                                                                                                                                                           
[00:00:10.513,946] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x200157dc len 56                                                                                                                                                                                                                
[00:00:10.514,007] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x200157fc len 23                                                                                                                                                                                                                  
[00:00:10.514,038] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 52                                                                                                                                                                                                              
[00:00:10.514,038] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.514,099] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x2001581c len 23                                                                                                                                                                                                                  
[00:00:10.514,129] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 51                                                                                                                                                                                                              
[00:00:10.514,129] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.514,190] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 12 credits 50                                                                                                                                                                                                              
[00:00:10.514,190] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 12                                                                                                                                                                                                                             
[00:00:10.514,221] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 56 total_len 56                                                                                                                                                                                                       
[00:00:10.514,404] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 64                                                                                                                                                                                                                           
[00:00:10.514,434] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x2001583c len 33                                                                                                                                                                                                                
[00:00:10.514,495] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x2001585c len 23                                                                                                                                                                                                                  
[00:00:10.514,495] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 49                                                                                                                                                                                                              
[00:00:10.514,526] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.514,556] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 12 credits 48                                                                                                                                                                                                              
[00:00:10.514,587] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 12                                                                                                                                                                                                                             
[00:00:10.514,617] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 33 total_len 33                                                                                                                                                                                                       
[00:00:10.514,801] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 48                                                                                                                                                                                                                           
[00:00:10.514,831] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x2001587c len 12                                                                                                                                                                                                                
[00:00:10.514,862] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 14 credits 47                                                                                                                                                                                                              
[00:00:10.514,892] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 14                                                                                                                                                                                                                             
[00:00:10.514,923] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 12 total_len 12                                                                                                                                                                                                       
[00:00:10.514,953] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 5 len 14                                                                                                                                                                                                                              
[00:00:10.515,228] <inf> net_config: iface_up_handler: Interface 1 (0x20001900) coming up                                                                                                                                                                                                                      
[00:00:10.515,594] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 92                                                                                                                                                                                                                           
[00:00:10.515,594] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x2001577c len 56                                                                                                                                                                                                                
[00:00:10.515,655] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x200157bc len 23                                                                                                                                                                                                                  
[00:00:10.515,655] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 46                                                                                                                                                                                                              
[00:00:10.515,686] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.515,747] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x2001579c len 23                                                                                                                                                                                                                  
[00:00:10.515,747] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 45                                                                                                                                                                                                              
[00:00:10.515,777] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.515,808] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 12 credits 44                                                                                                                                                                                                              
[00:00:10.515,838] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 12                                                                                                                                                                                                                             
[00:00:10.515,869] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 56 total_len 56                                                                                                                                                                                                       
[00:00:10.516,052] <wrn> net_bt: net_bt_send: iface 0x20001900 pkt 0x200168e0 len 64                                                                                                                                                                                                                           
[00:00:10.516,082] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002a0 buf 0x2001589c len 33                                                                                                                                                                                                                
[00:00:10.516,113] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x200002a0 seg 0x200158bc len 23                                                                                                                                                                                                                  
[00:00:10.516,143] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 23 credits 43                                                                                                                                                                                                              
[00:00:10.516,174] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 23                                                                                                                                                                                                                             
[00:00:10.516,204] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x200002a0 cid 0x0040 len 12 credits 42                                                                                                                                                                                                              
[00:00:10.516,204] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003fd8 cid 64 len 12                                                                                                                                                                                                                             
[00:00:10.516,235] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002a0 cid 0x0040 sent 33 total_len 33                                                                                                                                                                                                       
[00:00:10.516,357] <inf> ipsp: init_app: Run IPSP sample                                                                                                                                                                                                                                                       
[00:00:10.516,967] <inf> ipsp: listen: Starting to wait

Ping works fine as well.

seekaddo commented 1 year ago

@bamkrs Are you also able to send a UDP packet of more than 200 bytes or so? 12222 should be ok too. You can test on your Raspi3 with the following python code.

import socket

def main():
    client_socket = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)

    server_address =  "fd00:a::dcf2:f1ff:fee7:96e9" ## zephyr node ip address
    server_port = 5006 ## defined port

    message = "Hello,Server!" * 89 # recv 89-94 but cannot respond
    client_socket.sendto(message.encode(), (server_address, server_port))

    data, server = client_socket.recvfrom(4096)
    print("received message:", data.decode(), "from", server)

if __name__ == "__main__":
    main()
bamkrs commented 1 year ago

@seekaddo I had to tweak the address and the port but nope. I can see some packets arriving at the target but it crashes:

[00:02:52.549,285] <err> os: usage_fault: ***** USAGE FAULT *****                                                                                                                                                                                                                                                                                                                                                                       
[00:02:52.549,285] <err> os: usage_fault:   Illegal use of the EPSR                                                                                                                                                                                                                                                                                                                                                                     
[00:02:52.549,316] <err> os: esf_dump: r0/a1:  0x20003fd8  r1/a2:  0x00000000  r2/a3:  0x00000000                                                                                                                                                                                                                                                                                                                                       
[00:02:52.549,346] <err> os: esf_dump: r3/a4:  0x20001660 r12/ip:  0x00000000 r14/lr:  0x0000eab3                                                                                                                                                                                                                                                                                                                                       
[00:02:52.549,346] <err> os: esf_dump:  xpsr:  0x60000000                                                                                                                                                                                                                                                                                                                                                                               
[00:02:52.549,377] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00000000                                                                                                                                                                                                                                                                                                                                                
[00:02:52.549,407] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0                                                                                                                                                                                                                                                                                                                                           
[00:02:52.549,438] <err> os: z_fatal_error: Current thread: 0x20004db0 (sysworkq)                                                                                                                                                                                                                                                                                                                                                       
[00:02:54.351,104] <err> os: k_sys_fatal_error_handler: Halting system

Here is the full trace after starting the python script. fulltrace.txt

seekaddo commented 1 year ago

@bamkrs thanks for verifying this. It means the problem still exist. It needs a proper fix. thanks for the test. I will also test when i get home.

@jori-nordic any new findings yet? using Zephyr os for research work so will be grateful if there is a way out of this. thanks

jori-nordic commented 1 year ago

The problem is my linux setup doesn't get far enough to reproduce this. So I'm kinda stuck here. If you can provide a VM or something I can use then I can proceed.

@bamkrs could you try running addr2line (it's in the zephyr SDK) on the LR address? 0x0000eab3.

You could also try to use the coredump functionality: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/services/debugging/coredump.html. If you manage to get a coredump, I'll also need the elf (the whole build folder might be useful).

bamkrs commented 1 year ago

@jori-nordic I do not have an VM. I just use an RPi with the latest console image and an nRF52840DK as its dongle (hci_usb sample).

However, I can supply you the whole build folder. What is the preferred way? As Attachment?

I played around a little bit more. It looks like either the IPSP sample isn't able to send anything back, or the linux module bugs. After adding some debug output to the echo example I see the following:

[00:04:10.759,643] <dbg> udp_echo: udp6_receive_thread: Received 18 bytes                                                                                                                                                                                                                                                                                                                                                             
[00:04:10.759,796] <dbg> udp_echo: udp6_tx: Sending 18 bytes                                                                                                                                                                                                                                                                                                                                                                         
[00:04:10.760,131] <wrn> net_bt: net_bt_send: iface 0x20002c50 pkt 0x20019238 len 62                                                                                                                                                                                                                                                                                                                                                    
[00:04:10.760,162] <dbg> bt_l2cap: bt_l2cap_chan_send_cb: chan 0x20000ab8 buf 0x20018158 len 55                                                                                                                                                                                                                                                                                                                                         
[00:04:10.760,253] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x20000ab8 seg 0x20018178 len 23                                                                                                                                                                                                                                                                                                                                           
[00:04:10.760,253] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x20000ab8 cid 0x0040 len 23 credits 46                                                                                                                                                                                                                                                                                                                                       
[00:04:10.760,284] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20005428 cid 64 len 23                                                                                                                                                                                                                                                                                                                                                      
[00:04:10.760,437] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x20000ab8 seg 0x20018178 len 23                                                                                                                                                                                                                                                                                                                                           
[00:04:10.760,467] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x20000ab8 cid 0x0040 len 23 credits 45                                                                                                                                                                                                                                                                                                                                       
[00:04:10.760,467] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20005428 cid 64 len 23                                                                                                                                                                                                                                                                                                                                                      
[00:04:10.760,620] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x20000ab8 cid 0x0040 len 11 credits 44                                                                                                                                                                                                                                                                                                                                       
[00:04:10.760,650] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20005428 cid 64 len 11                                                                                                                                                                                                                                                                                                                                                      
[00:04:10.760,742] <dbg> bt_l2cap: l2cap_chan_le_send_sdu: ch 0x20000ab8 cid 0x0040 sent 55 total_len 55                                                                                                                                                                                                                                                                                                                                
[00:04:10.858,215] <dbg> bt_l2cap: l2cap_chan_seg_sent: conn 0x20005428 CID 0x0040 err 0                                                                                                                                                                                                                                                                                                                                                
[00:04:10.858,276] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 44 buf (nil))                                                                                                                                                                                                                                                                                                                                              
[00:04:10.858,917] <dbg> bt_l2cap: l2cap_chan_seg_sent: conn 0x20005428 CID 0x0040 err 0                                                                                                                                                                                                                                                                                                                                                
[00:04:10.858,947] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 44 buf (nil))                                                                                                                                                                                                                                                                                                                                              
[00:04:10.907,775] <dbg> bt_l2cap: l2cap_chan_sdu_sent: conn 0x20005428 CID 0x0040 err 0                                                                                                                                                                                                                                                                                                                                                
[00:04:10.907,806] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 44 buf (nil))                                                                                                                                                                                                                                                                                                                                              
[00:04:10.907,806] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 0 buf (nil))                                                                                                                                                                                                                                                                                                                                               
[00:04:10.907,836] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 0 buf (nil))                                                                                                                                                                                                                                                                                                                                               
[00:04:10.907,867] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 0 buf (nil))                                                                                                                                                                                                                                                                                                                                               
[00:04:10.907,867] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 44 buf (nil))                                                                                                                                                                                                                                                                                                                                              
[00:04:10.907,897] <dbg> bt_l2cap: bt_l2cap_recv: Packet for CID 5 len 8                                                                                                                                                                                                                                                                                                                                                                
[00:04:10.907,928] <dbg> bt_l2cap: l2cap_chan_recv: chan 0x200054f0 len 8                                                                                                                                                                                                                                                                                                                                                               
[00:04:10.907,958] <dbg> bt_l2cap: l2cap_recv: Signaling code 0x16 ident 12 len 4                                                                                                                                                                                                                                                                                                                                                       
[00:04:10.907,989] <dbg> bt_l2cap: le_credits: cid 0x0040 credits 3                                                                                                                                                                                                                                                                                                                                                                     
[00:04:10.907,989] <err> bt_l2cap: l2cap_chan_tx_give_credits: chan 0x20000ab8 got TX credits 3                                                                                                                                                                                                                                                                                                                                         
[00:04:10.908,020] <wrn> bt_l2cap: le_credits: chan 0x20000ab8 total credits 47                                                                                                                                                                                                                                                                                                                                                         
[00:04:10.908,050] <wrn> bt_l2cap: l2cap_chan_tx_resume: can't resume (creds 47 buf (nil))

So the IP packet arrives in my app and sends the data back. I'm not fluent in the bt_l2cap debug output, but seeing a lot WRN and ERR seem not to be right.

In tshark on bt0 I can see the packet leaving my host:

6 21.984459256  2001:db8::2 → 2001:db8::3  UDP 82 50355 → 1337 Len=18

However, the response never arrives. Instead the dmesg of my host outputs one of the following lines:

[191639.933752] bluetooth hci2 bt0: received unknown nhc id which was not found.
[192370.479502] bluetooth hci2 bt0: received nhc id for RFC6282 Routing which is not implemented.
bamkrs commented 1 year ago

@jori-nordic Sorry, missed the add2line part. Unfortunately it seems to get stuck somewhere and doesn't do anything. However, after studying the zephyr.lst and zephyr.map the function causing the error is located in ubsys/bluetooth/host/conn.c

 .text.tx_notify
                0x000000000000ea58       0x60 zephyr/subsys/bluetooth/host/libsubsys__bluetooth__host.a(conn.c.obj)
0000ea58 <tx_notify>:
{
    ea58:   e92d 41f0   stmdb   sp!, {r4, r5, r6, r7, r8, lr}
    k_fifo_put(&free_tx, tx);
    ea5c:   4f15        ldr r7, [pc, #84]   ; (eab4 <tx_notify+0x5c>)
{
    ea5e:   4604        mov r4, r0
    tx->cb = NULL;
    ea60:   f04f 0800   mov.w   r8, #0
    __asm__ volatile(
    ea64:   f04f 0220   mov.w   r2, #32
    ea68:   f3ef 8311   mrs r3, BASEPRI
    ea6c:   f382 8812   msr BASEPRI_MAX, r2
    ea70:   f3bf 8f6f   isb sy
    return list->head;
    ea74:   6a21        ldr r1, [r4, #32]
        if (!sys_slist_is_empty(&conn->tx_complete)) {
    ea76:   b929        cbnz    r1, ea84 <tx_notify+0x2c>
    __asm__ volatile(
    ea78:   f383 8811   msr BASEPRI, r3
    ea7c:   f3bf 8f6f   isb sy
}
    ea80:   e8bd 81f0   ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
Z_GENLIST_GET_NOT_EMPTY(slist, snode)
    ea84:   6a60        ldr r0, [r4, #36]   ; 0x24
    return node->next;
    ea86:   680a        ldr r2, [r1, #0]
    list->head = node;
    ea88:   6222        str r2, [r4, #32]
Z_GENLIST_GET_NOT_EMPTY(slist, snode)
    ea8a:   4281        cmp r1, r0
    list->tail = node;
    ea8c:   bf08        it  eq
    ea8e:   6262        streq   r2, [r4, #36]   ; 0x24
    ea90:   f383 8811   msr BASEPRI, r3
    ea94:   f3bf 8f6f   isb sy
        user_data = tx->user_data;
    ea98:   e9d1 5601   ldrd    r5, r6, [r1, #4]
    tx->pending_no_cb = 0U;
    ea9c:   f8c1 800c   str.w   r8, [r1, #12]
    tx->user_data = NULL;
    eaa0:   e9c1 8801   strd    r8, r8, [r1, #4]
    k_fifo_put(&free_tx, tx);
    eaa4:   4638        mov r0, r7
    eaa6:   f039 fc59   bl  4835c <k_queue_append>
        cb(conn, user_data, 0);
    eaaa:   2200        movs    r2, #0
    eaac:   4631        mov r1, r6
    eaae:   4620        mov r0, r4
    eab0:   47a8        blx r5
    while (1) {
    eab2:   e7d7        b.n ea64 <tx_notify+0xc>
    eab4:   20001658    .word   0x20001658
jori-nordic commented 1 year ago

@bamkrs don't worry about the errs/warns, it's just a trick I use to visualize the logs I'm interested in (shows up as yellow and red on my term). The ones you're seeing have been upgraded to err/wrn by me, see e.g. this one.

Yeah the error is usually something dereferencing a ptr and calling it as if it were a valid callback. But in that case, tx_notify's address is in the LR, so that means that we are actually executing the l2cap SDU callback, and there we have the dereference. At least that's what I think.

Can you run it again w/ this rev? I added some more logs around the l2cap callbacks. https://github.com/jori-nordic/zephyr/tree/fc9d16b118a7f3aaf157f0bfdc1ec15b8ef3db44/./

You can just attach the whole build folder, I think that's fine. edit: if it works, you could also try to build with CONFIG_LOG_MODE_IMMEDIATE=y just in case some logs haven't had the time to be printed at the time of the busfault.

bamkrs commented 1 year ago

@jori-nordic Here you go. Sadly, immediate logging isn't available when bluetooth is enabled.

[00:02:12.681,579] <err> os: usage_fault: ***** USAGE FAULT *****                                                                                                     
[00:02:12.681,579] <err> os: usage_fault:   Unaligned memory access                                                                                                   
[00:02:12.681,610] <err> os: esf_dump: r0/a1:  0x20003fd8  r1/a2:  0x20005302  r2/a3:  0x80000000                                                                     
[00:02:12.681,640] <err> os: esf_dump: r3/a4:  0x00000124 r12/ip:  0x00000000 r14/lr:  0x0000ed39                                                                     
[00:02:12.681,640] <err> os: esf_dump:  xpsr:  0x81000000                                                                                                             
[00:02:12.681,671] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00010dda                                                                              
[00:02:12.681,701] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0                                                                         
[00:02:12.681,762] <err> os: z_fatal_error: Current thread: 0x20004db0 (sysworkq)                                                                                     
[00:02:14.817,291] <err> os: k_sys_fatal_error_handler: Halting system

fc9d16b.txt

0x10dda corresponds to l2cap_chan_sdu_sent:

00010dc8 <l2cap_chan_sdu_sent>:
{
   10dc8:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
    LOG_DBG("conn %p CID 0x%04x err %d userdata %p", conn, cid, err, user_data);
   10dcc:   4c78        ldr r4, [pc, #480]  ; (10fb0 <l2cap_chan_sdu_sent+0x1e8>)
{
   10dce:   b083        sub sp, #12
    LOG_DBG("conn %p CID 0x%04x err %d userdata %p", conn, cid, err, user_data);
   10dd0:   6823        ldr r3, [r4, #0]
    uint16_t cid = data->cid;
   10dd2:   f8b1 b004   ldrh.w  fp, [r1, #4]
{
   10dd6:   4691        mov r9, r2
    LOG_DBG("conn %p CID 0x%04x err %d userdata %p", conn, cid, err, user_data);
   10dd8:   075a        lsls    r2, r3, #29
    void *cb_user_data = data->user_data;
   10dda:   e9d1 6a02   ldrd    r6, sl, [r1, #8]
{
   10dde:   af00        add r7, sp, #0
   10de0:   4680        mov r8, r0
   10de2:   460d        mov r5, r1
    LOG_DBG("conn %p CID 0x%04x err %d userdata %p", conn, cid, err, user_data);
   10de4:   d528        bpl.n   10e38 <l2cap_chan_sdu_sent+0x70>
   10de6:   f8c7 d004   str.w   sp, [r7, #4]
   10dea:   b08c        sub sp, #48 ; 0x30
   10dec:   466a        mov r2, sp
   10dee:   4b71        ldr r3, [pc, #452]  ; (10fb4 <l2cap_chan_sdu_sent+0x1ec>)
   10df0:   6153        str r3, [r2, #20]
   10df2:   4b71        ldr r3, [pc, #452]  ; (10fb8 <l2cap_chan_sdu_sent+0x1f0>)
   10df4:   6291        str r1, [r2, #40]   ; 0x28
   10df6:   2107        movs    r1, #7
   10df8:   e9c2 3006   strd    r3, r0, [r2, #24]
   10dfc:   e9c2 b908   strd    fp, r9, [r2, #32]

While 0xed39 is in tx_notify again.

jori-nordic commented 1 year ago

alright, we have a bit more info. Can you run again with https://github.com/jori-nordic/zephyr/tree/be92c56253a0ad7f7dcca1783e63d8b943c4fdd0/./

I suspect the dereferencing of user_data in sdu_sent is to blame, because we don't get the log message that's supposed to happen after the local var init.

edit: running with CONFIG_DEBUG=y and/or CONFIG_NULL_POINTER_EXCEPTION_DETECTION_DWT=y might yield more info.

bamkrs commented 1 year ago

You are absolutely correct:

[00:00:52.234,863] <dbg> bt_l2cap: l2cap_chan_seg_sent: conn 0x20003720 CID 0x0000 err 0 userdata 0x20004502                                                                              
[00:00:52.234,893] <dbg> bt_l2cap: l2cap_chan_seg_sent: Received segment sent callback for disconnected channel                                                                           
[00:00:52.234,954] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x20000178 seg 0x200139a4 len 23                                                                                             
[00:00:52.234,985] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x20000178 cid 0x0040 len 23 credits 16                                                                                         
[00:00:52.235,015] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003720 cid 64 len 23                                                                                                        
[00:00:52.235,534] <wrn> bt_conn: tx_notify: tx 0x20004480 cb 0xc91d user_data 0x20004502                                                                                                 
[00:00:52.235,565] <dbg> bt_l2cap: l2cap_chan_seg_sent: conn 0x20003720 CID 0x0000 err 0 userdata 0x20004502                                                                              
[00:00:52.235,565] <dbg> bt_l2cap: l2cap_chan_seg_sent: Received segment sent callback for disconnected channel                                                                           
[00:00:52.235,656] <dbg> bt_l2cap: l2cap_chan_create_seg: ch 0x20000178 seg 0x20013aa4 len 23                                                                                             
[00:00:52.235,656] <dbg> bt_l2cap: l2cap_chan_le_send: ch 0x20000178 cid 0x0040 len 23 credits 15                                                                                         
[00:00:52.235,687] <dbg> bt_l2cap: bt_l2cap_send_cb: conn 0x20003720 cid 64 len 23                                                                                                        
[00:00:52.236,114] <wrn> bt_conn: tx_notify: tx 0x20004490 cb 0xc9c1 user_data 0x20004502                                                                                                 
[00:00:52.236,145] <dbg> bt_l2cap: l2cap_chan_sdu_sent: conn 0x20003720 user_data 0x20004502 err 0                                                                                        
[00:00:52.236,175] <err> os: usage_fault: ***** USAGE FAULT *****                                                                                                                         
[00:00:52.236,175] <err> os: usage_fault:   Unaligned memory access                                                                                                                       
[00:00:52.236,206] <err> os: esf_dump: r0/a1:  0x000000ad  r1/a2:  0x00000000  r2/a3:  0x00000000                                                                                         
[00:00:52.236,236] <err> os: esf_dump: r3/a4:  0x20001540 r12/ip:  0x00000000 r14/lr:  0x00005371                                                                                         
[00:00:52.236,236] <err> os: esf_dump:  xpsr:  0x21000000                                                                                                                                 
[00:00:52.236,267] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x0000c9fa                                                                                                  
[00:00:52.236,297] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0                                                                                             
[00:00:52.236,328] <err> os: z_fatal_error: Current thread: 0x20004300 (unknown)                                                                                                          
[00:00:53.888,427] <err> os: k_sys_fatal_error_handler: Halting system

be92c56.txt

0xc9fa is dereferencing user_data:

0000c9c0 <l2cap_chan_sdu_sent>:
{
    c9c0:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
    c9c4:   b083        sub sp, #12
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
    c9c6:   46e8        mov r8, sp
{
    c9c8:   af00        add r7, sp, #0
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
    c9ca:   b08a        sub sp, #40 ; 0x28
{
    c9cc:   4691        mov r9, r2
    c9ce:   4606        mov r6, r0
    c9d0:   460d        mov r5, r1
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
    c9d2:   466a        mov r2, sp
    c9d4:   4b6a        ldr r3, [pc, #424]  ; (cb80 <l2cap_chan_sdu_sent+0x1c0>)
    c9d6:   4c6b        ldr r4, [pc, #428]  ; (cb84 <l2cap_chan_sdu_sent+0x1c4>)
    c9d8:   6153        str r3, [r2, #20]
    c9da:   2306        movs    r3, #6
    c9dc:   e9c2 4006   strd    r4, r0, [r2, #24]
    c9e0:   e9c2 1908   strd    r1, r9, [r2, #32]
    c9e4:   4868        ldr r0, [pc, #416]  ; (cb88 <l2cap_chan_sdu_sent+0x1c8>)
    c9e6:   f842 3f10   str.w   r3, [r2, #16]!
    c9ea:   f44f 5144   mov.w   r1, #12544  ; 0x3100
    c9ee:   f02a fd28   bl  37442 <z_log_msg_static_create.constprop.0>
    c9f2:   46c5        mov sp, r8
    __NOP();
    c9f4:   bf00        nop
    uint16_t cid = data->cid;
    c9f6:   88aa        ldrh    r2, [r5, #4]
    c9f8:   607a        str r2, [r7, #4]
    void *cb_user_data = data->user_data;
    c9fa:   e9d5 ab02   ldrd    sl, fp, [r5, #8]
    __NOP();
    c9fe:   bf00        nop
    LOG_DBG("conn %p CID 0x%04x err %d userdata %p", conn, cid, err, user_data);

The thing is, even with this bug fixed, the IPSP Sample won't work anyway because of a bug in the NHC for UDP with global addresses: https://github.com/zephyrproject-rtos/zephyr/issues/50598

jori-nordic commented 1 year ago

There's definitely something weird happening here: the user_data given is 0x20004502, but it's clearly able to be dereferenced, because a few lines up, we get this printed: l2cap_chan_seg_sent: conn 0x20003720 CID 0x0000 err 0 userdata 0x20004502.

What seems to be the problem is the data->user_data deref. Here's the declaration, maybe someone can spot something I missed:

struct l2cap_tx_meta_data {
    int sent;
    uint16_t cid;
    bt_conn_tx_cb_t cb;
    void *user_data;
};

I would expect the compiler to not pack the struct, and then for the members to be aligned to words. Could you check with this? https://github.com/jori-nordic/zephyr/tree/6504983878355b98434b93b0fec0e23a71c4f886/./

bamkrs commented 1 year ago
[00:00:37.615,692] <err> os: usage_fault: ***** USAGE FAULT *****                                                                                                                         
[00:00:37.615,722] <err> os: usage_fault:   Unaligned memory access                                                                                                                       
[00:00:37.615,753] <err> os: esf_dump: r0/a1:  0x000000a3  r1/a2:  0x00000000  r2/a3:  0x000000a4                                                                                         
[00:00:37.615,753] <err> os: esf_dump: r3/a4:  0x20001a98 r12/ip:  0x00000000 r14/lr:  0x000058b5                                                                                         
[00:00:37.615,783] <err> os: esf_dump:  xpsr:  0x21000000                                                                                                                                 
[00:00:37.615,783] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00010e06                                                                                                  
[00:00:37.615,844] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0                                                                                             
[00:00:37.615,875] <err> os: z_fatal_error: Current thread: 0x20004db0 (sysworkq)                                                                                                         
[00:00:39.770,629] <err> os: k_sys_fatal_error_handler: Halting system
00010dc8 <l2cap_chan_sdu_sent>:
{
   10dc8:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dcc:   4c86        ldr r4, [pc, #536]  ; (10fe8 <l2cap_chan_sdu_sent+0x220>)
{
   10dce:   b083        sub sp, #12
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dd0:   6823        ldr r3, [r4, #0]
{
   10dd2:   460d        mov r5, r1
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dd4:   0759        lsls    r1, r3, #29
{
   10dd6:   af00        add r7, sp, #0
   10dd8:   4606        mov r6, r0
   10dda:   4690        mov r8, r2
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10ddc:   d512        bpl.n   10e04 <l2cap_chan_sdu_sent+0x3c>
   10dde:   46e9        mov r9, sp
   10de0:   b08a        sub sp, #40 ; 0x28
   10de2:   466a        mov r2, sp
   10de4:   4b81        ldr r3, [pc, #516]  ; (10fec <l2cap_chan_sdu_sent+0x224>)
   10de6:   6153        str r3, [r2, #20]
   10de8:   4b81        ldr r3, [pc, #516]  ; (10ff0 <l2cap_chan_sdu_sent+0x228>)
   10dea:   e9c2 3006   strd    r3, r0, [r2, #24]
   10dee:   2306        movs    r3, #6
   10df0:   e9c2 5808   strd    r5, r8, [r2, #32]
   10df4:   f44f 5144   mov.w   r1, #12544  ; 0x3100
   10df8:   f842 3f10   str.w   r3, [r2, #16]!
   10dfc:   4620        mov r0, r4
   10dfe:   f032 fd43   bl  43888 <z_log_msg_static_create.constprop.0>
   10e02:   46cd        mov sp, r9
    __NOP();
   10e04:   bf00        nop
    void *cb_user_data = data->user_data;
   10e06:   e9d5 9a02   ldrd    r9, sl, [r5, #8]

6504983.txt

jori-nordic commented 1 year ago

alright, one last and then I give up: https://github.com/jori-nordic/zephyr/tree/586703f6dbe6d1519344a3dbab368661bf8fc1ae/./

bamkrs commented 1 year ago

Now its crashing when accessing data->cb

[00:01:00.373,443] <err> os: usage_fault: ***** USAGE FAULT *****                                                                                                                         
[00:01:00.373,443] <err> os: usage_fault:   Unaligned memory access                                                                                                                       
[00:01:00.373,474] <err> os: esf_dump: r0/a1:  0x0000008e  r1/a2:  0x00000000  r2/a3:  0x0000008f                                                                                         
[00:01:00.373,504] <err> os: esf_dump: r3/a4:  0x20001a98 r12/ip:  0x00000000 r14/lr:  0x000058b5                                                                                         
[00:01:00.373,535] <err> os: esf_dump:  xpsr:  0x21000000                                                                                                                                 
[00:01:00.373,535] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x00010e06                                                                                                  
[00:01:00.373,596] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0                                                                                             
[00:01:00.373,626] <err> os: z_fatal_error: Current thread: 0x20004db0 (sysworkq)                                                                                                         
[00:01:02.295,745] <err> os: k_sys_fatal_error_handler: Halting system

586703f.txt

00010dc8 <l2cap_chan_sdu_sent>:
{
   10dc8:   e92d 4ff0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dcc:   4c88        ldr r4, [pc, #544]  ; (10ff0 <l2cap_chan_sdu_sent+0x228>)
{
   10dce:   b083        sub sp, #12
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dd0:   6823        ldr r3, [r4, #0]
{
   10dd2:   460d        mov r5, r1
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10dd4:   0759        lsls    r1, r3, #29
{
   10dd6:   af00        add r7, sp, #0
   10dd8:   4606        mov r6, r0
   10dda:   4690        mov r8, r2
    LOG_DBG("conn %p user_data %p err %d", conn, user_data, err);
   10ddc:   d512        bpl.n   10e04 <l2cap_chan_sdu_sent+0x3c>
   10dde:   46e9        mov r9, sp
   10de0:   b08a        sub sp, #40 ; 0x28
   10de2:   466a        mov r2, sp
   10de4:   4b83        ldr r3, [pc, #524]  ; (10ff4 <l2cap_chan_sdu_sent+0x22c>)
   10de6:   6153        str r3, [r2, #20]
   10de8:   4b83        ldr r3, [pc, #524]  ; (10ff8 <l2cap_chan_sdu_sent+0x230>)
   10dea:   e9c2 3006   strd    r3, r0, [r2, #24]
   10dee:   2306        movs    r3, #6
   10df0:   e9c2 5808   strd    r5, r8, [r2, #32]
   10df4:   f44f 5144   mov.w   r1, #12544  ; 0x3100
   10df8:   f842 3f10   str.w   r3, [r2, #16]!
   10dfc:   4620        mov r0, r4
   10dfe:   f032 fd47   bl  43890 <z_log_msg_static_create.constprop.0>
   10e02:   46cd        mov sp, r9
    __NOP();
   10e04:   bf00        nop
    bt_conn_tx_cb_t cb = data->cb;
   10e06:   e9d5 b902   ldrd    fp, r9, [r5, #8]
jori-nordic commented 1 year ago

okay I think I know what's happening (a good night's sleep and all that): if you look at the CID, it's pretty random, sometimes 0x0000 sometimes 0x2000. And in the seg_sent cb, we do see this error printed: Received segment sent callback for disconnected channel. But we don't get any logs about a disconnected channel. The 0x2000 CID is also invalid. That means that something's up, I think we're reading garbage.

What I think is the user_data field of the net buf is somehow being overwritten somewhere, making the struct member read fault (because we're reading from somewhere else than l2cap_tx_meta_data_storage). I added an assert in https://github.com/jori-nordic/zephyr/tree/604fa4070e99aea98bb2c154f0a278cdeb1e67f0/./ that should hopefully fire when running your test (double-check that you have CONFIG_ASSERT=y).

bamkrs commented 1 year ago

I'm a big fan of your commit messages, made my day. And you are absolutely correct:

ASSERTION FAIL [user_data >= &l2cap_tx_meta_data_storage[0] && user_data <= &l2cap_tx_meta_data_storage[20]] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/l2cap.c:1890          
        user_data field has been corrupted                                                                                                                                    
[00:00:45.072,631] <err> os: esf_dump: r0/a1:  0x00000004  r1/a2:  0x00000762  r2/a3:  0x000000c4                                                                             
[00:00:45.072,662] <err> os: esf_dump: r3/a4:  0x20001a98 r12/ip:  0x2000e81c r14/lr:  0x00010e31                                                                             
[00:00:45.072,692] <err> os: esf_dump:  xpsr:  0x61000000                                                                                                                     
[00:00:45.072,723] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x0003fda0                                                                                      
[00:00:45.072,753] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0                                                                                   
[00:00:45.072,814] <err> os: z_fatal_error: Current thread: 0x20004db0 (sysworkq)                                                                                             
[00:00:47.362,731] <err> os: k_sys_fatal_error_handler: Halting system
jori-nordic commented 1 year ago

Alright, I think that gives me enough data to try and trace this, thanks for helping :) I'll probably come back to you a bit later when I have some theory on what breaks and how to debug it further.

bamkrs commented 1 year ago

No worries, I'm on vacation until march 6th. Is there anything I can do to help you, now?

I've done all my testing with an RPi3B and used an nRF52840DK with the usb_hci sample as BT adapter. At one point, I've slightly modified the firmware for this adapter to LOG_HEXDUMP_WRN the USB EVT and ACL data for https://github.com/zephyrproject-rtos/zephyr/issues/50598 but this did not impact any of our tests (patch below).

Linux raspberrypi 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux
diff --git a/samples/bluetooth/hci_usb/prj.conf b/samples/bluetooth/hci_usb/prj.conf
index 110cea092d..9a40ae7bd7 100644
--- a/samples/bluetooth/hci_usb/prj.conf
+++ b/samples/bluetooth/hci_usb/prj.conf
@@ -7,6 +7,10 @@ CONFIG_USB_DEVICE_INITIALIZE_AT_BOOT=n
 CONFIG_BT=y
 CONFIG_BT_HCI_RAW=y

+CONFIG_LOG_BUFFER_SIZE=65536
+CONFIG_LOG=y
+CONFIG_USB_DEVICE_LOG_LEVEL_DBG=y
+
 CONFIG_USB_DEVICE_STACK=y
 CONFIG_USB_DEVICE_PID=0x000B
 CONFIG_USB_DEVICE_BLUETOOTH=y
diff --git a/subsys/usb/device/class/bluetooth.c b/subsys/usb/device/class/bluetooth.c
index 9096151b14..8099e917c1 100644
--- a/subsys/usb/device/class/bluetooth.c
+++ b/subsys/usb/device/class/bluetooth.c
@@ -154,12 +154,14 @@ static void hci_tx_thread(void)

        switch (bt_buf_get_type(buf)) {
        case BT_BUF_EVT:
+           LOG_HEXDUMP_WRN(buf->data, buf->len, "EVT");
            usb_transfer_sync(
                bluetooth_ep_data[HCI_INT_EP_IDX].ep_addr,
                buf->data, buf->len,
                USB_TRANS_WRITE | USB_TRANS_NO_ZLP);
            break;
        case BT_BUF_ACL_IN:
+           LOG_HEXDUMP_WRN(buf->data, buf->len, "ACL");
            usb_transfer_sync(
                bluetooth_ep_data[HCI_IN_EP_IDX].ep_addr,
                buf->data, buf->len,
jori-nordic commented 1 year ago

I guess you could give me the detailed steps you use to reproduce it, ie what exact software and versions to install on the rpi, what code you're using, what commands are you running, etc.. and I can try to source a raspi (I have a zero, dunno if it's gonna be enough) to have some more involved debugging.

I don't suspect your modifications at this layer. It's probably either in the host itself (maybe conn.c's buffer/context mgmt logic) or in the network layers.

jori-nordic commented 1 year ago

I think I'm still missing a piece. Can you enable CONFIG_BT_CONN_LOG_LEVEL_DBG=y? This will help me to know if the problem lies in l2cap or below.

And then knowing what is at the address of the wrong user_data might be useful:

cfriedt commented 1 year ago

@vaishnavachath - just wanted to CC you here. There is currently a lack of maintainership in Zephyr + Linux for IPSP and talk of deprecation for BLE IPSP, but from what I recall, BB.O might want to ensure that it stays in. Maybe chat with @jadonk and see what they would like to do.

bamkrs commented 1 year ago

@jori-nordic So user_data points to some memory in conn_tx (conn_tx+14) image

I attached you the corefile and my build directory.

seekaddo commented 1 year ago

Is this issue discontinued? or the problems seems to be unfixable

jori-nordic commented 1 year ago

@seekaddo Sorry for the silence. Yeah IPSP is not the top priority at nordic right now, so I had other tasks I had to work on instead. Just as a shot in the dark, could you try and see if the issue's still there after applying https://github.com/zephyrproject-rtos/zephyr/pull/55519 ? There were a few errors in credit bookkeeping.

If I remember correctly, this issue is rather the buffer's metadata getting corrupted somewhere else, so that might not solve it Still worth trying though.

jori-nordic commented 1 year ago

A way I could see debugging it would be using the ARM DWT to add a watchpoint on the buffer's user_data pointer and in there break if the pointer points to outside the expected memory range.

But then again, I can't seem to reproduce this issue since it depends on the exact linux kernel + distro combo. So it's hard for me to help out.

seekaddo commented 1 year ago

I understand. I will try and get you a copy of a self-contained ubuntu20 that i was using for testing this.

jori-nordic commented 1 year ago

hey again @seekaddo . Don't know if you still have the setup handy, but could you give it one more shot? https://github.com/jori-nordic/zephyr/commits/l2cap-debugging

I added more checks and more logging, hopefully we find the issue :)

henryzhang62 commented 1 year ago

I can use telnet to reproduce similar error with default IPSP prj_dbg.conf. reproduce step: $ telnet 2001:db8::1 4242 Trying 2001:db8::1... Connected to 2001:db8::1. Escape character is '^]'. I am testing IPSP and want to see ble udp message. Union city is in the east bay. I am testing IPSP and want to see ble udp message. Union city is in the east bay.

We can following error from the console: Booting Zephyr OS build v3.4.0-rc1-198-g0ae7812f6bdb
[00:00:00.396,759] bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002) [00:00:00.396,789] bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002) [00:00:00.396,820] bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 30 [00:00:00.397,796] bt_hci_core: bt_dev_show_info: Identity: D6:BC:19:F7:53:94 (random) [00:00:00.397,857] bt_hci_core: bt_dev_show_info: HCI: version 5.4 (0x0d) revision 0x0000, manufactur1 [00:00:00.397,888] bt_hci_core: bt_dev_show_info: LMP: version 5.4 (0x0d) subver 0xffff [00:00:00.398,437] net_config: net_config_init_by_iface: Initializing network [00:00:00.398,468] net_config: check_interface: Waiting interface 1 (0x20001934) to be up... [00:00:09.371,368] net_config: iface_up_handler: Interface 1 (0x20001934) coming up [00:00:09.372,039] ipsp: init_app: Run IPSP sample [00:00:09.372,558] ipsp: listen: Starting to wait [00:00:09.472,167] net_config: ipv6_event_handler: IPv6 address: 2001:db8::1 [00:02:25.312,500] os: usage_fault: USAGE FAULT [00:02:25.312,500] os: usage_fault: Unaligned memory access
[00:02:25.312,530] os: esf_dump: r0/a1: 0x20002458 r1/a2: 0x20003802 r2/a3: 0x00000000 [00:02:25.312,530] os: esf_dump: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x0000d08b [00:02:25.312,561] os: esf_dump: xpsr: 0x61000000
[00:02:25.312,561] os: esf_dump: Faulting instruction address (r15/pc): 0x0000e6ce [00:02:25.312,622] os: z_fatal_error: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0 [00:02:25.312,652] os: z_fatal_error: Current thread: 0x20002920 (BT RX) [00:02:25.442,993] os: k_sys_fatal_error_handler: Halting system

henryzhang62 commented 1 year ago

I find MPU error right after BT connection if CONFIG_BT_LOG_LEVEL_DBG=y in prj_dbg.conf:

uart:~$

--- 421 messages dropped --- [00:00:00.417,205] bt_hci_core: process_events: ev->state 0 [00:00:00.417,236] bt_conn: bt_conn_prepare_events: [00:00:00.417,236] bt_hci_core: hci_tx_thread: Calling k_poll with 2 events [00:00:00.417,297] bt_hci_core: bt_hci_cmd_send_sync: rsp 0x2000d20c opcode 0x2009 len 1 [00:00:00.417,388] bt_conn: bt_conn_set_state: disconnected -> connecting-adv --- 8 messages dropped --- [00:00:00.417,419] bt_conn: bt_conn_ref: handle 0 ref 1 -> 2 [00:00:00.417,449] bt_hci_core: bt_hci_cmd_create: opcode 0x200a param_len 1 [00:00:00.417,480] bt_hci_core: bt_hci_cmd_create: buf 0x2000d20c [00:00:00.417,480] bt_hci_core: bt_hci_cmd_send_sync: buf 0x2000d20c opcode 0x200a len 4 [00:00:00.417,541] bt_hci_core: process_events: count 2 [00:00:00.417,846] bt_hci_core: hci_cmd_complete: opcode 0x200a [00:00:00.417,846] bt_hci_core: hci_cmd_done: opcode 0x200a status 0x00 buf 0x2000d20c [00:00:00.417,907] bt_ctlr_hci_driver: hci_driver_send: exit: 0 [00:00:00.417,907] bt_hci_core: process_events: ev->state 0 [00:00:00.417,938] bt_conn: bt_conn_prepare_events: [00:00:00.417,968] bt_hci_core: hci_tx_thread: Calling k_poll with 2 events [00:00:00.417,999] bt_hci_core: bt_hci_cmd_send_sync: rsp 0x2000d20c opcode 0x200a len 1 [00:00:00.418,182] bt_conn: bt_conn_unref: handle 0 ref 2 -> 1 [00:00:00.418,212] net_config: net_config_init_by_iface: Initializing network [00:00:00.418,243] net_config: check_interface: Waiting interface 1 (0x20001934) to be up... [00:00:00.418,273] bt_hci_ecc: emulate_le_p256_public_key_cmd: [00:00:00.680,816] bt_hci_core: bt_recv: buf 0x2000cdf0 len 68 [00:00:00.680,847] bt_hci_core: hci_event: event 0x3e [00:00:00.680,847] bt_hci_core: hci_le_meta_event: subevent 0x08 [00:00:00.680,877] bt_ecc: bt_hci_evt_le_pkey_complete: status: 0x00 [00:00:00.680,877] bt_smp: bt_smp_pkey_ready: [00:00:00.681,182] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 2 [00:00:00.681,213] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.681,457] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.681,671] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.681,732] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 2 [00:00:00.681,762] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.682,006] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.682,250] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 5 [00:00:00.682,281] bt_gatt: bt_gatt_attr_read: handle 0x0000 offset 0 length 2 [00:00:00.682,525] bt_gatt: db_hash_gen: Hash: 17 15 e4 4d a5 3e 1c f5 73 61 f3 aa 93 5b 05 6b |...M.>.. sa...[.k --- 88 messages dropped --- [00:00:11.097,015] bt_conn: conn_prepare_events: wait on host fifo [00:00:11.097,045] bt_hci_core: hci_tx_thread: Calling k_poll with 3 events [00:00:11.097,106] bt_hci_core: bt_hci_cmd_send_sync: rsp 0x2000d20c opcode 0x202f len 9 [00:00:11.097,106] bt_hci_core: bt_hci_cmd_create: opcode 0x2022 param_len 6 [00:00:11.097,137] bt_hci_core: bt_hci_cmd_create: buf 0x2000d20c [00:00:11.097,167] bt_hci_core: bt_hci_cmd_send_sync: buf 0x2000d20c opcode 0x2022 len 9 [00:00:11.097,229] bt_hci_core: process_events: count 3 [00:00:11.097,259] bt_hci_core: process_events: ev->state 4 [00:00:11.097,259] bt_hci_core: send_cmd: calling net_buf_get [00:00:11.097,290] bt_hci_core: send_cmd: calling sem_take_wait [00:00:11.097,320] bt_hci_core: send_cmd: Sending command 0x2022 (buf 0x2000d20c) to driver [00:00:11.097,351] bt_hci_core: bt_send: buf 0x2000d20c len 9 type 0 [00:00:11.097,351] bt_ctlr_hci_driver: hci_driver_send: enter [00:00:11.097,412] bt_ctlr_hci_driver: cmd_handle: Replying with event of 8 bytes --- 105 messages dropped --- [00:00:11.097,442] bt_hci_core: hci_cmd_complete: opcode 0x2022 [00:00:11.097,473] bt_hci_core: hci_cmd_done: opcode 0x2022 status 0x00 buf 0x2000d20c [00:00:11.097,503] bt_ctlr_hci_driver: hci_driver_send: exit: 0 [00:00:11.097,534] bt_hci_core: process_events: ev->state 0 00:11.097,534] b[00:00:11.097,564] bt_conn: bt_conn_prepare_events: [00:00:11.097,595] bt_conn: conn_prepare_events: Adding conn 0x20002458 to poll list [00:00:11.097,625] bt_conn: conn_prepare_events: wait on host fifo [00:00:11.097,625] bt_hci_core: hci_tx_thread: Calling k_poll with 3 events [00:00:11.097,686] bt_hci_core: bt_hci_cmd_send_sync: rsp 0x2000d20c opcode 0x2022 len 3 [00:00:11.097,717] bt_conn: bt_conn_unref: handle 0 ref 2 -> 1 [00:00:11.097,747] bt_ctlr_hci_driver: recv_thread: blocking [00:00:11.097,778] bt_ctlr_hci_driver: recv_thread: unblocked [00:00:11.097,808] bt_ctlr_hci: le_chan_sel_algo: handle: 0x0000, CSA: 0. [00:00:11.097,839] bt_ctlr_hci_driver: recv_thread: blocking [00:00:11.244,567] bt_conn: bt_acl_recv: Successfully parsed 7 byte L2CAP packet [00:00:11.244,567] bt_l2cap: bt_l2cap_recv: Packet for CID 4 len 3 [00:00:11.244,598] bt_l2cap: l2cap_chan_recv: chan 0x2000af08 len 3 [00:00:11.245,269] bt_l2cap: l2cap_chan_create_seg: ch 0x200002c0 seg 0x2000ee38 len 58 [00:00:11.245,300] bt_l2cap: l2cap_chan_le_send: ch 0x200002c0 cid 0x0040 len 58 credits 0 [00:00:11.245,330] bt_l2cap: bt_l2cap_send_cb: conn 0x20002458 cid 64 len 58 [00:00:11.245,361] bt_conn: bt_conn_send_cb: conn handle 0 buf len 62 cb 0x107b1 user_data 0x20003a40 [00:00:11.245,391] bt_l2cap: l2cap_chan_le_send_sdu: ch 0x200002c0 cid 0x0040 sent 56 total_len 56 [00:00:11.245,574] bt_l2cap: bt_l2cap_chan_send_cb: chan 0x200002c0 buf 0x2000ee58 len 33 [00:00:11.245,666] bt_l2cap: l2cap_chan_tx_process: buf 0x2000ee58 sent 0 [00:00:11.245,697] bt_l2cap: l2cap_chan_le_send: No credits to transmit packet [00:00:11.245,788] ipsp: init_app: Run IPSP sample [00:00:11.246,459] ipsp: listen: Starting to wait [00:00:11.291,656] bt_ctlr_hci_driver: prio_recv_thread: sem taken [00:00:11.291,687] bt_ctlr_hci_driver: prio_recv_thread: Num Complete: 0x0000:1 [00:00:11.291,717] bt_hci_core: hci_num_completed_packets: num_handles 1 [00:00:11.244,628] bt_att: bt_att_recv: Received ATT chan 0x2000af00 code 0x02 len 2 [00:00:11.291,748] bt_hci_core: hci_num_completed_packets: handle 0 count 1 [00:00:11.291,778] os: mem_manage_fault: MPU FAULT [00:00:11.291,778] os: mem_manage_fault: Data Access Violation [00:00:11.291,809] os: mem_manage_fault: MMFAR Address: 0x20008d38 [00:00:11.291,839] os: esf_dump: r0/a1: 0x20001348 r1/a2: 0x00000006 r2/a3: 0x20008d74 [00:00:11.291,839] os: esf_dump: r3/a4: 0x20008d7c r12/ip: 0x000507f7 r14/lr: 0x0003f9a5 [00:00:11.291,870] os: esf_dump: xpsr: 0x41000000 [00:00:11.291,900] os: esf_dump: Faulting instruction address (r15/pc): 0x0003f72c [00:00:11.291,931] os: z_fatal_error: >>> ZEPHYR FATAL ERROR 19: Unknown error on CPU 0 [00:00:11.291,992] os: z_fatal_error: Current thread: 0x200029d8 (BT RX pri) [00:00:11.682,983] os: k_sys_fatal_error_handler: Halting system

jori-nordic commented 1 year ago

@henryzhang62 are those logs with a build using this branch? https://github.com/jori-nordic/zephyr/commits/l2cap-debugging

henryzhang62 commented 1 year ago

Jonathan, It was zephyrproject-rtos/zephyr. I just tried $ git clone https://github.com/jori-nordic/zephyr $ cd zephyr $ git checkout -b l2cap-debugging$cd ..

The host can connect to the nRF52840 without crash. ping6 fe80::d6bc:19ff:fef7:5394%bt0 PING fe80::d6bc:19ff:fef7:5394%bt0(fe80::d6bc:19ff:fef7:5394%bt0) 56 data bytes 64 bytes from fe80::d6bc:19ff:fef7:5394%bt0: icmp_seq=1 ttl=64 time=71.2 ms 64 bytes from fe80::d6bc:19ff:fef7:5394%bt0: icmp_seq=2 ttl=64 time=105 ms

But fail to ping $ sudo ip address add 2001:db8::2/64 dev bt0 $ ping6 2001:db8::1 PING 2001:db8::1(2001:db8::1) 56 data bytes From 2001:db8::2 icmp_seq=1 Destination unreachable: Address unreachable From 2001:db8::2 icmp_seq=2 Destination unreachable: Address unreachable

Any suggestion. ---henry

On Tuesday, June 20, 2023 at 12:21:46 AM PDT, Jonathan Rico ***@***.***> wrote:  

@henryzhang62 are those logs with a build using this branch? https://github.com/jori-nordic/zephyr/commits/l2cap-debugging

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

henryzhang62 commented 1 year ago

$ telnet fe80::d6bc:19ff:fef7:5394%bt0 4242 Trying fe80::d6bc:19ff:fef7:5394%bt0... Connected to fe80::d6bc:19ff:fef7:5394%bt0. Escape character is '^]'. I am testing IPSP and want to see ble udp message. Union city is in the east bay. I am testing IPSP and want to see ble udp message. Union city is in the east bay.

I am testing IPSP and want to see ble udp message. Union city is in the east bay. I am testing IPSP and want to see ble udp message. Union city is in the east bay.

What I see here telnet echoes "I am testing IPSP and want to see ble udp message. Union city is in the east bay.". but long string "I am testing IPSP and want to see ble udp message. Union city is in the east bay. I am testing IPSP and want to see ble udp message. Union city is in the east bay."does not echo back.

On Tuesday, June 20, 2023 at 09:20:52 PM PDT, ***@***.*** ***@***.***> wrote:  

Jonathan, It was zephyrproject-rtos/zephyr. I just tried $ git clone https://github.com/jori-nordic/zephyr $ cd zephyr $ git checkout -b l2cap-debugging$cd ..

The host can connect to the nRF52840 without crash. ping6 fe80::d6bc:19ff:fef7:5394%bt0 PING fe80::d6bc:19ff:fef7:5394%bt0(fe80::d6bc:19ff:fef7:5394%bt0) 56 data bytes 64 bytes from fe80::d6bc:19ff:fef7:5394%bt0: icmp_seq=1 ttl=64 time=71.2 ms 64 bytes from fe80::d6bc:19ff:fef7:5394%bt0: icmp_seq=2 ttl=64 time=105 ms

But fail to ping $ sudo ip address add 2001:db8::2/64 dev bt0 $ ping6 2001:db8::1 PING 2001:db8::1(2001:db8::1) 56 data bytes From 2001:db8::2 icmp_seq=1 Destination unreachable: Address unreachable From 2001:db8::2 icmp_seq=2 Destination unreachable: Address unreachable

Any suggestion. ---henry

On Tuesday, June 20, 2023 at 12:21:46 AM PDT, Jonathan Rico ***@***.***> wrote:  

@henryzhang62 are those logs with a build using this branch? https://github.com/jori-nordic/zephyr/commits/l2cap-debugging

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

jori-nordic commented 1 year ago

Any suggestion.

You can start with enabling L2CAP and IPSP debug logging:

CONFIG_NET_L2_BT_LOG_LEVEL_DBG=y
CONFIG_BT_L2CAP_LOG_LEVEL_DBG=y
henryzhang62 commented 1 year ago

Jonathan,

My setup can build zephyrproject-rtos/zephyr without any warning.But got lots of warning from https://github.com/jori-nordic/zephyr.git. See attached. Also, my SDK $~zephyr-sdk-0.16.1 your branch is in~/zephyrproject/zephyr ---henry

On Wednesday, June 21, 2023 at 12:12:23 AM PDT, Jonathan Rico ***@***.***> wrote:  

Any suggestion.

You can start with enabling L2CAP and IPSP debug logging: CONFIG_NET_L2_BT_LOG_LEVEL_DBG=y CONFIG_BT_L2CAP_LOG_LEVEL_DBG=y

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

jori-nordic commented 1 year ago

@henryzhang62 attachments seem to not work when replying through email. Could you use the github web UI to attach the logs?

henryzhang62 commented 1 year ago

Jonathan,

attached them here. waring.log build.log

jori-nordic commented 1 year ago

@henryzhang62 this looks weird yeah. I didn't have any such warnings. Could you try running a west update after checking out my branch, then deleting the build folder just in case and rebuilding?

henryzhang62 commented 1 year ago

Jonathan, I find that your branch is very old and cannot match with zephyr-sdk-0.16.1_linux-x86_64.tar.xz toolchain (the most recent version zephyr rtos is using). For example, sample/bluetooth/ipsp/src/main.c still uses

include <logging/log.h>

Instead #include <zephyr/logging/log.h>

Could you please tell me which version of SDK your branch is using? ---henry

On Sunday, June 25, 2023 at 11:48:34 PM PDT, Jonathan Rico ***@***.***> wrote:  

@henryzhang62 this looks weird yeah. I didn't have any such warnings. Could you try running a west update after checking out my branch, then deleting the build folder just in case and rebuilding?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

henryzhang62 commented 1 year ago

Jonathan, I follow https://docs.zephyrproject.org/latest/develop/getting_started/index.html but could not build your branch.

Please tell me how to set up the environment to build your branch. Thanks. ---henry

On Monday, June 26, 2023 at 11:00:35 PM PDT, ***@***.*** ***@***.***> wrote:  

Jonathan, I find that your branch is very old and cannot match with zephyr-sdk-0.16.1_linux-x86_64.tar.xz toolchain (the most recent version zephyr rtos is using). For example, sample/bluetooth/ipsp/src/main.c still uses

include <logging/log.h>

Instead #include <zephyr/logging/log.h>

Could you please tell me which version of SDK your branch is using? ---henry

On Sunday, June 25, 2023 at 11:48:34 PM PDT, Jonathan Rico ***@***.***> wrote:  

@henryzhang62 this looks weird yeah. I didn't have any such warnings. Could you try running a west update after checking out my branch, then deleting the build folder just in case and rebuilding?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

jhedberg commented 1 year ago

Any update on this one?

seekaddo commented 12 months ago

@jori-nordic and @jhedberg I have created a debian linux virtualbox. You can flash and debug and do everything with the nrf52dk_nrf52832. where can i upload it so you guys can use it to work on this issue.

Sorry for the delay. I was very busy with some research work.

I hope now you guy can figure out what is going on.