zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.86k stars 6.62k forks source link

Bluetooth: LL assert with chained adv packets #52887

Closed jori-nordic closed 7 months ago

jori-nordic commented 1 year ago

Describe the bug LL assert when controller uses aux adv chains. Happens after ~10s on real HW, not able to reproduce using BSIM. Note that the bsim test also has a bunch of problems:

To Reproduce Check out this commit https://github.com/jori-nordic/zephyr/tree/e4300c62c9181f8d540fa64fd3933955510f5b34/./ Build and run on a nRF52840 DK.

Logs and console output

*** Booting Zephyr OS build zephyr-v3.2.0-2350-ga8f8320248ac ***
Starting Multiple Broadcaster Demo
[00:00:00.366,790] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.366,821] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
[00:00:00.366,851] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.2 Build 99
[00:00:00.367,218] <wrn> bt_id: bt_read_static_addr: No static addresses stored in controller
[00:00:00.367,492] <inf> bt_hci_core: bt_dev_show_info: Identity: C3:44:C7:BA:1F:E7 (random)
[00:00:00.367,523] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
[00:00:00.367,553] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
adv data len 254 def 254

Started Extended Advertising Set 0.
Started Extended Advertising Set 1.
Exiting main thread.
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:473
        lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 305
[00:00:09.694,000] <err> os: esf_dump: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
[00:00:09.694,030] <err> os: esf_dump: r3/a4:  0x20000720 r12/ip:  0x00000010 r14/lr:  0x00001293
[00:00:09.694,030] <err> os: esf_dump:  xpsr:  0x41000011
[00:00:09.694,061] <err> os: esf_dump: Faulting instruction address (r15/pc): 0x0000129e
[00:00:09.694,091] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:09.694,091] <err> os: z_fatal_error: Fault during interrupt handling

[00:00:09.694,152] <err> os: z_fatal_error: Current thread: 0x20000e20 (unknown)
[00:00:09.848,175] <err> os: k_sys_fatal_error_handler: Halting system
cvinayak commented 1 year ago

I am not able to reproduce the assertion on nRF52833DK (not sure if nRF5240DK would be different).

image

Multiple advertising sets with chaining will have overlapping auxiliary chains due to missing implementation of timing reservations for the chains. (advertising chains is experimental in Zephyr Controller).

Here is a workaround to have non-overlapping auxiliary chain PDUs for your updates to broadcaster_multiple: 0001-Bluetooth-Controller-WA-for-chain-PDU-time-reservati.patch.txt

I use the additional kconfig options below for the samples/bluetooth/observer sample to scan for the multiple extended advertising broadcasters as shown in the screenshot above:

CONFIG_BT_EXT_ADV=y

CONFIG_BT_BUF_EVT_RX_COUNT=10
CONFIG_BT_BUF_EVT_RX_SIZE=255
CONFIG_BT_EXT_SCAN_BUF_SIZE=1650

CONFIG_BT_CTLR_RX_BUFFERS=9
CONFIG_BT_CTLR_SCAN_DATA_LEN_MAX=1650
cvinayak commented 1 year ago

Advertising PDU chaining is not supported, hence lowering the bug priority.

kruithofa commented 1 year ago

@jori-nordic Can you verify if #59824 fixes this?

jori-nordic commented 1 year ago

@kruithofa not reproducible on today's master, closing.

jori-nordic commented 1 year ago

@kruithofa spoke too soon:

Exiting main thread.
ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv_aux.c:326
        prepare_cb: Actual EVENT_OVERHEAD_START_US = 305
[00:42:10.265,045] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
[00:42:10.265,045] <err> os: r3/a4:  0x200006e8 r12/ip:  0x00000010 r14/lr:  0x00001503
[00:42:10.265,075] <err> os:  xpsr:  0x41000011
[00:42:10.265,075] <err> os: Faulting instruction address (r15/pc): 0x0000150e
[00:42:10.265,106] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:42:10.265,136] <err> os: Fault during interrupt handling

[00:42:10.265,167] <err> os: Current thread: 0x20000df8 (unknown)
[00:42:10.671,997] <err> os: Halting system

so went from ~10s to assert to ~40mins

github-actions[bot] commented 11 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Oleh-Kravchenko commented 9 months ago

zephyr-v3.5.0

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c:1044
    prepare_cb: Actual EVENT_OVERHEAD_START_US = 65521
E: r0/a1:  0x00000003  r1/a2:  0x0000000f  r2/a3:  0x00000020
E: r3/a4:  0x0003b879 r12/ip:  0x00000000 r14/lr:  0x00002263
E:  xpsr:  0x61000028
E: Faulting instruction address (r15/pc): 0x0000226e
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Fault during interrupt handling
cvinayak commented 9 months ago

@Oleh-Kravchenko could you provide details on how to reproduce?

github-actions[bot] commented 7 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.