zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.44k stars 6.4k forks source link

STM32WB55 Bus Fault when connecting then disconnecting then connecting then disconnecting then connecting #50655

Closed KenthJohan closed 1 year ago

KenthJohan commented 1 year ago

STM32WB55 Bus Fault when connecting then disconnecting then connecting then disconnecting then connecting. I'm using nRF Connect for Mobile to connect to the STM32WB55. Referenses:

include <zephyr/bluetooth/bluetooth.h>

include <zephyr/bluetooth/hci.h>

include <zephyr/bluetooth/conn.h>

include <zephyr/bluetooth/uuid.h>

include <zephyr/bluetooth/gatt.h>

include <zephyr/bluetooth/services/bas.h>

BT_GATT_SERVICE_DEFINE(ess_svc, BT_GATT_PRIMARY_SERVICE(BT_UUID_ESS), );

static const struct bt_data ad[] = { BT_DATA_BYTES(BT_DATA_FLAGS, (BT_LE_AD_GENERAL | BT_LE_AD_NO_BREDR)), BT_DATA_BYTES(BT_DATA_GAP_APPEARANCE, 0x00, 0x03), };

static void connected(struct bt_conn *conn, uint8_t err) { if (err) { printk("Connection failed (err 0x%02x)\n", err); } else { printk("Connected\n"); } }

static void disconnected(struct bt_conn *conn, uint8_t reason) { printk("Disconnected (reason 0x%02x)\n", reason); }

BT_CONN_CB_DEFINE(conn_callbacks) = { .connected = connected, .disconnected = disconnected, };

static void bt_ready(void) { int err; printk("Bluetooth initialized\n"); err = bt_le_adv_start(BT_LE_ADV_CONN_NAME, ad, ARRAY_SIZE(ad), NULL, 0); if (err) { printk("Advertising failed to start (err %d)\n", err); return; } printk("Advertising successfully started\n"); }

void main(void) { int err; err = bt_enable(NULL); if (err) { printk("Bluetooth init failed (err %d)\n", err); return; } bt_ready(); while (1) { k_sleep(K_SECONDS(1)); } }

```ansi
*** Booting Zephyr OS build v3.2.0-rc3-24-g8364715998c9  ***
Bluetooth initialized
Advertising successfully started
[00:00:00.346,000] <inf> bt_hci_core: Identity: 02:80:E1:00:00:00 (public)
[00:00:00.346,000] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0026, manufacturer 0x00
30
[00:00:00.346,000] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0x2126
Connected
[00:00:07.251,000] <wrn> bt_hci_core: opcode 0x2016 status 0x3a
[00:00:07.251,000] <err> bt_hci_core: Failed read remote features (-5)
Disconnected (reason 0x13)
Connected                        
[00:00:15.788,000] <wrn> bt_hci_core: opcode 0x2016 status 0x3a
[00:00:15.788,000] <err> bt_hci_core: Failed read remote features (-5)
Disconnected (reason 0x13)                    
Connected
[00:00:19.577,000] <wrn> bt_hci_core: opcode 0x2016 status 0x3a
[00:00:19.577,000] <err> bt_hci_core: Failed read remote features (-5)
[00:00:20.166,000] <err> os: ***** BUS FAULT *****
[00:00:20.166,000] <err> os:   Imprecise data bus error
[00:00:20.166,000] <err> os: r0/a1:  0x200018e0  r1/a2:  0x200300d8  r2/a3:  0x2003d238
[00:00:20.166,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x08001ce7
[00:00:20.166,000] <err> os:  xpsr:  0x61000000
[00:00:20.166,000] <err> os: Faulting instruction address (r15/pc): 0x08001ce6
[00:00:20.166,000] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:00:20.166,000] <err> os: Current thread: 0x200012b8 (unknown)
[00:00:20.237,000] <err> os: Halting system

After the last Connected using nRF Connect for Mobile it fails to connect.

Modifying the prj.conf like removing CONFIG_BT_BAS=y can stop zephyr log from printing the BUS FAULT but still after 3rd connection the nRF Connect for Mobile will refuse to connect to STM32WB55.

erwango commented 1 year ago

@KenthJohan, thanks for the details of the application. Though, I've had a try and I manage to reproduce an issue, but only with CONFIG_BT_BAS=n

What versions were you using ? If different, can you have a try with those ? I'll investigate why CONFIG_BT_BAS=n is causing this crash, but do you confirm this config is required in your case ? Not relevant

CONFIG_BT_BAS=n ```` *** Booting Zephyr OS build v3.2.0-rc3-16-ge348fe494a21 *** ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c7 k_sem_take failed with err -11 [00:00:10.016,000] os: r0/a1: 0x00000003 r1/a2: 0x0000000a r2/a3: 0x0 [00:00:10.016,000] os: r3/a4: 0x080087d9 r12/ip: 0x00000000 r14/lr: 0x1 [00:00:10.016,000] os: xpsr: 0x41000000 [00:00:10.016,000] os: Faulting instruction address (r15/pc): 0x0800397c [00:00:10.016,000] os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0 [00:00:10.017,000] os: Current thread: 0x20001380 (unknown) [00:00:10.061,000] os: Halting system ````
CONFIG_BT_BAS=y ```` *** Booting Zephyr OS build v3.2.0-rc3-16-ge348fe494a21 *** Bluetooth initialized Advertising successfully started [00:00:00.153,000] bt_hci_core: Identity: 02:80:E1:00:00:00 (public) [00:00:00.153,000] bt_hci_core: HCI: version 5.2 (0x0b) revision 0xa06b, 0 [00:00:00.153,000] bt_hci_core: LMP: version 5.2 (0x0b) subver 0x216b Connected [00:00:12.544,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:12.544,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:18.105,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:18.105,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:23.636,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:23.636,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:28.027,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:28.027,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:33.279,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:33.279,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:38.379,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:38.379,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:42.887,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:42.887,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:47.560,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:47.560,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:55.543,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:55.543,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:58.948,000] bt_hci_core: opcode 0x2016 status 0x3a [00:00:58.948,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:01.460,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:01.460,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:03.984,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:03.984,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:06.450,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:06.451,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:08.668,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:08.668,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:12.227,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:12.227,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:14.926,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:14.926,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:01:17.713,000] bt_hci_core: opcode 0x2016 status 0x3a [00:01:17.713,000] bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) ````
erwango commented 1 year ago

Ok, I'm now seeing the crash whatever the configuration, which is worrisome. Setting the prio to medium.

erwango commented 1 year ago

After re-flashing the BLE Controller f/w I'm not able to reproduce the crash anymore, whatever the configuration. I'm seeing as well that the crash I reported (ASSERTION FAIL, no response from the controller) is different from the issue you saw (BUS FAULT). I'll test again, but it seems like a tricky one. Also, I'd like to be sure we're on the same page regarding BLE Cltr f/w.

KenthJohan commented 1 year ago

@erwango Which tool to use for flashing BLE controller f/w: stm32wb5x_BLE_HCILayer_fw.bin (V1.14.1)? I tried ST-LINK writing it at 0x08000000 or is which address should the firmware be stored at? Not sure if firmware were successfully installed, I got checksum Programmed memory Checksum: 0x00981081

image

erwango commented 1 year ago

@KenthJohan, I advise to use STM32CubeProgrammer. For the address to be used, you should look at release note

erwango commented 1 year ago

@KenthJohan Did you manage to sort this out ? Is this problem still valid ?

FRASTM commented 1 year ago

On my side I can see the error. I noticed the BUS FAULT comes from the LST_remove_head function

 0x08002cfc in z_arm_usage_fault () at ./arch/arm/core/aarch32/cortex_m/fault_s.S:102
#6  <signal handler called>
#7  0x0800c2f2 in LST_remove_head (listHead=listHead@entry=0x20030030 <EvtQueue>, node=node@entry=0x20003c5c <z_interrupt_stacks+2076>)
    at ../modules/hal/stm32/lib/stm32wb/hci/stm_list.c:113
#8  0x08001b7e in HW_IPCC_BLE_RxEvtNot () at ../modules/hal/stm32/lib/stm32wb/hci/tl_mbox.c:153
#9  0x08001e7a in HW_IPCC_BLE_EvtHandler () at ../modules/hal/stm32/lib/stm32wb/hci/hw_ipcc.c:243
#10 HW_IPCC_Rx_Handler () at ../modules/hal/stm32/lib/stm32wb/hci/hw_ipcc.c:129
#11 0x080025f2 in _isr_wrapper () at ./arch/arm/core/aarch32/isr_wrapper.S:259
FRASTM commented 1 year ago

In the stm_list.c , the LST_remove_head() is disabling the irq and then calling the LST_remove_node ; this one is also disabling the irq. I tried to avoid this. With the following patch, I can connect/disconnect without the bus fault anymore:

diff --git a/lib/stm32wb/hci/stm_list.c b/lib/stm32wb/hci/stm_list.c
index 32779269..fd8589f6 100644
--- a/lib/stm32wb/hci/stm_list.c
+++ b/lib/stm32wb/hci/stm_list.c
@@ -101,15 +101,8 @@ void LST_remove_node (tListNode * node)

 void LST_remove_head (tListNode * listHead, tListNode ** node )
 {
-  uint32_t primask_bit;
-
-  primask_bit = __get_PRIMASK();  /**< backup PRIMASK bit */
-  __disable_irq();                  /**< Disable all interrupts by setting PRIMASK bit on Cortex*/
-
   *node = listHead->next;
   LST_remove_node (listHead->next);
-
-  __set_PRIMASK(primask_bit);     /**< Restore PRIMASK bit*/
 }
FRASTM commented 1 year ago

@KenthJohan could you please check if this PR is improving the connect/disconnect operations on your side

KenthJohan commented 1 year ago

@KenthJohan could you please check if this PR is improving the connect/disconnect operations on your side

Could you make a PR in https://github.com/zephyrproject-rtos/zephyr instead of https://github.com/zephyrproject-rtos/hal_stm32? I'm not sure how to use hal_stm32 repo when I'm using zephyr repo.

FRASTM commented 1 year ago

@KenthJohan Please check the west.yml as is https://github.com/zephyrproject-rtos/zephyr/pull/52755

KenthJohan commented 1 year ago

@FRASTM It does not fix the problem. It only changes the printing order.

*** Booting Zephyr OS build zephyr-v3.2.0-2243-g8fb0f3e3f1ff *** ``` remotes: - name: zephyrproject-rtos url-base: https://github.com/zephyrproject-rtos projects: - name: zephyr remote: zephyrproject-rtos revision: main import: true ``` ```js *** Booting Zephyr OS build zephyr-v3.2.0-2243-g8fb0f3e3f1ff *** Bluetooth initialized Advertising successfully started [00:00:00.346,000]  bt_hci_core: Identity: 02:80:E1:00:00:00 (public) [00:00:00.346,000]  bt_hci_core: HCI: version 5.0 (0x09) revision 0x0026, manufacturer 0x0030 [00:00:00.346,000]  bt_hci_core: LMP: version 5.0 (0x09) subver 0x2126 Connected [00:00:02.178,000]  bt_hci_core: opcode 0x2016 status 0x3a [00:00:02.178,000]  bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:09.836,000]  bt_hci_core: opcode 0x2016 status 0x3a [00:00:09.836,000]  bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:22.076,000]  bt_hci_core: opcode 0x2016 status 0x3a [00:00:22.076,000]  bt_hci_core: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:27.703,000]  bt_hci_core: opcode 0x2016 status 0x3a [00:00:27.703,000]  bt_hci_core: Failed read remote features (-5) [00:00:28.294,000]  os: ***** BUS FAULT ***** [00:00:28.294,000]  os: Imprecise data bus error [00:00:28.294,000]  os: r0/a1:0x200300d8r1/a2:0x2003d238r2/a3:0x2003d238 [00:00:28.294,000]  os: r3/a4:0x00000001 r12/ip:0x00000000 r14/lr:0x0800bd3d [00:00:28.294,000]  os:xpsr:0x61000000 [00:00:28.294,000]  os: Faulting instruction address (r15/pc): 0x0800bd40 [00:00:28.294,000]  os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0 [00:00:28.294,000]  os: Current thread: 0x200012f0 (unknown) [00:00:28.364,000]  os: Halting system ```
*** Booting Zephyr OS build zephyr-v3.2.0-2235-gb0e210dd659c *** ``` remotes: - name: zephyrproject-rtos url-base: https://github.com/zephyrproject-rtos projects: - name: zephyr remote: zephyrproject-rtos revision: pull/52755/head import: true ``` ```js *** Booting Zephyr OS build zephyr-v3.2.0-2235-gb0e210dd659c *** [00:00:00.341,000]  bt_hci_core: bt_dev_show_info: Identity: 02:80:E1:00:00:00 (public) [00:00:00.341,000]  bt_hci_core: bt_dev_show_info: HCI: version 5.0 (0x09) revision 0x0026, manufacturer 0x0030  [00:00:00.341,000]  bt_hci_core: bt_dev_show_info: LMP: version 5.0 (0x09) subver 0x2126 Bluetooth initialized Advertising successfully started Connected [00:00:17.382,000]  bt_hci_core: bt_hci_cmd_send_sync: opcode 0x2016 status 0x3a [00:00:17.382,000]  bt_hci_core: conn_auto_initiate: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:22.775,000]  bt_hci_core: bt_hci_cmd_send_sync: opcode 0x2016 status 0x3a [00:00:22.775,000]  bt_hci_core: conn_auto_initiate: Failed read remote features (-5) Disconnected (reason 0x13) Connected [00:00:26.014,000]  bt_hci_core: bt_hci_cmd_send_sync: opcode 0x2016 status 0x3a [00:00:26.014,000]  bt_hci_core: conn_auto_initiate: Failed read remote features (-5) [00:00:26.605,000]  os: bus_fault: ***** BUS FAULT ***** [00:00:26.605,000]  os: bus_fault: Imprecise data bus error [00:00:26.605,000]  os: esf_dump: r0/a1: 0x200300d8 r1/a2: 0x2003d238 r2/a3: 0x2003d238 [00:00:26.605,000]  os: esf_dump: r3/a4: 0x00000000 r12/ip: 0x00000000 r14/lr: 0x08001ef3 [00:00:26.605,000]  os: esf_dump: xpsr: 0x61000000 [00:00:26.605,000]  os: esf_dump: Faulting instruction address (r15/pc): 0x08001ef2 [00:00:26.605,000]  os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0 [00:00:26.605,000]  os: z_fatal_error: Current thread: 0x200012f8 (unknown) [00:00:26.691,000]  os: k_sys_fatal_error_handler: Halting system ```
FRASTM commented 1 year ago

did you update your west.yml with $ west update to retrieve that new hal_stm32 branch ?

KenthJohan commented 1 year ago

did you update your west.yml with $ west update to retrieve that new hal_stm32 branch ?

Yes I changed revision to https://github.com/zephyrproject-rtos/hal_stm32/pull/154. After I did west update I saw the file stm_list.c change

I have tested main and https://github.com/zephyrproject-rtos/hal_stm32/pull/154 revision on a new computer and there is no crash. I can't reproduce the crash.

I will compare binaries from both my computers soon.

erwango commented 1 year ago

@KenthJohan Any update ?

erwango commented 1 year ago

Last update was positive, moving from "Medium" to "Low".

erwango commented 1 year ago

No update. Closing. Reopen if required

KenthJohan commented 7 months ago

@erwango I have tried this firmware now: stm32wb5x_BLE_HCILayer_fw.bin (V1.14.1) image

With the new firmware I get this error now when I connect from nrf connect app:

*** Booting Zephyr OS build v3.6.0-rc1-65-g7551b85f2287 ***
ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:331
    command opcode 0xfc0c timeout with err -11
[00:00:10.001,000] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
[00:00:10.001,000] <err> os: r3/a4:  0x20000690 r12/ip:  0x00002711 r14/lr:  0x0800458b
[00:00:10.001,000] <err> os:  xpsr:  0x41000000
[00:00:10.001,000] <err> os: Faulting instruction address (r15/pc): 0x08004596
[00:00:10.001,000] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
[00:00:10.002,000] <err> os: Current thread: 0x20001628 (unknown)
[00:00:10.052,000] <err> os: Halting system
erwango commented 7 months ago

@KenthJohan To ease issues tracking, would you mind opening a dedicated ticket ?

KenthJohan commented 7 months ago

@erwango New ticket here https://github.com/zephyrproject-rtos/zephyr/issues/68747