zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

BLE HID sample often asserts on Windows 10 reconnection #15183

Closed Olivier-ProGlove closed 5 years ago

Olivier-ProGlove commented 5 years ago

Describe the bug

I was checking whether https://github.com/zephyrproject-rtos/zephyr/pull/14938 fixes issue #14044, "BLE HID sample fails to reconnect on Windows 10 tablets - Wrong Sequence Number (follow-up)".

I hit the failing assert <err> bt_ctlr_hci: assert: '0' failed after the device connects, in a good third of reconnection attempts. I am using a Nordic nRF52840 dev kit (PCA10056). This is the first time I tried, so the failure rate may depend on the environment.

Because there are a couple of LL_ASSERT(0) calls in subsys/bluetooth/controller/hci/hci.c, I added an error message before each of them. The one that always fails is in encode_data_ctrl(): <err> bt_ctlr_hci: encode_data_ctrl: opcode:0x0

Here is my code change:

@@ -3246,6 +3251,7 @@ static void encode_data_ctrl(struct node_rx_pdu *node_rx,
                break;

        default:
+               BT_ERR("encode_data_ctrl: opcode:0x%x", pdu_data->llctrl.opcode);
                LL_ASSERT(0);
                return;
        }
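The logged value opcode:0x0 can be looked up in the LL Control PDU opcode table of the Bluetooth Core Specification 5.0 (Vol 6, Part B, section 2.4.2), where 0x00 is LL_CONNECTION_UPDATE_IND, which corresponds to Zephyr's PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND. A minimal standalone sketch of that lookup follows; the helper ll_ctrl_opcode_name() and its table are hypothetical and not part of Zephyr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LL Control PDU opcode names (Core Spec 5.0 naming), indexed by opcode.
 * Hypothetical table for decoding log output; not a Zephyr API. */
static const char *const ll_ctrl_names[] = {
	"LL_CONNECTION_UPDATE_IND", /* 0x00 */
	"LL_CHANNEL_MAP_IND",       /* 0x01 */
	"LL_TERMINATE_IND",         /* 0x02 */
	"LL_ENC_REQ",               /* 0x03 */
	"LL_ENC_RSP",               /* 0x04 */
	"LL_START_ENC_REQ",         /* 0x05 */
	"LL_START_ENC_RSP",         /* 0x06 */
	"LL_UNKNOWN_RSP",           /* 0x07 */
	"LL_FEATURE_REQ",           /* 0x08 */
	"LL_FEATURE_RSP",           /* 0x09 */
	"LL_PAUSE_ENC_REQ",         /* 0x0A */
	"LL_PAUSE_ENC_RSP",         /* 0x0B */
	"LL_VERSION_IND",           /* 0x0C */
	"LL_REJECT_IND",            /* 0x0D */
	"LL_SLAVE_FEATURE_REQ",     /* 0x0E */
	"LL_CONNECTION_PARAM_REQ",  /* 0x0F */
	"LL_CONNECTION_PARAM_RSP",  /* 0x10 */
	"LL_REJECT_EXT_IND",        /* 0x11 */
	"LL_PING_REQ",              /* 0x12 */
	"LL_PING_RSP",              /* 0x13 */
	"LL_LENGTH_REQ",            /* 0x14 */
	"LL_LENGTH_RSP",            /* 0x15 */
	"LL_PHY_REQ",               /* 0x16 */
	"LL_PHY_RSP",               /* 0x17 */
	"LL_PHY_UPDATE_IND",        /* 0x18 */
	"LL_MIN_USED_CHANNELS_IND", /* 0x19 */
};

/* Compile-time check that the table covers opcodes 0x00..0x19. */
static_assert(sizeof(ll_ctrl_names) / sizeof(ll_ctrl_names[0]) == 0x1A,
	      "LL control opcode table must have 26 entries");

/* Hypothetical helper: return the spec name of an LL Control opcode,
 * or "UNKNOWN" for values outside the table. */
static const char *ll_ctrl_opcode_name(uint8_t opcode)
{
	if (opcode < sizeof(ll_ctrl_names) / sizeof(ll_ctrl_names[0])) {
		return ll_ctrl_names[opcode];
	}
	return "UNKNOWN";
}
```

Passing the logged value 0x0 returns "LL_CONNECTION_UPDATE_IND".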

To Reproduce

  1. Build and flash peripheral_hids on Nordic nRF52840 dev kit (PCA10056).
  2. Pair the Windows 10 tablet with the device (this works fine).
  3. Reset the Nordic dev kit by pressing 'BOOT/RESET'.

Expected behavior

Windows 10 should automatically reconnect to the dev kit when it starts advertising again after the reset.

Screenshots or console output

***** Booting Zephyr OS v1.14.0-rc3-80-g417d349727e3 *****
Bluetooth initialized
Advertising successfully started
[00:00:00.007,720] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.14 Build 0
[00:00:00.008,087] <wrn> bt_hci_core: No ID address. App must call settings_load()
[00:00:00.010,803] <inf> bt_hci_core: Identity: ea:ac:67:83:44:59 (random)
[00:00:00.010,803] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.010,833] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff
Connected bc:83:85:0c:f3:ba (public)
[00:00:06.202,453] <err> bt_ctlr_hci: encode_data_ctrl: opcode:0x0
[00:00:06.202,453] <err> bt_ctlr_hci: assert: '0' failed
***** Kernel OOPS! *****
Current thread ID = 0x200008e8
Faulting instruction address = 0xeea2
Fatal fault in thread 0x200008e8! Aborting.

Environment:

Additional context

Adding a case for PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND to the switch seems to avoid the assert:

@@ -3245,7 +3250,12 @@ static void encode_data_ctrl(struct node_rx_pdu *node_rx,
                le_unknown_rsp(pdu_data, handle, buf);
                break;

+       case PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND:
+               BT_WARN("encode_data_ctrl: Skip CONN_UPDATE_IND");
+               break;
+

However, I do not know whether any action needs to be taken when a PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND is received.
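For context on what that PDU carries: LL_CONNECTION_UPDATE_IND is sent by the master and holds new connection parameters that take effect at the connection event given by its Instant field. The sketch below decodes its CtrData (WinSize, WinOffset, Interval, Latency, Timeout, Instant, little-endian, per Core Specification 5.0, Vol 6, Part B) from a raw LL Control PDU payload, for example one copied out of a sniffer trace. The struct, macros, and function names are hypothetical and are not Zephyr's own types; the sketch only decodes the PDU and does not decide which controller or host action is required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the LL_CONNECTION_UPDATE_IND CtrData. */
struct conn_update_ind {
	uint8_t win_size;    /* transmit window size, units of 1.25 ms */
	uint16_t win_offset; /* transmit window offset, units of 1.25 ms */
	uint16_t interval;   /* connection interval, units of 1.25 ms */
	uint16_t latency;    /* slave latency, in connection events */
	uint16_t timeout;    /* supervision timeout, units of 10 ms */
	uint16_t instant;    /* connection event at which the update applies */
};

#define LL_CONNECTION_UPDATE_IND_OPCODE 0x00
#define CONN_UPDATE_IND_CTRDATA_LEN 11 /* 1 + 2 + 2 + 2 + 2 + 2 octets */

/* Read a little-endian 16-bit field. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode an LL Control PDU payload (opcode followed by CtrData).
 * Returns false if it is too short or is not LL_CONNECTION_UPDATE_IND. */
static bool decode_conn_update_ind(const uint8_t *payload, size_t len,
				   struct conn_update_ind *out)
{
	assert(payload != NULL && out != NULL);

	if (len < 1 + CONN_UPDATE_IND_CTRDATA_LEN ||
	    payload[0] != LL_CONNECTION_UPDATE_IND_OPCODE) {
		return false;
	}

	const uint8_t *d = payload + 1; /* skip the opcode */

	out->win_size = d[0];
	out->win_offset = get_le16(&d[1]);
	out->interval = get_le16(&d[3]);
	out->latency = get_le16(&d[5]);
	out->timeout = get_le16(&d[7]);
	out->instant = get_le16(&d[9]);
	return true;
}
```

For example, an Interval field of 0x0018 means 24 × 1.25 ms = 30 ms, and a Timeout of 0x0048 means 72 × 10 ms = 720 ms.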

cc: @joerchan @carlescufi @cvinayak

carlescufi commented 5 years ago

I believe that an upcoming patch from @joerchan will fix this as well.

Olivier-ProGlove commented 5 years ago

Here are the sniffer traces captured when the assert occurs (I removed my workaround so that LL_ASSERT(0) is hit): windows10-reconnection-assert.zip

I reproduced the issue twice to make sure the traces are consistent (and they appear to be):

Screenshot from 2019-04-04 13-29-37