zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.87k stars 6.62k forks source link

Often disconnect timeouts when running the BLE peripheral HR sample on Nitrogen96 #32233

Closed agherzan closed 3 years ago

agherzan commented 3 years ago

Describe the bug

Connection often fails when running samples/bluetooth/peripheral_hr on Nitrogen96 (nRF52) and trying to connect to it from another device.

To Reproduce

Steps to reproduce the behaviour:

  1. build the peripheral_hrfor Nitrogen96 (I used the Yocto build system for this but it was reproducible using the west build system too)
  2. run the app and connect to board from a Linux host using bluez5 (bluetoothctl)

Expected behaviour

The application should always connect logging:

Connected
[00:05:28.493,316] <inf> hrs: HRS notifications enab

Impact

Product instability.

Logs and console output

Failing while connecting with bluez5 (5.55) from a Linux machine:

Connected
Disconnected (reason 0x08)

1   0.000000    host    controller  HCI_CMD 12  Sent LE Set Extended Scan Parameters
2   0.109440    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Parameters)
3   0.109504    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
4   0.111440    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
5   0.115463    controller  host    HCI_EVT 40  Rcvd LE Meta (LE Extended Advertising Report)
6   0.115563    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
7   0.125439    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
8   0.125533    host    controller  HCI_CMD 30  Sent LE Extended Create Connection
9   0.127413    controller  host    HCI_EVT 7   Rcvd Command Status (LE Extended Create Connection)
10  0.437462    controller  host    HCI_EVT 34  Rcvd LE Meta (LE Enhanced Connection Complete)
11  0.437667    host    controller  HCI_CMD 6   Sent LE Read Remote Features
12  0.438406    controller  host    HCI_EVT 7   Rcvd LE Meta (LE Channel Selection Algorithm)
13  0.439409    controller  host    HCI_EVT 7   Rcvd Command Status (LE Read Remote Features)
14  0.677410    controller  host    HCI_EVT 15  Rcvd LE Meta (LE Read Remote Features Complete)
15  0.678399    controller  host    HCI_EVT 7   Rcvd Disconnect Complete

Failing when connecting from an Android (10) device:

Connected
Disconnected (reason 0x13)

On IOS (14) it always connects and the expected behaviour is consistent:

Connected
[00:05:28.493,316] <inf> hrs: HRS notifications enabled

1   0.000000    host    controller  HCI_CMD 12  Sent LE Set Extended Scan Parameters
2   0.110266    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Parameters)
3   0.110331    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
4   0.112257    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
5   0.116285    controller  host    HCI_EVT 40  Rcvd LE Meta (LE Extended Advertising Report)
6   0.116384    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
7   0.117265    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
8   0.117365    host    controller  HCI_CMD 30  Sent LE Extended Create Connection
9   0.119259    controller  host    HCI_EVT 7   Rcvd Command Status (LE Extended Create Connection)
10  0.225203    controller  host    HCI_EVT 34  Rcvd LE Meta (LE Enhanced Connection Complete)
11  0.225394    host    controller  HCI_CMD 6   Sent LE Read Remote Features
12  0.226192    controller  host    HCI_EVT 7   Rcvd LE Meta (LE Channel Selection Algorithm)
13  0.227185    controller  host    HCI_EVT 7   Rcvd Command Status (LE Read Remote Features)
14  0.311267    controller  host    HCI_EVT 15  Rcvd LE Meta (LE Read Remote Features Complete)
15  0.311762    localhost ()    e1:46:c9:b1:f9:db ()    ATT 12  Sent Exchange MTU Request, Client Rx MTU: 517
16  0.355261    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
17  0.400275    e1:46:c9:b1:f9:db ()    localhost ()    ATT 12  Rcvd Exchange MTU Response, Server Rx MTU: 65
18  0.400629    localhost ()    e1:46:c9:b1:f9:db ()    ATT 16  Sent Read By Type Request, Server Supported Features, Handles: 0x0001..0xffff
19  0.446261    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
20  0.490328    e1:46:c9:b1:f9:db ()    localhost ()    ATT 14  Rcvd Error Response - Attribute Not Found, Handle: 0x0001 (Unknown)
21  0.490669    localhost ()    e1:46:c9:b1:f9:db ()    ATT 16  Sent Read By Type Request, Database Hash, Handles: 0x0001..0xffff
22  0.535261    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
23  0.579853    e1:46:c9:b1:f9:db ()    localhost ()    ATT 29  Rcvd Read By Type Response, Attribute List Length: 1
24  0.580195    localhost ()    e1:46:c9:b1:f9:db ()    ATT 16  Sent Read By Type Request, Database Hash, Handles: 0x0009..0xffff
25  0.625262    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
26  0.669731    e1:46:c9:b1:f9:db ()    localhost ()    ATT 14  Rcvd Error Response - Attribute Not Found, Handle: 0x0009 (Unknown)
27  0.678342    localhost ()    e1:46:c9:b1:f9:db ()    ATT 13  Sent Write Request, Handle: 0x0006 (Unknown)
28  0.715240    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
29  0.759628    e1:46:c9:b1:f9:db ()    localhost ()    ATT 10  Rcvd Write Response, Handle: 0x0006 (Unknown)
30  0.759846    localhost ()    e1:46:c9:b1:f9:db ()    ATT 14  Sent Write Request, Handle: 0x0004 (Unknown)
31  0.805276    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
32  0.849709    e1:46:c9:b1:f9:db ()    localhost ()    ATT 10  Rcvd Write Response, Handle: 0x0004 (Unknown)
33  0.849989    localhost ()    e1:46:c9:b1:f9:db ()    ATT 12  Sent Read Request, Handle: 0x000b (Unknown)
34  0.895268    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
35  0.939905    e1:46:c9:b1:f9:db ()    localhost ()    HCI_ACL 32  Rcvd  [Reassembled in #36]
36  0.940340    e1:46:c9:b1:f9:db ()    localhost ()    ATT 6   Rcvd Read Response, Handle: 0x000b (Unknown)
37  0.940529    localhost ()    e1:46:c9:b1:f9:db ()    ATT 12  Sent Read Request, Handle: 0x000d (Unknown)
38  0.985265    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
39  1.029705    e1:46:c9:b1:f9:db ()    localhost ()    ATT 12  Rcvd Read Response, Handle: 0x000d (Unknown)
40  1.029984    localhost ()    e1:46:c9:b1:f9:db ()    ATT 12  Sent Read Request, Handle: 0x0012 (Unknown)
41  1.075261    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
42  1.119705    e1:46:c9:b1:f9:db ()    localhost ()    ATT 11  Rcvd Read Response, Handle: 0x0012 (Unknown)
43  1.120055    localhost ()    e1:46:c9:b1:f9:db ()    ATT 14  Sent Write Request, Handle: 0x0013 (Unknown)
44  1.165262    controller  host    HCI_EVT 8   Rcvd Number of Completed Packets
45  1.209710    e1:46:c9:b1:f9:db ()    localhost ()    ATT 10  Rcvd Write Response, Handle: 0x0013 (Unknown)
46  1.524714    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)
47  2.559727    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)
48  3.549667    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)
49  4.539641    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)
50  5.529687    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)
51  6.564678    e1:46:c9:b1:f9:db ()    localhost ()    ATT 13  Rcvd Handle Value Notification, Handle: 0x0012 (Unknown)

I also tried bumping the supervisor timeout configuration to 20s on the Linux host:

1   0.000000    host    controller  HCI_CMD 12  Sent LE Set Extended Scan Parameters
2   0.108219    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Parameters)
3   0.108274    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
4   0.110213    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
5   0.140232    controller  host    HCI_EVT 40  Rcvd LE Meta (LE Extended Advertising Report)
6   0.140335    host    controller  HCI_CMD 10  Sent LE Set Extended Scan Enable
7   0.151210    controller  host    HCI_EVT 7   Rcvd Command Complete (LE Set Extended Scan Enable)
8   0.151306    host    controller  HCI_CMD 30  Sent LE Extended Create Connection
9   0.153210    controller  host    HCI_EVT 7   Rcvd Command Status (LE Extended Create Connection)
10  0.244231    controller  host    HCI_EVT 34  Rcvd LE Meta (LE Enhanced Connection Complete)
11  0.244435    host    controller  HCI_CMD 6   Sent LE Read Remote Features
12  0.245136    controller  host    HCI_EVT 7   Rcvd LE Meta (LE Channel Selection Algorithm)
13  0.246172    controller  host    HCI_EVT 7   Rcvd Command Status (LE Read Remote Features)
14  20.305255   controller  host    HCI_EVT 15  Rcvd LE Meta (LE Read Remote Features Complete)
15  20.306272   controller  host    HCI_EVT 7   Rcvd Disconnect Complete

Environment (please complete the following information):

Additional context

Capturing the transfer with Wireshark, I could see that it always fails on Sub Event: LE Read Remote Features Complete (0x04) with one of the following reasons:

Even if the supervisor time is bumped to 20s (on the Linux side), there are situations where it fails with a timeout after 20s.

15  20.306272   controller  host    HCI_EVT 7   Rcvd Disconnect Complete

Every time it succeeds connecting, it does in 300-400ms. See an example of a successful connection above (iOS).

cvinayak commented 3 years ago

Please use an nRF sniffer to capture the air traffic, and attach the logs for analysis.

https://infocenter.nordicsemi.com/index.jsp?topic=%2Fug_sniffer_ble%2FUG%2Fsniffer_ble%2Fintro.html&cp=9_4

agherzan commented 3 years ago

I've ordered one now but I think @idlethread might be able to provide some logs in the meanwhile. Thanks, @cvinayak.

cvinayak commented 3 years ago

@agherzan if you have a spare nRF51DK, nRF51 Dongle, nRF52DK, nRF52840DK, or nRF52840 Dongle, you can use them as nRF sniffer.

agherzan commented 3 years ago

Sadly I don't at this point but I'll get one shipped soon. I've ordered it yesterday. That being said, I think we have one in the team so we might be able to provide logs soon. We will update as soon as we have something.

idlethread commented 3 years ago

Setup is as follows:

                            nRF52832-DK
                              (sniffer)
                                |
Nitrogen (v1.1) ------------------------- Linux Laptop
(peripheral_hr)                          (bluetoothctl, 
                                              scan on,
                                             connect  <mac>)

Nitrogen zephyr log shows:

*** Booting Zephyr OS build v2.5.0-rc4  ***
Bluetooth initialized
Advertising successfully started
[00:00:00.394,897] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.394,927] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.394,927] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.5 Build 0
[00:00:00.395,538] <inf> bt_hci_core: Identity: C5:AE:65:1D:0B:45 (random)
[00:00:00.395,568] <inf> bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1
[00:00:00.395,568] <inf> bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff
Connected
Disconnected (reason 0x08)

btmon logs from the Linux Central are as follows:

@ MGMT Event: Device Connected (0x000b) plen 24                                      {0x0001} [hci0] 22.163996
        LE Address: C5:AE:65:1D:0B:45 (Static)
        Flags: 0x00000000
        Data length: 11
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
        16-bit Service UUIDs (complete): 3 entries
          Heart Rate (0x180d)
          Battery Service (0x180f)
          Device Information (0x180a)
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2                          #30 [hci0] 22.164216
        Handle: 3585
> HCI Event: Command Status (0x0f) plen 4                                                 #31 [hci0] 22.164812
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 12                                                 #32 [hci0] 22.581889
      LE Read Remote Used Features (0x04)
        Status: Connection Timeout (0x08)
        Handle: 3585
        Features: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
> HCI Event: Disconnect Complete (0x05) plen 4                                            #33 [hci0] 22.582808
        Status: Success (0x00)
        Handle: 3585
        Reason: Connection Timeout (0x08)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                    {0x0002} [hci0] 22.600792
        LE Address: C5:AE:65:1D:0B:45 (Static)
        Reason: Connection timeout (0x01)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                    {0x0001} [hci0] 22.600792
        LE Address: C5:AE:65:1D:0B:45 (Static)
        Reason: Connection timeout (0x01)
@ MGMT Command: Start Discovery (0x0023) plen 1                                      {0x0001} [hci0] 27.575266
        Address type: 0x07
          BR/EDR
          LE Public
          LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6                                 #34 [hci0] 27.575362
        Address: 31:C7:22:EE:87:EC (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4                                               #35 [hci0] 27.689754
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7                                #36 [hci0] 27.689848
        Type: Active (0x01)
        Interval: 22.500 msec (0x0024)
        Window: 11.250 msec (0x0012)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)

Another piece of information: All the laptops that had problems connecting to Nitrogen peripheral were running Intel chipsets.

nRF sniffer logs are here (just remove .zip extension): v1.1-board-zephyr-2.5-rc4-samsung-laptop.pcapng.zip

cvinayak commented 3 years ago

The supervision timeout happens at the central as it thinks it is not receiving responses from the peripheral. Clearly, the sniffer can see the responses. image

cvinayak commented 3 years ago

@agherzan and @idlethread any updates ? is this still a zephyr issue?

carlescufi commented 3 years ago

Closing this due to reporter inactivity. Feel free to reopen if this is still an issue.

idlethread commented 3 years ago

We tracked this down to intel bluetooth controllers (and firmware) on the host PC as being the common reason for failures. We haven't yet spent time trying to run the latest bluetooth stack on those hosts to see if it made the problem go away. That is still pending on us.

We saw no issues with a different controller.