zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
9.83k stars 6.06k forks source link

BLE Connection fails to establish between two nRF52840-USB Dongles with Zephyr controller #29008

Closed RoyAnupam closed 3 years ago

RoyAnupam commented 3 years ago

Describe the bug Observation: LE Connection fails to establish between two nRF52840-USB Dongles Device Roles:

To Reproduce Steps to reproduce the behavior: Device A: Create GATT server using bluetoothctl tool (add a GATT server 0x1834, characteristic: 0x2834, Descriptor: 0x2902) Device B: Scan & connect Device A from bluetoothctl tool

Expected behavior Connection should not drop.

Impact Not able to proceed further tests

Logs and console output HCI Dump log in GATT Central Peripheral (Advertiser)

> HCI Event: LE Meta Event (0x3e) plen 19                                                                                                                                                                            #108 [hci0] 1243.692651
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Slave (0x01)
        Peer address type: Random (0x01)
        Peer address: F8:F4:B9:51:03:46 (Static)
        Connection interval: 50.00 msec (0x0028)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x07
@ MGMT Event: Device Connected (0x000b) plen 13                                                                                                                                                                  {0x0001} [hci0] 1243.692706
        LE Address: F8:F4:B9:51:03:46 (Static)
        Flags: 0x00000000
        Data length: 0
< HCI Command: LE Read Remote Used Features (0x08|0x0016) plen 2                                                                                                                                                     #109 [hci0] 1243.692810
        Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 4                                                                                                                                                                             #110 [hci0] 1243.693564
      LE Channel Selection Algorithm (0x14)
        Handle: 0
        Algorithm: #2 (0x01)
> HCI Event: Command Status (0x0f) plen 4                                                                                                                                                                            #111 [hci0] 1243.694562
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
@ RAW Open: hciconfig (privileged) version 2.22                                                                                                                                                                         {0x0002} 1243.697054
@ RAW Close: hciconfig                                                                                                                                                                                                  {0x0002} 1243.697066
> HCI Event: LE Meta Event (0x3e) plen 12                                                                                                                                                                            #112 [hci0] 1243.893602
      LE Read Remote Used Features (0x04)
        Status: Success (0x00)
        Handle: 0
        Features: 0xff 0x59 0x01 0x00 0x00 0x00 0x00 0x00
          LE Encryption
          Connection Parameter Request Procedure
          Extended Reject Indication
          Slave-initiated Features Exchange
          LE Ping
          LE Data Packet Length Extension
          LL Privacy
          Extended Scanner Filter Policies
          LE 2M PHY
          LE Coded PHY
          LE Extended Advertising
          Channel Selection Algorithm #2
          Minimum Number of Used Channels Procedure
= bluetoothd: src/device.c:load_gatt_db() No cache for F8:F4:B9:51:03:46                                                                                                                                                         1243.894308
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                                                                                                                                                            #113 [hci0] 1243.894349
      ATT: Exchange MTU Request (0x02) len 2
        Client RX MTU: 517
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #114 [hci0] 1244.144576
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #115 [hci0] 1244.144721
      ATT: Error Response (0x01) len 4
        Exchange MTU Request (0x02)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                                                                                                                                                           #116 [hci0] 1244.144950
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0001-0xffff
        Attribute type: Unknown (0x2b3a)
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #117 [hci0] 1244.394578
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #118 [hci0] 1244.394731
      ATT: Error Response (0x01) len 4
        Read By Type Request (0x08)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
< ACL Data TX: Handle 0 flags 0x00 dlen 11                                                                                                                                                                           #119 [hci0] 1244.394869
      ATT: Read By Group Type Request (0x10) len 6
        Handle range: 0x0001-0xffff
        Attribute group type: Primary Service (0x2800)
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #120 [hci0] 1244.544561
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #121 [hci0] 1244.544715
      ATT: Error Response (0x01) len 4
        Read By Group Type Request (0x10)
        Handle: 0x0000
        Error: Request Not Supported (0x06)
....
........
........
> HCI Event: Disconnect Complete (0x05) plen 4                                                                                                                                                                       #122 [hci0] 1287.642010
        Status: Success (0x00)
        Handle: 0
        Reason: Connection Timeout (0x08)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                                                                                                                                                {0x0001} [hci0] 1287.642056
        LE Address: F8:F4:B9:51:03:46 (Static)
        Reason: Connection timeout (0x01)

Environment (please complete the following information):

  1. Host BT Stack: Bluez 5.55 (latest master branch)
  2. Host OS: Ubuntu 16.04 ( x86_64 x86_64 x86_64 GNU/Linux
  3. Host kernel: 4.15.0-041500-generic
  4. BT chip: nRF52840-USB dongle
  5. BT Controller Code: Latest Zephyr (master branch)
  6. Build: west build -b nrf52840dongle_nrf52840 zephyr/samples/bluetooth/hci_usb
  7. Flash: using nRF Connect v3.5.0 Windows Application

Additional context Observation: Sometimes connection drops with few seconds and sometimes in few minutes, without any host involvement or any command being sent to Controller from host. Also, it is observed that, the BLE connection between one nRF52840-USB dongle and another device (Android P) never terminates automatically by Connection Timeout, with nRF as GATT Server and Android as GATT client.

cvinayak commented 3 years ago

@RoyAnupam please provide the Zephyr commit hash you are building the hci_usb sample.

RoyAnupam commented 3 years ago

Hello @cvinayak Following is tip of my Zephyr git:- commit adb62ea290ee4d127c5134630489f64c43c252ef

BR,

RoyAnupam commented 3 years ago

Hello @cvinayak , I am currently blocked due to this Connection Drop issue described above. Please let me know if there is any additional info, which I can provide, which may help?

BR,

cvinayak commented 3 years ago

@RoyAnupam I have been busy. But I have tested one hci_usb nRF52840 dongle connected to my virtualbox with linux, and setup gatt services as stated by you. I do not see any connection supervision timeouts when I connect to it from a phone and when I connect back from Linux to phone, too.

Also, it is observed that, the BLE connection between one nRF52840-USB dongle and another device (Android P) never terminates automatically by Connection Timeout, with nRF as GATT Server and Android as GATT client.

Do you mean, hci_usb as a peripheral and a phone as central, you do not have connection loss?

Could you try with different nRF52840 dongles? (in case you may have damaged the crystals on your dongles).

You can contact me on slack and we can have a screen sharing, if you want me to debug live.

RoyAnupam commented 3 years ago

@cvinayak , I appreciate your time to test & respond about this issue.

Do you mean, hci_usb as a peripheral and a phone as central, you do not have connection loss?

Yes, I don't face any connection loss between hci_usb as peripheral & android phone as client & vice versa. Basically, when only one hci_usb(dongle) is involved and other device is commercial product like phone, there is no connection drop. But, whenever, I use hci_usb in both sides, connection drop issue is occurring.

Could you try with different nRF52840 dongles? (in case you may have damaged the crystals on your dongles).

Yes, I have tried with two different dongles, behavior is same. Could you get a chance to check the scenario with two hci_usb dongles once?

You can contact me on slack and we can have a screen sharing, if you want me to debug live.

I have never used slack, but will check if my company policy allows to use it for screen sharing. Thank You

cvinayak commented 3 years ago

I have tested connection between two dongles, I do not see any disconnections.

image

Please attach btmon logs for both the dongles, the exact commands you have issued in console and in bluetoothctl.

joerchan commented 3 years ago

@RoyAnupam Just an FYI, I edited your comment, you need a blank line after quoting someone.

> Quote

response
RoyAnupam commented 3 years ago

hello @cvinayak ,

I have tested connection between two dongles, I do not see any disconnections.

oh, I will check your btmon logs in detail and try to compare with mine. Thank you very much for sharing the logs. By the way, can you please share which version of bluez you are using in Ubuntu? bluetoothd --version? Although currently I am not sure, if that difference will make any difference.

cvinayak commented 3 years ago

bluetoothd --version?

You meant bluetoothctl --version bluetoothctl: 5.55

My bluez version is 5.55. BlueZ should not cause connection timeout.

If you can provide detailed step by step commandline or a script, it will make it easy to reproduce your issue. I am currently guessing on how you have setup your client and server etc. And I am using single virtual machine to connect between two controllers etc.

RoyAnupam commented 3 years ago

Hello @cvinayak

My bluez version is 5.55. BlueZ should not cause connection timeout.

Okay, sure

If you can provide detailed step by step commandline or a script, it will make it easy to reproduce your issue. I am currently guessing on how you have setup your client and server etc. And I am using single virtual machine to connect between two controllers etc.

Okay, sure. Let me work on this and share with you details.

RoyAnupam commented 3 years ago

Hello @cvinayak I am preparing detailed steps on each side with nRF dongle. I will be able to share very soon (with btmon logs, bluetoothctl steps & bluez logs on both sides)

From your description, if I understood correctly,,, you are using single PC with Virtual Box with two nRF dongles with hci_usb attached to the same PC?

In the mean time, just for your info, my setup is following. Setup1:

Setup2:

cvinayak commented 3 years ago

you are using single PC with Virtual Box with two nRF dongles with hci_usb attached to the same PC?

Yes. and I used select command in bluetoothctl to advertise on one controller and connect to from another.

RoyAnupam commented 3 years ago

Hello @cvinayak ,

Yes. and I used select command in bluetoothctl to advertise on one controller and connect to from another.

Okay, got it. I will also try this scenario at my end.

I reproduced the issue with controllers in separate Ubuntu Host's(with each running bluez upstream v5.55). I am attaching detailed steps and logs from both hosts (Client & Server)

Attachments contain following

I hope these information's will be useful to reproduce the scenario. By the way, I would also like to share hci_usb prj.cnf file changes. diff --git a/samples/bluetooth/hci_usb/prj.conf b/samples/bluetooth/hci_usb/prj.conf index 9a86ef3..009cf4a 100644 --- a/samples/bluetooth/hci_usb/prj.conf +++ b/samples/bluetooth/hci_usb/prj.conf @@ -5,6 +5,9 @@ CONFIG_UART_INTERRUPT_DRIVEN=y CONFIG_BT=y CONFIG_BT_HCI_RAW=y +CONFIG_BT_CTLR_ADV_EXT=y +CONFIG_BT_BROADCASTER=y +CONFIG_BT_CTLR_DATA_LENGTH_MAX=251 CONFIG_USB=y CONFIG_USB_DEVICE_STACK=y

Please let me know if any further information would be helpful. Thank You

cvinayak commented 3 years ago

@RoyAnupam Did you forget to upload files? (you can drag/drop or paste zip files here in the comments)

RoyAnupam commented 3 years ago

Hello @cvinayak , I am extremely sorry. Seems, file uploading got by default blocked due to some reason. I will check with IT and get back. Thank You very much for pointing it out.

carlescufi commented 3 years ago

@RoyAnupam you can also upload the files to dropbox or any other service and give us a link to them here.

RoyAnupam commented 3 years ago

@carlescufi , Yes, I figured it out. Actually, My company IT policy blocked uploading to github(firewall issue). Will fix it ASAP and upload the files. Requesting you to please allow me some time for it. Thank you very much!

RoyAnupam commented 3 years ago

Hello @cvinayak @carlescufi I think, it should work now. I am uploading the files.

Attachments contain following

Please let me know if you face any issue in accessing these files. Thanks

Linux_Client_bluetoothd.txt Linux_Client_btmon.txt Linux_Client_Setups.txt Linux_Peripheral_bluetoothd.txt Linux_Peripheral_btmon.txt Linux_Peripheral_Setups.txt

cvinayak commented 3 years ago

From Linux_Client_btmon.txt:

> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #1354 [hci0] 664.352961
      ATT: Write Request (0x12) len 4
        Handle: 0x0009
          Data: 0200
< ACL Data TX: Handle 0 flags 0x00 dlen 5                                                                                                                                                                            #1355 [hci0] 664.353123
      ATT: Write Response (0x13) len 0
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                               #1356 [hci0] 664.402772
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 9                                                                                                                                                                            #1357 [hci0] 664.402946
      Channel: 65 len 5 sdu 3 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Request (0x0a) len 2
        Handle: 0x0003
< ACL Data TX: Handle 0 flags 0x00 dlen 12                                                                                                                                                                           #1358 [hci0] 664.403095
      Channel: 65 len 8 sdu 6 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Response (0x0b) len 5
        Value: 6162686179       

From Linux_Peripheral_btmon.txt:

< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #89 [hci0] 2805.994446
      ATT: Write Request (0x12) len 4
        Handle: 0x0009
          Data: 0200
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #90 [hci0] 2805.994447
      Channel: 65 len 5 sdu 3 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Request (0x0a) len 2
        Handle: 0x0003
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #91 [hci0] 2806.356975
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 7                                                                                                                                                                             #92 [hci0] 2806.357048
      Channel: 64 len 3 [PSM 0 mode Basic (0x00)] {chan 65535}
        01 00 13                                         ...
< ACL Data TX: Handle 0 flags 0x00 dlen 9                                                                                                                                                                             #93 [hci0] 2806.357178
      Channel: 64 len 5 [PSM 0 mode Basic (0x00)] {chan 65535}
        03 00 0a 05 00                                   .....
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #94 [hci0] 2806.406966
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 5                                                                                                                                                                             #95 [hci0] 2806.407058
      ATT: Write Response (0x13) len 0
< ACL Data TX: Handle 0 flags 0x00 dlen 7                                                                                                                                                                             #96 [hci0] 2806.407215
      ATT: Read Request (0x0a) len 2
        Handle: 0x0012
> HCI Event: Number of Completed Packets (0x13) plen 5                                                                                                                                                                #97 [hci0] 2806.456969
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 12                                                                                                                                                                            #98 [hci0] 2806.457037
      Channel: 65 len 8 sdu 6 [PSM 39 mode LE Flow Control (0x80)] {chan 0}
      ATT: Read Response (0x0b) len 5
        Value: 6162686179

The last ATT Read Request handle (0x0012 sent, and 0x0003 received) do not match, are you sure this log is from a connection between the two dongles?

RoyAnupam commented 3 years ago

Hello @cvinayak ,

are you sure this log is from a connection between the two dongles?

Yes, I cross-checked, the logs are from two dongles running on Bluez 5.55 host of two separate Ubuntu Machines.

I have captured logs again. This time, along with btmon logs, I am also sharing cfa files which can be directly viewed in Frontline's Comprobe Protocol Analyzer system (CPAS) file viewer software.

Below are the Random addresses of the dongles:-

Client connection initiation (can be found in Client_btmon.txt) < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #2776 [hci0] 1634.967793 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: C9:82:D7:F4:00:EF (Static) Own address type: Random (0x01) Min connection interval: 30.00 msec (0x0018) Max connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000)

Peripheral connection Complete (can be found in Peripheral_btmon.txt)

HCI Event: LE Meta Event (0x3e) plen 19 #2710 411376.250074 LE Connection Complete (0x01) Status: Success (0x00) Handle: 0 Role: Slave (0x01) Peer address type: Random (0x01) Peer address: F8:F4:B9:51:03:46 (Static) Connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Master clock accuracy: 0x07

Please find attached the btmon and cfa logs Client_btmon.txt Peripheral_btmon.txt client_cfa.log peripheral_cfa.log

carlescufi commented 3 years ago

@RoyAnupam could you please try to capture a sniffer trace using the Nordic sniffer, if you have spare Nordic Development Kit. You can find the instructions here:

https://infocenter.nordicsemi.com/index.jsp?topic=%2Fug_sniffer_ble%2FUG%2Fsniffer_ble%2Fintro.html&cp=9_4

carlescufi commented 3 years ago

@cvinayak FYI this is Connection Timeout (0x08) on both sides, so likely a controller issue.

RoyAnupam commented 3 years ago

Hello @carlescufi ,

@RoyAnupam could you please try to capture a sniffer trace using the Nordic sniffer, if you have spare Nordic Development Kit. You can find the instructions here:

Unfortunately, I don't have a Nordic Sniffer. I am sorry about it. However, I did try to use a Frontline Comprobe Air Sniffer. In the logs, I remember, there was not much communication happening After CONN_IND PDU is sent by the initiator.

cvinayak commented 3 years ago

Please give a try with the follow in prj.conf of the hci_usb sample. CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y

RoyAnupam commented 3 years ago

Hello @cvinayak ,

Please give a try with the follow in prj.conf of the hci_usb sample. CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y CONFIG_CLOCK_CONTROL_NRF_K32SRC_500=y

Okay sure, will try it right away. by the way, following is my current proj. Please let me know if there is any additional changes. Thank You

CONFIG_STDOUT_CONSOLE=y CONFIG_GPIO=y CONFIG_SERIAL=y CONFIG_UART_INTERRUPT_DRIVEN=y

CONFIG_BT=y CONFIG_BT_HCI_RAW=y CONFIG_BT_CTLR_ADV_EXT=y CONFIG_BT_BROADCASTER=y CONFIG_BT_CTLR_DATA_LENGTH_MAX=251

CONFIG_USB=y CONFIG_USB_DEVICE_STACK=y CONFIG_USB_DEVICE_BLUETOOTH=y CONFIG_USB_DEVICE_BLUETOOTH_VS_H4=y

++ CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y ++ CONFIG_CLOCK_CONTROL_NRF_K32SRC_500=y

RoyAnupam commented 3 years ago

Facing following build error. Any clue on what might be going wrong? I have git pulled to latest. Following is the tip of my git - ec0e737b0c7c0d19fcb9733275ab8bb2743b828e

sri@sri-Samsung-DeskTop-System:~/work/code/newzephyrproject$ west build -b nrf52840dongle_nrf52840 zephyr/samples/bluetooth/hci_usb -- west build: generating a build system Including boilerplate (Zephyr base): /home/sri/work/code/newzephyrproject/zephyr/cmake/app/boilerplate.cmake -- Application: /home/sri/work/code/newzephyrproject/zephyr/samples/bluetooth/hci_usb -- Zephyr version: 2.4.99 (/home/sri/work/code/newzephyrproject/zephyr) -- Found Python3: /usr/local/bin/python3.6 (found suitable exact version "3.6.10") found components: Interpreter -- Found west (found suitable version "0.7.3", minimum required is "0.7.1") -- Board: nrf52840dongle_nrf52840 -- Cache files will be written to: /home/sri/.cache/zephyr -- Found toolchain: zephyr (/home/sri/work/code/zephyr-sdk) -- Found dtc: /home/sri/work/code/zephyr-sdk/sysroots/x86_64-pokysdk-linux/usr/bin/dtc (found suitable version "1.5.0", minimum required is "1.4.6") -- Found BOARD.dts: /home/sri/work/code/newzephyrproject/zephyr/boards/arm/nrf52840dongle_nrf52840/nrf52840dongle_nrf52840.dts -- Generated zephyr.dts: /home/sri/work/code/newzephyrproject/build/zephyr/zephyr.dts -- Generated devicetree_unfixed.h: /home/sri/work/code/newzephyrproject/build/zephyr/include/generated/devicetree_unfixed.h Parsing /home/sri/work/code/newzephyrproject/zephyr/samples/bluetooth/hci_usb/Kconfig Loaded configuration '/home/sri/work/code/newzephyrproject/zephyr/boards/arm/nrf52840dongle_nrf52840/nrf52840dongle_nrf52840_defconfig' Merged configuration '/home/sri/work/code/newzephyrproject/zephyr/samples/bluetooth/hci_usb/prj.conf'

warning: STDOUT_CONSOLE (defined at lib/libc/Kconfig:124) was assigned the value 'y' but got the value 'n'. Check these unsatisfied dependencies: CONSOLE_HAS_DRIVER (=n). See http://docs.zephyrproject.org/latest/reference/kconfig/CONFIG_STDOUT_CONSOLE.html and/or look up STDOUT_CONSOLE in the menuconfig/guiconfig interface. The Application Development Primer, Setting Configuration Values, and Kconfig - Tips and Best Practices sections of the manual might be helpful too.

/home/sri/work/code/newzephyrproject/zephyr/samples/bluetooth/hci_usb/prj.conf:18: warning: attempt to assign the value 'y' to the undefined symbol CLOCK_CONTROL_NRF_K32SRC_500

error: Aborting due to Kconfig warnings

CMake Error at /home/sri/work/code/newzephyrproject/zephyr/cmake/kconfig.cmake:239 (message): command failed with return code: 1 Call Stack (most recent call first): /home/sri/work/code/newzephyrproject/zephyr/cmake/app/boilerplate.cmake:590 (include) /home/sri/work/code/newzephyrproject/zephyr/share/zephyr-package/cmake/ZephyrConfig.cmake:24 (include) /home/sri/work/code/newzephyrproject/zephyr/share/zephyr-package/cmake/ZephyrConfig.cmake:35 (include_boilerplate) CMakeLists.txt:4 (find_package)

-- Configuring incomplete, errors occurred! FATAL ERROR: command exited with status 1: /home/sri/work/code/cmake_dir/cmake-3.18.0-Linux-x86_64/bin/cmake -DWEST_PYTHON=/usr/local/bin/python3.6 -B/home/sri/work/code/newzephyrproject/build -S/home/sri/work/code/newzephyrproject/zephyr/samples/bluetooth/hci_usb -GNinja -DBOARD=nrf52840dongle_nrf52840 sri@sri-Samsung-DeskTop-System:~/work/code/newzephyrproject$

cvinayak commented 3 years ago

Sorry, I made a typo, use the below: CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM=y

RoyAnupam commented 3 years ago

Hello @cvinayak , I tried testing couple of time with suggested change, however, still same issue is occurring. I have attached "btmon" logs for your reference.

Client's Connection initiation < HCI Command: LE Create Connection (0x08|0x000d) plen 25 #4372 [hci0] 293.407975 Scan interval: 60.000 msec (0x0060) Scan window: 60.000 msec (0x0060) Filter policy: White list is not used (0x00) Peer address type: Random (0x01) Peer address: C9:82:D7:F4:00:EF (Static) Own address type: Random (0x01) Min connection interval: 30.00 msec (0x0018) Max connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Min connection length: 0.000 msec (0x0000) Max connection length: 0.000 msec (0x0000)

Peripheral's LE Connection Complete

HCI Event: LE Meta Event (0x3e) plen 19 #47 [hci0] 553.684238 LE Connection Complete (0x01) Status: Success (0x00) Handle: 0 Role: Slave (0x01) Peer address type: Random (0x01) Peer address: F8:F4:B9:51:03:46 (Static) Connection interval: 50.00 msec (0x0028) Connection latency: 0 (0x0000) Supervision timeout: 420 msec (0x002a) Master clock accuracy: 0x00

Linux_client_btmon.txt Linux_peripheral_btmon.txt

cvinayak commented 3 years ago

I am not able to reproduce connection timeout for hours of having two linux laptops connected using hci_usb nRF52840 dongles. You are definitely having some L2CAP traffic (not that it would matters) and we are not testing the same. Unless you provide me detailed steps to reproduce the issue, no further action can be taken.

RoyAnupam commented 3 years ago

Hello @cvinayak,

Provide a branch with any local changes committed, that I can build the hci_usb application from

I have not made any local changes to my git project & I am using just the master branch. Please see the outputs as follows.

Provide commandline steps

I had provided detailed steps which I perform on both sides in previous thread. I am sharing again for your reference. It contains step by step instructions using bluetoothctl tool (which is used to start GATT server & a GATT client in Ubuntu hosts) I have attached Linux_Client_Setups.txt & Linux_Peripheral_Setups.txt files.

You are definitely having some L2CAP traffic (not that it would matters) and we are not testing the same.

Okay, I am using bluez latest v5.55 for your kind information & I have noticed EATT (Enhanced ATT bearer) channel being created which I think is a new implementation in bluez upstream. I am not sure, if this particular L2CAP traffic can create Disconnection Issue. Is EATT traffic existing at your side..? bluetoothd[21322]: src/gatt-database.c:connect_cb() New incoming LE ATT connection bluetoothd[21322]: src/device.c:device_attach_att() EATT channel connected

Regards, -Anupam Linux_Client_Setups.txt Linux_Peripheral_Setups.txt

cvinayak commented 3 years ago

ok... I will test once again, hope I reproduce the connection loss and get back by tomorrow.

RoyAnupam commented 3 years ago

Hello @cvinayak

ok... I will test once again, hope I reproduce the connection loss and get back by tomorrow.

Sure, thanks. Additionally, I build bluez upstream from source - https://git.kernel.org/pub/scm/bluetooth/bluez.git and launch daemon manually -> $bluez/src$sudo ./bluetoothd -nd

Also, I launch bluetoothctl tool from source ->bluez/client$ sudo ./bluetoothctl

cvinayak commented 3 years ago

client1.txt client1.zip peripheral.txt peripheral.zip

I am not able to reproduce the connection loss. I have had the dongles connected for couple hours. In the attached btsnoop logs, I explicitly disconnect and reconnect couple times to log few minutes of sustained connections. Above are the btsnoop files (created using btmon -w <filename>).

Please use two different dongles to rule out faulty hardware at your side.

RoyAnupam commented 3 years ago

Hello @cvinayak , Thank for your test & sharing the results. By the way, in your test setup, are you using bluez v5.55 host with latest upstream changes? In anyways, I will try to find different hardware and check the scenario again, as suggested by you. Also, I want to check the logs shared by you in more details. But, unfortunately, I am currently on vacation till 23rd Nov. I will be able to check the issue after resuming work,,, sorry about it.

Regards, -Anupam

cvinayak commented 3 years ago

are you using bluez v5.55 host with latest upstream changes?

I use v5.55 from the archlinux distribution, it should be the stable version. Host stack should not cause connection stability issues.

RoyAnupam commented 3 years ago

Hello @cvinayak , First of all, apologies for delay in response (due to my vacation)!

Please use two different dongles to rule out faulty hardware at your side.

Today I have taken two brand new nRF52840 USB dongle and just performed a test with exact same setups. I like to share the quick results. Also I would like to share the hardware information's for my old & current tests.

Old Dongle Information as found in back side [used in all tests done till this time] Dongle 1: | 1.0.0 | PCA10059 | 2018.42 | F1:98:49:DC:7F:2F Dongle 2: | 1.0.0 | PCA10059 | 2018.42 | D4:F7:C8:2E:2B:07

New Dongle Information as found in back side [Used in Current tests] Dongle 1: | 1.2.0 | PCA10059 | 2019.42 | E0:99:1B:44:72:C8 Dongle 2: | 1.2.0 | PCA10059 | 2019.42 | E7:58:98:4E:67:7D

It seems, after connection request is sent to Controller in GATT client side, connection is terminated with Reason: Connection Failed to be Established (0x3e).

In GATT server end (other Ubuntu PC), there is no HCI event or any other event from controller

I am sharing the btmon logs from GATT client side. I am not sure, if this is an environment issue, but I will definitely do some more tests to at-least progress with the connection.

File: GATT Client BT Mon Logs.txt

BR, -Anupam Roy

RoyAnupam commented 3 years ago

Hello @cvinayak , Further to my previous email, I wish to inform that, connection did proceed, when i did further tests, but failed with Connection Timeout, same as before. In order to rule out chances of faulty hardware, I used two brand new dongles(version: 1.2.0, 2019.42) , as mentioned in previous email, but unfortunately, new dongles are not making any difference in results.

I am sharing btmon & bluetoothd logs for your reference.

Please suggest if I could try different tests/steps to check this issue. Thank You very much~

Additional Observation: I also tested GATT connection scenario between one nRF52840-USB dongle (GATT server) and another Android Mobile Device with nRFConnect Application(GATT client). Connection is stable and never terminates automatically by Connection Timeout, I tested multiple times and no issue is observed. Logs of BTMON GATT server - btmon_nRF_52840_Server_with_AndroidClient.txt

Android Mobile BT MAC - 59:D6:75:CA:20:F1

BR, -Anupam Roy

cvinayak commented 3 years ago

@RoyAnupam Please use nRF sniffer and capture the on-air traffic for the connection, this will help me analyze if there are any scheduling issues in the controller.

@carlescufi As this issue is not reproducible on my end, I request another resource try to reproduce the issue and I can then have a live session with them to analyze further.

RoyAnupam commented 3 years ago

Hello @cvinayak ,

Please use nRF sniffer and capture the on-air traffic for the connection, this will help me analyze if there are any scheduling issues in the controller.

Oh, Unfortunately I don't have nRF sniffer with me. Actually I have not used it before. Can you please tell me if it is a hardware/software of HW + S/W combination tool? or just any guide/weblink for reference? Thank You.

cvinayak commented 3 years ago

@RoyAnupam Here you go: https://www.nordicsemi.com/Software-and-tools/Development-Tools/nRF-Sniffer-for-Bluetooth-LE

RoyAnupam commented 3 years ago

Hello @cvinayak , Thank you very much for sharing the guide. I am currently facing some issue in getting the sniffer hardware(nRF52840 dongle) detected in the Wireshark hardware interface. I am working on the same. Hopefully will be resolved soon.

cvinayak commented 3 years ago

@RoyAnupam check the path to python, if you have multiple python versions installed. Test executing the script in the extcap folder to check if you can run the python script to enumerate the ext interfaces.

I see that have these in your prj.conf: +CONFIG_BT_CTLR_ADV_EXT=y +CONFIG_BT_BROADCASTER=y +CONFIG_BT_CTLR_DATA_LENGTH_MAX=251

Are you building the vanilla controller from the upstream master repository? if not, please use the default hci_usb sample and compare the sniffer logs (once you get wireshark working).

RoyAnupam commented 3 years ago

Hello @cvinayak , Thanks for sharing your opinion~ I have got nRF sniffer Hardware interface detected in Wireshark (I am using Version 3.4.0 (v3.4.0-0-g9733f173ea5e) & sniffing is also starting to work it seems. However, Is there any recommended setting to filter traffic between only the concerned controllers? Currently, it is flooded with ADV & SCAN packets from all nearby devices.

cvinayak commented 3 years ago

@RoyAnupam select in the Wireshark menus->view->Interface Toolbars-> "nRF Sniffer for Bluetooth LE"

Now you have a toolbar with "Device" as "All advertising devices", drop this down and select the advertising device you which to capture the connection for.

RoyAnupam commented 3 years ago

Hello @cvinayak , Thanks, it works now~ Now, I can see Empty PDU's being exchanged between master & slave devices in the Wireshark. I have attached it for your reference. Please let me know if you are expecting same data from Sniffer tool capture for our current Disconnection issue?

nRFSniffer

cvinayak commented 3 years ago

@RoyAnupam please attach the sniffer log file with information from connection to disconnection. From the screenshot of few seconds of connection, there is nothing much I can analyze. Did you get the disconnection with supervision timeout (packets stop in the sniffer after only packet from master for the duration of supervision timeout)?

RoyAnupam commented 3 years ago

Hello @cvinayak My intention of sharing the screenshot was to just confirm with you if you are looking for similar log from wireshark sniffer dump. I will certainly test and share the whole log file. Sorry for confusion!

While I was testing yesterday, I got disconnection once after around 3-4~ mins of connection, but at that time, wireshark filter settings was not correct, so unfortunately that log is not useful. I will test again and share my observation and correct sniffer log.

RoyAnupam commented 3 years ago

Hello @cvinayak , The issue is reproduced with correct Wireshark setting this time. But, facing one issue with uploading the zipped dump file (6 MB zip file). I am trying to figure out what is going wrong. In the mean time, lease find below some observations.

Did you get the disconnection with supervision timeout (packets stop in the sniffer after only packet from master for the duration of supervision timeout)?

Yes, the last packet was from Master at time : 14399.221, after that, there was nothing from Slave, due to which Connection terminated. I can see Peripheral device switched to Advertisement state after connection was terminated at time:14401.047. Anyways, please refer screenshot for the same. DisConnection_From_Supervision_Timeout

cvinayak commented 3 years ago

It just looks like Peripheral did a proper local initiated disconnect to which the Central acknowledged in packet no. 476411, except that the Info says L2CAP Fragment (is the connection encrypted?). Without the sniffer logs with details to look at, I cant be certain.... Can you email me the sniffer log zip file? My email address is in any of controller commit messages in Zephyr.

RoyAnupam commented 3 years ago

DC_Latest.zip

Hello @cvinayak , Please find attached the complete sniffer log file.