zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.86k stars 6.62k forks source link

Bluetooth: conn: in multi role configuration incorrect address is set after advertising resume #52059

Closed ppryga-nordic closed 1 year ago

ppryga-nordic commented 2 years ago

Describe the bug Advertising resume in Host is limited operation in comparison to regular advertising enable operation. Limited in a way, there is no device address check/update before the advertising is resumed. When scanner is resumed and identity address is not used, Host uses NRPA address for a device when scanning. This address is different than static random address. When connection in peripheral role is disconnected, Host resumes advertising and continues use of address set by scan resume.

Now, assume that an application starts a connection in central role. Both functions bt_le_create_conn_ext and bt_le_create_conn_legacy call bt_id_set_create_conn_own_addr. The last one mentioned function attempts to change an address to static random address: https://github.com/zephyrproject-rtos/zephyr/blob/e104e031b7d3ddacd2b90527fff438fc4056b72a/subsys/bluetooth/host/id.c#L1519-L1528. Unfortunately that will not happen, because there is an advertising running and Controller returns error BT_HCI_ERR_CMD_DISALLOWED for Host command: BT_HCI_OP_LE_SET_RANDOM_ADDRESS.

It looks to me that, to fix the issue we need to run Bluetooth address setup before advertising is resumed. Unfortunately that would require storage of some information that are provided to Host when advertising is started by bt_le_adv_start.

To Reproduce Steps to reproduce the behavior: The issue may be reproduced according to steps in #50438.

Expected behavior It should be possible to established a connection in central even though there is an advertising pending in the background.

Impact Not sure how sever is the issue and what impact it has on end users.

Additional context The issue was found during investigation of #50438.

jori-nordic commented 1 year ago

doesn't seem to happen any more. I made a bsim test to reproduce it, but the DUT still uses the identity address (the static random you talk about I think) when re-starting advertising.

jon@jori-pc:~/repos/zephyrproject/zephyr/tests/bluetooth/bsim_bt/bsim_test_adv_resume$ test_scripts/_compile.sh && test_scripts/run_test.sh 
[3/6] Linking C executable zephyr/zephyr_pre0.elf

[6/6] Linking C executable zephyr/zephyr.elf

d_01: @00:00:00.000000  *** Booting Zephyr OS build zephyr-v3.2.0-2350-ga738d820c800 ***
d_00: @00:00:00.000000  *** Booting Zephyr OS build zephyr-v3.2.0-2350-ga738d820c800 ***
d_00: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
d_00: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
d_00: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.2 Build 99
d_01: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
d_01: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: HW Variant: nRF52x (0x0002)
d_01: @00:00:00.000000  [00:00:00.000,000] <inf> bt_hci_core: hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 3.2 Build 99
d_00: @00:00:00.000000  [00:00:00.000,000] <wrn> bt_id: bt_read_static_addr: No static addresses stored in controller
d_01: @00:00:00.000000  [00:00:00.000,000] <wrn> bt_id: bt_read_static_addr: No static addresses stored in controller
d_01: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: Identity: ED:71:8F:C2:E4:6E (random)
d_00: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: Identity: ED:3B:20:15:18:12 (random)
d_01: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
d_01: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
d_00: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: HCI: version 5.3 (0x0c) revision 0x0000, manufacturer 0x05f1
d_01: @00:00:00.002648  DUT start
d_00: @00:00:00.002648  [00:00:00.002,624] <inf> bt_hci_core: bt_dev_show_info: LMP: version 5.3 (0x0c) subver 0xffff
d_00: @00:00:01.007905  start scanner
d_00: @00:00:01.007905  [00:00:01.007,904] <err> bt_id: bt_id_set_scan_own_addr: use NRPA
d_00: @00:00:01.007905  [00:00:01.007,904] <err> bt_id: bt_id_set_private_addr: NRPA: 00:DA:63:54:3C:CD
d_01: @00:00:01.007905  start scanner
d_01: @00:00:01.007905  [00:00:01.007,904] <err> bt_id: bt_id_set_scan_own_addr: use NRPA
d_01: @00:00:01.007905  [00:00:01.007,904] <err> bt_id: bt_id_set_private_addr: NRPA: 1B:81:4B:D3:9D:9A
d_01: @00:00:01.117676  start advertiser
d_00: @00:00:01.331001  Got scan result, connecting.. dst ED:71:8F:C2:E4:6E (random), RSSI -35
d_00: @00:00:01.367157  connected
d_01: @00:00:01.357178  connected
d_01: @00:00:01.357178  DUT is peripheral
d_01: @00:00:01.394355  Got scan result, connecting.. dst E7:70:AE:95:2C:FA (random), RSSI -35
d_00: @00:00:01.427033  connected
d_01: @00:00:01.437012  connected
d_01: @00:00:01.437012  DUT is central & peripheral
d_00: @00:00:06.426636  disconnect peripheral
d_00: @00:00:06.426636  Waiting for disconnection...
d_01: @00:00:06.526429  disconnected
d_01: @00:00:06.526429  DUT is central
d_00: @00:00:11.526032  disconnect central
d_00: @00:00:11.526032  Waiting for disconnection...
d_01: @00:00:11.575928  disconnected
d_01: @00:00:11.575928  DUT has no connections
d_00: @00:00:16.565552  start scanner
d_00: @00:00:16.565552  [00:00:16.565,551] <err> bt_id: bt_id_set_scan_own_addr: use NRPA
d_00: @00:00:16.565552  [00:00:16.565,551] <err> bt_id: bt_id_set_private_addr: NRPA: 29:05:AD:ED:EA:06
d_00: @00:00:16.575802  Got scan result, connecting.. dst ED:71:8F:C2:E4:6E (random), RSSI -35
d_00: @00:00:16.575803  [00:00:16.575,775] <wrn> bt_conn: bt_conn_exists_le: Found valid connection in disconnected state
d_00: @00:00:16.575803  main: The TESTCASE FAILED with return code 2
d_00: @00:00:16.575803 ERROR: (WEST_TOPDIR/zephyr/tests/bluetooth/bsim_bt/bsim_test_adv_resume/src/bs_bt_utils.c:93): Err bt_conn_le_create -22d_01: @00:00:16.575664  main: The TESTCASE FAILED with return code 1
ppryga-nordic commented 1 year ago

I would suggest to see in code to check if there were changes about it. In my humble opinion it would have been fixed if one changed advertising resume implementation, but I'm not following PRs in Host now.

jori-nordic commented 1 year ago

I just reproduced it again, using the steps in the parent issue. I'll dig a bit more, see if downstream is missing any patches or I've missed something in the bsim test.

edit: got it:

The difference is that the SDC behaves differently than the ZLL: When the first connection (as a peripheral) happens, the host tries to resume advertising. The SDC will refuse to do so, while ZLL will allow it. When I removed that resume in the bsim test, I was able to reproduce it.

I think the SDC refusing to do so is because it has reserved memory for peripherals and centrals. If the peripheral memory pool is full (ie 1 slot here), then it can't create a connectable advertiser, and will respond to the HCI cmd with a 'refused'. It seems the ZLL doesn't have this concept. We shouldn't rely on that controller behavior for the central_and_peripheral_hr sample IMO, we should rather make an API to reserve/alloc connection objects in advance (be it for central or peripheral) in the host.