Closed Klievan closed 1 year ago
can confirm I am able to reproduce it (nrf connect on android phone), will look into it asap. edit: it looks like it's only the dk-to-dk connection that fails (ie w/zephyr as central). I am able to reconnect thru nrfconnect, I just have to re-scan and hit the connect button.
it looks like it's only the dk-to-dk connection that fails (ie w/zephyr as central). I am able to reconnect thru nrfconnect, I just have to re-scan and hit the connect button.
Can confirm, my apologies. In my application it takes a while but is not impossible to reconnect from the iPhone (as far as I can tell because the device is busy trying to reconnect to other devices).
I've narrowed it down, I think it's an issue with the controller rather than the host, as bt_conn_le_create
returns -EIO
which is an HCI error. Still digging..
I've narrowed it down, I think it's an issue with the controller rather than the host, as
bt_conn_le_create
returns-EIO
which is an HCI error. Still digging..
I've looked up the HCI warning in the BLE spec and it gave me these 'translations': bt_hci_core: opcode 0x2005 status 0x0c 0x2005 for the BT_HCI_OP_LE_CREATE_CONN command, and 0x0C should mean "NOT ALLOWED". Not sure if that helps you out in any way?
Thanks, yeah I know it's not allowed (you can check out hci_err.h
in zephyr it has the list).
I think this requires more digging in the softdevice controller, it looks like a bug there. I tried to make it work with the zephyr LL but there were some issues, I'll try again to double-check.
The same opcodes are sent, in the same order for both cases (scanning+connecting after boot, scanning+connecting after both devices disconnected), but in the last case, the BT_HCI_OP_LE_CREATE_CONN
command is the only one returning DISALLOWED, probably a bug in their state machine.
okay, it works with the zephyr link layer (no bug) with these changes in the prj.conf
:
CONFIG_BT_LL_SOFTDEVICE=n
CONFIG_BT_LL_SW_SPLIT=y
CONFIG_BT_CTLR_RX_PRIO_STACK_SIZE=1024
CONFIG_BT_MAX_CONN=10
So this seems to be a bug in the Softdevice Controller. could you open a bug in the devzone and link it here? I'll close this issue. I have already pinged the controller team, but they'll need the ticket number that's why.
Just as a last comment, I've used the Zephyr LL as well and the bug also seems to disappear on my end.
could you open a bug in the devzone and link it here? I'll close this issue.
I've found that there actually are two issues in Host that make the incorrect behavior as described above.
bt_le_create_conn_ext
and bt_le_create_conn_legacy
call bt_id_set_create_conn_own_addr
. The last one mentioned function attempts to change an address to static random address: https://github.com/zephyrproject-rtos/zephyr/blob/e104e031b7d3ddacd2b90527fff438fc4056b72a/subsys/bluetooth/host/id.c#L1519-L1528. Unfortunately that will not happen, because there is an advertising running and Controller returns error BT_HCI_ERR_CMD_DISALLOWED
for Host command: BT_HCI_OP_LE_SET_RANDOM_ADDRESS
.It looks to me that, to fix second issue we need to run Bluetooth address setup before advertising is resumed. Unfortunately that would require storage of some information that are provided to Host when advertising is started by bt_le_adv_start
.
@jori-nordic @Thalley @alwa-nordic @sjanc what is your opinion?
@jori-nordic @Thalley @alwa-nordic @sjanc what is your opinion?
Sounds like you've successfully fixed the first issue.
For the second issue, this is only an issue if BT_EXT_ADV=n
, right? For BT_EXT_ADV=y
BT_HCI_OP_LE_SET_RANDOM_ADDRESS
can be called, even when advertising:
If the Host issues this command when any of advertising (created using legacy advertising commands), scanning, or initiating are enabled, the Controller shall return the error code Command Disallowed (0x0C).
We could for BT_EXT_ADV=n
simply reject calls to bt_conn_le_create
if we are advertising or scanning, or vice versa reject calls to bt_le_adv_start
if we are currently scanning or connecting.
@Thalley yes, first issue is fixed.
Referring to second one and you comment.
In my humble opinion rejection of bt_conn_le_create
is last option. The issue happens only when a peripheral connection is closed when scanning is in progress. In other words, when Host resumes advertising for a closed peripheral role connection.
If user re-enables advertising, there should be no problem because an address should be changed to static random before host requests Controller to start advertising again.
I would prefer to extend advertising resume operation instead rejecting anything. An application should be able to use the API in the same way as it was after BT stack enable.
Right. Rejecting API calls won't help in the resume case either, as that is done internally in the stack, and the application won't be able to handle any issues there anyways.
Can anyone else share his thoughts about it?
@carlescufi re-openning this is not completed. There is still an issue with BT adresses when resume advertising. I've described it here: https://github.com/zephyrproject-rtos/zephyr/issues/50438#issuecomment-1277621789
It looks to me that, to fix second issue we need to run Bluetooth address setup before advertising is resumed. Unfortunately that would require storage of some information that are provided to Host when advertising is started by bt_le_adv_start.
I agree.
I've found that there actually are two issues in Host that make the incorrect behavior as described above.
- Host automatically resumes advertising when peripheral role is enabled. It does not check role stored in connection object hence may starve central role. In described scenario, Host resumes advertising while an application resumes scanning. I've prepared a PR for this case: Bluetooh: Host: conn: Resume adv after disconn for periph role only #51256.
- Advertising resume in Host is limited operation in comparison to regular advertising enable operation. Limited in a way, there is no device address check/update before the advertising is resumed. When scanner is resumed and identity address is not used, Host uses NRPA address for a device when scanning. This is address is different than static random address. When connection in peripheral role is disconnected, Host resumes advertising and continues use of address set by scan resume. Now, assume that an application starts a connection in central role. Both functions
bt_le_create_conn_ext
andbt_le_create_conn_legacy
callbt_id_set_create_conn_own_addr
. The last one mentioned function attempts to change an address to static random address: https://github.com/zephyrproject-rtos/zephyr/blob/e104e031b7d3ddacd2b90527fff438fc4056b72a/subsys/bluetooth/host/id.c#L1519-L1528 . Unfortunately that will not happen, because there is an advertising running and Controller returns errorBT_HCI_ERR_CMD_DISALLOWED
for Host command:BT_HCI_OP_LE_SET_RANDOM_ADDRESS
.It looks to me that, to fix second issue we need to run Bluetooth address setup before advertising is resumed. Unfortunately that would require storage of some information that are provided to Host when advertising is started by
bt_le_adv_start
.
@ppryga-nordic could you please open a new issue for this point 2., and then close this one? This present one was about something else, and it is confusing to reuse it.
@carlescufi @jori-nordic New issue created: #52059.
will close this as #52059 has been opened as follow-up.
Describe the bug There is a specific case in which the Bluetooth stack seems to become stuck in a state where it will not accept/initiate any new connections. The following warning is printed in the console:
<wrn> bt_hci_core: opcode 0x2005 status 0x0c
I'm using an nRF52840 and NCS 2.1. The only way to get out of this state seems to be resetting the device. Below the steps I found which reproduces this state consistently using an example application.
To Reproduce
Impact This is not limited to the heartrate example. We initially discovered this bug in our own application and reproduced it using a sample so it would be easier to fix. Our users have to restart their gateways when configuring them to make sure that they still work properly. This feels like a pretty severe bug.
Logs and console output
Environment (please complete the following information):