Closed by aolowin 2 years ago
@jori-nordic I will assume you will resume discussion based on the work you are doing here https://github.com/jori-nordic/zephyr/commit/2215bf0a945bc1f84ebb2a2c9b5b2d6754877b16
@jori-nordic Thank you for tracking this down. Is the fix in your branch likely to be the final solution or will there be other changes before it goes into the main branch?
I was on vacation, sorry for the delay.
@aolowin the fix I have might have to change a bit, because while the bug still exists in latest upstream, it has different symptoms due to the l2cap work that has been done.
@hermabe has a PR (https://github.com/zephyrproject-rtos/zephyr/pull/45682/files) to fix some issues that were introduced there, and I'll be trying to reproduce this particular bug today with his fixes applied locally.
EDIT: I haven't been able to reproduce the issue with @hermabe's PR applied locally. I think we can close this when that PR is merged. Could you try to reproduce it after that PR is merged, @aolowin?
@jori-nordic I'll try and reproduce it after that PR is in. Thanks.
@aolowin would you mind trying to reproduce the issue with the PR applied before it's merged? This way we would know that this is indeed a fix for this issue.
You can check out this branch: https://github.com/hermabe/zephyr/tree/fix/meta_free from the PR.
I've tested https://github.com/zephyrproject-rtos/zephyr/pull/45682 and it does seem to fix the issue. By keeping the peripheral at the edge of the RF range I was able to force numerous disconnect/reconnect cycles and the peripheral was always able to recover.
There were a few warnings:
Connected
[00:09:34.331,970] <inf> hrs: HRS notifications enabled
Disconnected (reason 0x08)
[00:09:38.373,413] <wrn> bt_att: Unable to allocate ATT TX meta
[00:09:38.373,443] <wrn> bt_gatt: No buffer available to send notification
[00:09:38.482,421] <inf> hrs: HRS notifications disabled
Connected
[00:09:40.669,647] <inf> hrs: HRS notifications enabled
but no errors.
Thanks for testing. Could you make sure that these warnings do not prevent the stack from sending data once buffers are available again? i.e. this is a recoverable warning that doesn't require rebooting any of the Zephyr-based devices.
> There were a few warnings
I think these warnings are fine. The buffer itself is freed after it is sent to the controller, but the metadata is not freed until the callbacks are called after receiving the num_complete event from the controller. The allocation of the buffer blocks, but the allocation of the metadata does not, so if no metadata could be allocated the warning is printed and -ENOMEM is returned by the GATT API functions.
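To illustrate why the warning is recoverable, here is a minimal stand-alone sketch of the behaviour described above. It is not Zephyr code: `meta_pool`, `meta_alloc()`, `notify()` and `on_num_complete()` are hypothetical names, and the pool size of 2 is arbitrary. It only models a non-blocking metadata pool whose slots are released when the TX-complete callback runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the ATT TX metadata pool (not the Zephyr one). */
#define META_POOL_SIZE 2

struct tx_meta {
    bool in_use;
};

static struct tx_meta meta_pool[META_POOL_SIZE];

/* Non-blocking allocation: returns NULL immediately if no slot is free. */
static struct tx_meta *meta_alloc(void)
{
    for (size_t i = 0; i < META_POOL_SIZE; i++) {
        if (!meta_pool[i].in_use) {
            meta_pool[i].in_use = true;
            return &meta_pool[i];
        }
    }
    return NULL;
}

/* Models the callback that runs after the controller reports completion:
 * only at this point is the metadata slot released.
 */
static void on_num_complete(struct tx_meta *meta)
{
    meta->in_use = false;
}

/* Hypothetical notify call: fails with -ENOMEM instead of blocking when
 * no metadata is available, so the caller can simply try again later.
 */
static int notify(struct tx_meta **out)
{
    struct tx_meta *meta = meta_alloc();

    if (meta == NULL) {
        return -ENOMEM;
    }
    *out = meta;
    return 0;
}
```

In this model a caller that sees -ENOMEM can drop or retry the notification; once an earlier transmission completes and its slot is released, the next call succeeds without any reset.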
I can confirm that the central_hr was able to receive notifications from the peripheral_hr once connected - regardless of whether the warnings occurred. No reboots required.
@jori-nordic @hermabe any chance of the fixes for this issue being backported to 2.7? https://github.com/zephyrproject-rtos/zephyr/pull/46165 https://github.com/zephyrproject-rtos/zephyr/pull/48395
Describe the bug
A BLE peripheral can sometimes get into a state where it will no longer send notifications. It seems to occur if connections are repeatedly dropped due to range or antenna issues. It may happen if the connection is lost during discovery but that's a guess.
The following errors are generated:
[00:02:21.702,667] bt_conn: Disconnected while allocating context
[00:02:29.443,542] bt_conn: Unable to allocate buffer within timeout
[00:02:29.443,572] bt_l2cap: Unable to allocate buffer for op 0x12
[00:02:52.495,300] bt_conn: Unable to allocate buffer within timeout
[00:02:52.495,300] bt_att: Unable to allocate buffer for op 0x07
Once it gets into this state only a reboot will fix it. A reset on the central side has no effect.
To Reproduce
Use the central_hr and peripheral_hr sample apps on the nrf52840dk_nrf52840 boards:
west build samples/bluetooth/peripheral_hr --build-dir=./build/peripheral_hr -b nrf52840dk_nrf52840
west build samples/bluetooth/central_hr --build-dir=./build/central_hr -b nrf52840dk_nrf52840
For ease of desktop testing it's convenient to set CONFIG_BT_CTLR_TX_PWR_MINUS_40=y on the central_hr device. This lowers the controller TX power to -40 dBm, so a disconnect can be triggered by moving the boards a short distance apart.
Move the boards closer and farther from each other to trigger disconnect/reconnect events and eventually generate the erroneous state.
Expected behavior
A peripheral can cleanly reconnect after a disconnect.
Impact
Serious impact since the peripheral will be unable to communicate without a reboot.
Logs and console output
central_hr:
peripheral_hr:
Environment (please complete the following information):