zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.55k stars 6.46k forks source link

Bluetooth: bt_att: Unable to allocate buffer #67273

Closed tbarri closed 5 months ago

tbarri commented 8 months ago

Describe the bug

I seem to have run into a buffer exhaustion issue within the BLE stack.

To Reproduce

west build samples/bluetooth/peripheral_hr -b nrf52dk_nrf52832
west flash

Then, in a for loop:

After the 3rd iteration, the device refuses to respond to any further notifications:

*** Booting Zephyr OS build zephyr-v3.5.0-1688-gbd9c35b4966b ***
[00:00:00.255,340] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.255,371] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.255,401] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.5 Build 99
[00:00:00.256,134] <inf> bt_hci_core: Identity: D2:11:DE:BE:8F:2B (random)
[00:00:00.256,164] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x0000, manufacturer 0x05f1
[00:00:00.256,195] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0xffff
Bluetooth initialized
Advertising successfully started
Connected
[00:00:10.444,274] <inf> bt_att: alloc rsp meta
<snip>
[00:00:10.804,412] <inf> bt_att: alloc rsp meta
[00:00:10.804,473] <inf> hrs: HRS notifications enabled
[00:00:10.849,670] <inf> bt_att: alloc rsp meta
[00:00:10.849,731] <inf> hrs: HRS notifications disabled
Disconnected (reason 0x13)
Connected
[00:00:11.816,711] <inf> bt_att: alloc rsp meta
<snip>
[00:00:12.176,818] <inf> bt_att: alloc rsp meta
[00:00:12.176,879] <inf> hrs: HRS notifications enabled
[00:00:12.222,106] <inf> bt_att: alloc rsp meta
[00:00:12.222,137] <inf> hrs: HRS notifications disabled
Disconnected (reason 0x13)
Connected
<snip>
[00:00:13.729,309] <inf> hrs: HRS notifications enabled
[00:00:13.774,536] <inf> bt_att: alloc rsp meta
[00:00:13.774,566] <inf> hrs: HRS notifications disabled
Disconnected (reason 0x13)
Connected
[00:00:44.516,632] <wrn> bt_conn: Unable to allocate buffer within timeout
[00:00:44.516,662] <err> bt_att: Unable to allocate buffer for op 0x03

The number of iterations required to hit the buffer issue seems to match the value of CONFIG_BT_ATT_TX_COUNT

I am only able to replicate this issue when using this BLE client: https://github.com/alexmoon/bluest.

I did perform a git bisect, and discovered that the issue is first introduced with commit bd9c35b4966bf56d908c3416ab2c192781f37ed8.

I repeated for the commit immediatly proceeding (530e845f927caa7c2f19aa172f2dbafbf1c731a7), and instead received exceptions. The device, however, remains operable.

*** Booting Zephyr OS build zephyr-v3.5.0-1687-g530e845f927c ***
[00:00:00.255,340] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.255,340] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.255,371] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 3.5 Build 99
[00:00:00.256,103] <inf> bt_hci_core: Identity: D2:11:DE:BE:8F:2B (random)
[00:00:00.256,134] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x0000, manufacturer 0x05f1
[00:00:00.256,164] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0xffff
Bluetooth initialized
Advertising successfully started
Connected
[00:00:11.415,924] <inf> hrs: HRS notifications enabled
[00:00:11.461,181] <inf> hrs: HRS notifications disabled
[00:00:11.461,334] <err> bt_ctlr_hci: Invalid Tx Enqueue
[00:00:11.461,364] <err> bt_conn: Unable to send to driver (err -22)
Disconnected (reason 0x13)
Connected
[00:00:12.383,361] <inf> hrs: HRS notifications enabled
[00:00:12.428,649] <inf> hrs: HRS notifications disabled
Disconnected (reason 0x13)
Connected
[00:00:14.250,793] <inf> hrs: HRS notifications enabled
[00:00:14.296,051] <inf> hrs: HRS notifications disabled
[00:00:14.296,203] <err> bt_ctlr_hci: Invalid Tx Enqueue
[00:00:14.296,203] <err> bt_conn: Unable to send to driver (err -22)

Expected behavior A repeatedly disconnected BLE client should not be able to cause buffer exhaustion.

Impact I am not sure how impactful this issue is. It is not present in zephyr-v3.5.0.

Additional Notes

I realise that the issue was introduced with a revert, possibly introducing the same issue as https://github.com/zephyrproject-rtos/zephyr/pull/60309. I unfortunately lack too much context to know.

Many thanks in advance!

jori-nordic commented 8 months ago

I am only able to replicate this issue when using this BLE client

Can you reproduce it with the nrf connect app? (desktop or mobile).

If you can't we will need more info to debug this:

tbarri commented 8 months ago

Can you reproduce it with the nrf connect app? (desktop or mobile).

No, I was not able to reproduce the issue with the NRF connect app (I tried with both desktop + mobile).

I have, however, prepared an example repository that should hopefully allow you to reproduce the issue: https://github.com/tbarri/ble-zephyr-experiment

A packet trace is included in the packet-trace-data folder.

I used the peripheral_hr sample project. The .config for this can be found here.

Please let me know if you need any additional details.

theob-pro commented 8 months ago

Tried to reproduce locally but the rust program ran the 100 loops to completion without any errors.

Environment

ble-zephyr-experiment on  main is 📦 v0.1.0 via 🦀 v1.75.0 
❯ uname -r
6.5.0-14-generic

ble-zephyr-experiment on  main is 📦 v0.1.0 via 🦀 v1.75.0 
❯ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

@jori-nordic and I used the hci_uart sample as the controller attached to BlueZ like so:

# monitor with btmon
sudo btmon &
# attach zephyr controller
sudo btattach -B /dev/ttyACM3 -S 1000000

We used the same commit that you mention in the example repo: 9569e4410475fb62125e769a13222882902d8c75

jori-nordic commented 7 months ago

@theob-pro could you try capturing the hci trace on a few working iterations and compare with the trace provided in the sample repo?

Could you also try with the latest zephyr revision?

github-actions[bot] commented 5 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

carlescufi commented 5 months ago

Closing due to inactivity. @tbarri please reopen if still applicable.