zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.45k stars 6.4k forks source link

Bluetooth: hci samples: Unable to allocate command buffer #37731

Open joerchan opened 3 years ago

joerchan commented 3 years ago

Describe the bug A few controller-side HCI drivers (see below) are allocating command buffers using K_NO_WAIT from interrupt context. This was done assuming that the Host to Controller flow control of commands (num complete in command complete / status) would limit the number of commands that the host can have queued up in the controller. However the Host Number of Completed packets (Controller to Host flow control) does not follow the regular flow control and can be sent from the host at any time. There is no command-complete / status event for this command.

In cases where the controller does not process the incoming commands in time before the command buffer pool is drained the command will be dropped by the controller side HCI driver.

Affected drivers are: HCI UART HCI RPMsg HCI SPI USB Bluetooth class

To Reproduce Reproduced by robustness test. Test was doing SMP pairing procedure immediately upon connection, causing multiple packets to be sent at the same time as control procedures are being run. Controller was delayed in processing of commands by ECDH HCI commands.

Expected behavior Controller HCI drivers should handle Host Number of Completed Packets outside of the regular flow control mechanism.

Impact Controller HCI driver drops HCI commands. If this is a Host Number of Completed Packets and will "leak" packet buffer count that it can send to the host. If this is any other HCI command that has flow control the Host side will not received HCI command complete / status and the Host will timeout waiting for the reply.

Logs and console output Host side:

ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:311
         k_sem_take failed with err -11

Controller side:

[00:00:03.625,122] <inf> hci_rpmsg: Received message of 9 bytes.
[00:00:03.625,244] <inf> hci_rpmsg: Received message of 9 bytes.
[00:00:03.625,305] <inf> hci_rpmsg: Received message of 68 bytes.
[00:00:03.625,335] <err> hci_rpmsg: No available command buffers!

Environment (please complete the following information):

Additional context Workaround: Increase HCI command buffer count CONFIG_BT_BUF_CMD_TX_COUNT from its default of 2 to 10. Workaround added here: https://github.com/zephyrproject-rtos/zephyr/pull/37714 and here: https://github.com/zephyrproject-rtos/zephyr/pull/37759 This would mask the error but is not a proper solution.

carlescufi commented 3 years ago

Potential solution:

Alternative:

alwa-nordic commented 1 year ago

I can confirm that this issue is present for hci_uart. The problem is that this sample is based on the 'interrupt driven' UART API, which does not allow partial readout of the UART.

One solution for UART is to switch over to the 'asynchronous' UART API. This will be done in a new sample: https://github.com/zephyrproject-rtos/zephyr/pull/59880

jori-nordic commented 7 months ago

IMO we should just bite the bullet and make a dedicates Host Number of Completed Packets command pool. It should be a good-enough stopgap measure until the general flow control issue is properly addressed.

github-actions[bot] commented 4 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

github-actions[bot] commented 2 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.