zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.87k stars 6.62k forks source link

bluetooth: GATT lockup on split packets #25561

Closed JordanYates closed 4 years ago

JordanYates commented 4 years ago

Describe the bug ATT data from a slave device that is split across multiple PHY packets appears to lockup something in the Zephyr master.

The issue occurs on the first packet that is split. In the current case, the Nth Handle Value Indication packet is sent by the Slave device, and the Zephyr master never responds with a value confirmation. This issue is 100% repeatable on the same packet each time a connection is made (7th indication currently).

A similar issue appears when I attempt discovery before increasing the connection data length through bt_conn_le_data_len_update. Two 128bit characteristics are present in a service, the Read By Type response is split across two packets. In this case the Bluetooth stack emits an ATT Timeout error. Again, this issue is consistent on every connection.

In both cases, the connection is not dropped, neither side crashes.

The slave device is based upon the nRF5 S140 v6.1.1 running FreeRTOS. Phone applications and other devices running the FreeRTOS stack can observe notifications indefinitely from the device, suggesting that the problem is not on that side.

In the zip file are two .pcapng files demonstrating both issues. BlockingGattPcapng.zip

Expected behavior Behavior should not be impacted by LL packets being split. Discovery should complete, Indications should continue indefinitely.

Impact What impact does this issue have on your progress (e.g., annoyance, showstopper)

Screenshots or console output

Console output for the discovery case:

*** Booting Zephyr OS build v2.3.0-rc1  ***
[00:00:00.020,782] <inf> fs_nvs: 8 Sectors of 4096 bytes
[00:00:00.020,782] <inf> fs_nvs: alloc wra: 1, be0
[00:00:00.020,812] <inf> fs_nvs: data wra: 1, 224
[00:00:00.031,829] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.031,860] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.031,860] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.3 Build 0
[00:00:00.032,226] <inf> bt_hci_core: Identity: c3:3b:7b:e8:c8:de (public)
[00:00:00.032,257] <inf> bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1
[00:00:00.032,257] <inf> bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff
[00:00:19.134,826] <inf> uc_bt_gatt: Connection created
[00:00:49.586,791] <err> bt_att: ATT Timeout

Environment (please complete the following information):

CONFIG_BT=y
CONFIG_BT_BROADCASTER=y
CONFIG_BT_OBSERVER=y
CONFIG_BT_PERIPHERAL=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_MAX_CONN=2
CONFIG_BT_HCI_ACL_FLOW_CONTROL=y
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_RX_BUF_LEN=258
CONFIG_BT_RX_STACK_SIZE=2048
CONFIG_BT_USER_DATA_LEN_UPDATE=y

This behavior was originally observed on Zephyr 2.2, and was not resolved after updating.

carlescufi commented 4 years ago

@JordanYates can you update the description with the exact sample you are using as a central?

Actually:

CONFIG_BT_USER_DATA_LEN_UPDATE=y

This is a fairly new option. I assume then you are not using a standard sample. Can you share either the code of your app or a minimal example to reproduce this?

JordanYates commented 4 years ago

CONFIG_BT_USER_DATA_LEN_UPDATE=y

Was only added so that I could increase the data length to avoid the discovery problem. Isolating the problem further, it appears to be caused by

CONFIG_BT_L2CAP_RX_MTU=247 CONFIG_BT_L2CAP_TX_MTU=247

More specifically, the largest L2CAP packet sent by the slave during discovery is 44 bytes. Discovery completes with no errors when EITHER of these configs are set to 43 or less, fails when BOTH are 44 or higher.

JordanYates commented 4 years ago

I have reproduced this issue between two Zephyr devices. Slave application has an RX and TX MTU of 240 and calls bt_gatt_exchange_mtu(). Master application has an RX and TX MTU of 45 in the failure case. Largest L2CAP packet is again 44 bytes. Discovery completes when the master MTU's are set to 44.

https://github.com/JordanYates/zephyr/tree/gatt_blocking/samples/bluetooth/blocking_discovery https://github.com/JordanYates/zephyr/tree/gatt_blocking/samples/bluetooth/blocking_slave

cvinayak commented 4 years ago

Please set CONFIG_BT_ACL_RX_COUNT=10 if you set CONFIG_BT_HCI_ACL_FLOW_CONTROL=y (and thinking aloud, hope that peer does not fragment 255 bytes in 1 byte PDUs, then we will need 255 buffers). This should allow for recombination.

But, why do you need CONFIG_BT_HCI_ACL_FLOW_CONTROL=y in a host+controller combined builds?

JordanYates commented 4 years ago

CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_TX_MTU both depend on CONFIG_BT_HCI_ACL_FLOW_CONTROL. CONFIG_BT_CTLR_RX_BUFFERS, which sets the default for CONFIG_BT_ACL_RX_COUNT, looks like what I need, I will give it a shot. As an aside, CONFIG_BT_CTLR_RX_BUFFERS is not mentioned anywhere in the documentation, or used in any of the sample applications. Maybe I've been barking up the wrong tree this whole time, but all I want to do is setup a GATT connection that can send 200 byte buffers back and forth in single packets. Have I missed some crucial KConfig settings that make this easy? Thanks for the input.

JordanYates commented 4 years ago

Presumably at some point the Bluetooth controller is trying to get a buffer and failing. Is it feasible to add some form of error output when it fails?

JordanYates commented 4 years ago

Beautiful, problem is resolved by increasing CONFIG_BT_CTLR_RX_BUFFERS.

cvinayak commented 4 years ago

CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_TX_MTU both depend on CONFIG_BT_HCI_ACL_FLOW_CONTROL.

You should not use CONFIG_BT_HCI_ACL_FLOW_CONTROL to enable setting CONFIG_BT_L2CAP_RX_MTU and CONFIG_BT_L2CAP_RX_MTU

CONFIG_BT_HCI_ACL_FLOW_CONTROL is intended for HCI controller or Host only builds.

Increasing CONFIG_BT_CTLR_RX_BUFFERS would unnecessarily double PDU RAM usage, 1 buffer in controller is sufficient to be recombined by use of CONFIG_BT_ACL_RX_COUNT buffers in host.

JordanYates commented 4 years ago

Alright, that makes sense. I based the original configuration on the NCS GATT throughput example, which uses Nordics proprietary controller. I am a bit lost in determining exactly which CONFIG's need to be enabled to increase the MTU and get the larger packets. As far as I can tell there are no Zephyr samples which do this, or at least none that call bt_gatt_exchange_mtu, which I believe is a required part of the setup process. The GATT documentation https://docs.zephyrproject.org/latest/reference/bluetooth/gatt.html is also not particularly helpful with respect to MTU's.

Is there some other reference that I am missing?

cvinayak commented 4 years ago

As this is no more a bug, I request you to close this issue and open a new enhancement request to have sample/documentation around use of larger packets.