zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

BLE : ATT Timeout occurred during multilink central connection #30624

Closed mfinmuch closed 3 years ago

mfinmuch commented 3 years ago

Describe the bug

Situation: I modified the Zephyr BLE central_hr sample so that it can connect to multiple peripherals. A notification subscription is set up with each peripheral; the peripheral then uses notify to send data to the central, and the central uses bt_gatt_write to return a value to the peripheral. I am currently testing connection stability with one central and 5 peripherals. It works normally at first, but after a period of time the central suddenly gets stuck and the following error messages appear.

[00:00:00.258,422] <inf> bt_hci_core: Y123
[00:00:00.260,131] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.260,131] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.260,162] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 2.4 Build 0
[00:00:00.260,925] <inf> bt_hci_core: Identity: ec:e4:cd:dc:1b:f0 (random)
[00:00:00.260,955] <inf> bt_hci_core: HCI: version 5.2 (0x0b) revision 0x0000, manufacturer 0x05f1
[00:00:00.260,955] <inf> bt_hci_core: LMP: version 5.2 (0x0b) subver 0xffff
[00:02:11.060,974] <err> bt_att: ATT Timeout
[00:02:11.069,580] <err> bt_att: ATT Timeout
[00:02:11.071,502] <wrn> bt_att: No pending ATT request
[00:02:11.071,502] <err> os: ***** MPU FAULT *****
[00:02:11.071,502] <err> os:   Data Access Violation
[00:02:11.071,502] <err> os:   MMFAR Address: 0x4
[00:02:11.071,502] <err> os: r0/a1:  0x00006899  r1/a2:  0x00024ddb  r2/a3:  0x20fb52f0
[00:02:11.071,533] <err> os: r3/a4:  0x00000004 r12/ip:  0x20003adc r14/lr:  0x0000db19
[00:02:11.071,533] <err> os:  xpsr:  0x61000000
[00:02:11.071,533] <err> os: Faulting instruction address (r15/pc): 0x0002600a
[00:02:11.071,533] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:02:11.071,533] <err> os: Current thread: 0x200013f0 (unknown)
[00:02:12.646,545] <err> os: Halting system

In addition, the time at which the error occurs is random. Sometimes this message also appears: <err> bt_gatt: Error sending ATT PDU: -57

I made the following changes in prj.conf:

CONFIG_BT=y
CONFIG_BT_DEBUG_LOG=y
CONFIG_BT_CENTRAL=y
CONFIG_BT_SMP=y
CONFIG_BT_GATT_CLIENT=y
CONFIG_GPIO=y
CONFIG_NRFX_SPIS0=y
CONFIG_BT_MAX_CONN=5
CONFIG_BT_PRIVACY=n
CONFIG_BT_DEVICE_NAME="zephyr"
CONFIG_BT_DEVICE_APPEARANCE=833
CONFIG_BT_DEVICE_NAME_DYNAMIC=y
CONFIG_BT_DEVICE_NAME_MAX=65
CONFIG_USERSPACE=y
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_BT_RX_BUF_LEN=258
CONFIG_BT_ATT_TX_MAX=10
CONFIG_BT_ATT_PREPARE_COUNT=5
CONFIG_BT_CONN_TX_MAX=18
CONFIG_BT_L2CAP_TX_BUF_COUNT=18
CONFIG_BT_L2CAP_TX_MTU=247
CONFIG_BT_L2CAP_RX_MTU=247
CONFIG_BT_L2CAP_DYNAMIC_CHANNEL=y
CONFIG_BT_CTLR_PHY_2M=y
CONFIG_BT_CTLR_RX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFERS=18
CONFIG_BT_CTLR_TX_BUFFER_SIZE=251
CONFIG_BT_CTLR_DATA_LENGTH_MAX=251
CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_XTAL_THRESHOLD=500000

Expected behavior

Ultimately I want the central to communicate with 30 peripherals over BLE and for the connections to remain stable. If a peripheral disconnects, it should be reconnected, and no bt_att: ATT Timeout error should occur.

Impact

I don’t know which side the problem is on at the moment, and I don’t know how to proceed. I’m stuck.

Logs and console output

Usually the errors look like this: (image). Sometimes there are additional <err> bt_gatt: Error sending ATT PDU messages: (image).

The link below is a video of my test run. The first few minutes were normal and met my expectations, but in the end the central got stuck and an error occurred: https://youtu.be/m5Ki--iXmXw

Environment (please complete the following information):

Additional context

This is my central code. You can download it if you need to read it; I hope it helps: C_N_w.zip

cvinayak commented 3 years ago

Please ensure that you use unique GATT parameter variables for each simultaneous connection. For instance, your cmd_read uses the same single static struct bt_gatt_read_params read_params; across all active connections.

mfinmuch commented 3 years ago

Hello, I'm not sure what you mean. I didn't use cmd_read; I call bt_gatt_write on the conn when I receive a notification from the peripheral, so that should not be using the same variable.

mfinmuch commented 3 years ago

Please ensure that you use unique GATT parameter variables for each simultaneous connection. For instance, your cmd_read uses the same single static struct bt_gatt_read_params read_params; across all active connections.

@cvinayak Hello, or should I change my bt_gatt_write_params to be like this? static struct bt_gatt_write_params write_params[CONFIG_BT_MAX_CONN]; And then, when writing, index it like this? write_params[bt_conn_index(conn)]

cvinayak commented 3 years ago

yes, something like that, to avoid using the same memory for multiple simultaneous transactions.
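
The per-connection storage idea discussed above can be sketched outside Zephyr as follows. This is a minimal illustration only: struct write_slot, write_slot_claim and write_slot_release are hypothetical stand-ins rather than Zephyr APIs, and MAX_CONN simply mirrors CONFIG_BT_MAX_CONN=5 from the prj.conf above. In a real application the index would presumably come from bt_conn_index(conn), the slot would hold a struct bt_gatt_write_params, and the release would happen in the write callback or on disconnect.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONN 5 /* assumption: mirrors CONFIG_BT_MAX_CONN=5 */

/* Hypothetical stand-in for struct bt_gatt_write_params plus a busy flag. */
struct write_slot {
	uint16_t handle;   /* attribute handle to write */
	const void *data;  /* payload; must stay valid until the response */
	uint16_t length;   /* payload length in bytes */
	bool busy;         /* true while a response is still expected */
};

/* One slot per connection, so concurrent writes never share memory. */
static struct write_slot write_slots[MAX_CONN];

/* Claim the slot owned by one connection. Returns NULL if the index is out
 * of range or that connection still has a write in flight.
 */
struct write_slot *write_slot_claim(uint8_t conn_index, uint16_t handle,
				    const void *data, uint16_t length)
{
	if (conn_index >= MAX_CONN || write_slots[conn_index].busy) {
		return NULL;
	}

	struct write_slot *slot = &write_slots[conn_index];

	slot->handle = handle;
	slot->data = data;
	slot->length = length;
	slot->busy = true;
	return slot;
}

/* Release the slot once the write has completed or the link has dropped. */
void write_slot_release(uint8_t conn_index)
{
	if (conn_index < MAX_CONN) {
		write_slots[conn_index].busy = false;
	}
}
```

The busy flag is an addition beyond what was suggested in the thread: it also refuses a second write on the same connection while the first is still pending, which a plain array indexed by connection would not catch.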

mfinmuch commented 3 years ago

It seems that still didn't solve my problem. In the end the ATT timeout error still occurred.

yes, something like that, to avoid using the same memory for multiple simultaneous transactions.

In this case, does discover_params also need to be declared as static struct bt_gatt_discover_params discover_params[CONFIG_BT_MAX_CONN];? After I changed it to this, the ATT timeout error still appears.

So, the ATT timeout error: is it a problem that occurs when I use the same memory for multiple transactions at the same time?

cvinayak commented 3 years ago

is it a problem that occurs when I use the same memory for multiple transactions at the same time?

Yes.

If you have based your application on an upstream sample, I suggest you send a PR with your changes to make the sample capable of multiple connections simultaneously. This will make it easier to review your changes.

mfinmuch commented 3 years ago

I suggest you send a PR with your changes to make the sample capable of multiple connections simultaneously.

How can I send a PR? I don’t know how to upload my code. In addition, I found that I can improve stability by changing these:

#define MIN_CONNECTION_INTERVAL
#define MAX_CONNECTION_INTERVAL
#define SLAVE_LATENCY
#define SUPERVISION_TIMEOUT

Is there any reason for this?
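
The thread does not answer this question, and the values of the four macros are not shown. As background only, the sketch below shows the units these parameters use in Bluetooth LE (connection interval in 1.25 ms steps, supervision timeout in 10 ms steps) and the consistency rule that the supervision timeout must exceed twice the effective interval. The function names are hypothetical helpers, not Zephyr APIs, and nothing here claims to explain the observed change in stability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Connection interval is expressed in units of 1.25 ms. */
uint32_t conn_interval_to_us(uint16_t interval)
{
	return (uint32_t)interval * 1250U;
}

/* Supervision timeout is expressed in units of 10 ms. */
uint32_t supervision_timeout_to_ms(uint16_t timeout)
{
	return (uint32_t)timeout * 10U;
}

/* Hypothetical sanity check for a set of LE connection parameters. */
bool conn_params_plausible(uint16_t interval_min, uint16_t interval_max,
			   uint16_t latency, uint16_t timeout)
{
	/* Interval range: 6..3200 units (7.5 ms to 4 s). */
	if (interval_min > interval_max || interval_min < 6 ||
	    interval_max > 3200) {
		return false;
	}

	/* Peripheral latency range: 0..499 skipped connection events. */
	if (latency > 499) {
		return false;
	}

	/* Supervision timeout range: 10..3200 units (100 ms to 32 s). */
	if (timeout < 10 || timeout > 3200) {
		return false;
	}

	/* timeout * 10 ms must exceed (1 + latency) * interval_max * 1.25 ms * 2,
	 * which simplifies to 4 * timeout > (1 + latency) * interval_max.
	 */
	return 4U * timeout > (1U + latency) * (uint32_t)interval_max;
}
```

For example, an interval of 24 units is 30 ms and a timeout of 400 units is 4 s, and conn_params_plausible(24, 40, 0, 400) accepts that combination; these numbers are illustrative and are not the values used in the issue.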

mfinmuch commented 3 years ago

I opened a pull request with my code, but I’m not sure whether I did it right: https://github.com/zephyrproject-rtos/zephyr/pull/30680#issue-538975971

cvinayak commented 3 years ago

Multiple connections are being established sequentially, and GATT service discovery is performed using the same single instance of discover_params. This is a bug in the application and not in the Bluetooth subsystem.

No guard preventing sequential multiple connections: https://github.com/zephyrproject-rtos/zephyr/pull/30680/files#diff-2fcfc31aeea68ed5719fe2686a5507a564ea9c85d0ca14216d6d6953eb7af5f8R452

Use of single instance of discover_params to perform GATT discovery: https://github.com/zephyrproject-rtos/zephyr/pull/30680/files#diff-2fcfc31aeea68ed5719fe2686a5507a564ea9c85d0ca14216d6d6953eb7af5f8R574
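
One possible shape for the missing guard is sketched below, assuming the application wants to keep a single discover_params instance and therefore allows only one discovery to be in flight at a time. All names are hypothetical and the code is independent of Zephyr: in a real application discovery_try_begin would presumably wrap the call to bt_gatt_discover, discovery_finish would run when discovery completes or the link drops, and scan_may_resume would gate restarting the scan for the next peripheral.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_CONN 0xFF /* sentinel: no discovery in progress */

/* Index of the connection that currently owns the shared discovery state. */
static uint8_t discovering_conn = NO_CONN;

/* Try to start discovery for a connection. Fails if another connection is
 * still discovering, so the shared parameters are never overwritten.
 */
bool discovery_try_begin(uint8_t conn_index)
{
	if (discovering_conn != NO_CONN) {
		return false;
	}

	discovering_conn = conn_index;
	return true;
}

/* Scanning for the next peripheral should wait until discovery is idle. */
bool scan_may_resume(void)
{
	return discovering_conn == NO_CONN;
}

/* Mark discovery as finished; only the owning connection may clear it. */
void discovery_finish(uint8_t conn_index)
{
	if (discovering_conn == conn_index) {
		discovering_conn = NO_CONN;
	}
}
```

The alternative raised earlier in the thread, one discover_params per connection, removes the need to serialize; this guard is the simpler option when connection setup can tolerate being done one peripheral at a time.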

carlescufi commented 3 years ago

Resolving as issue in the user's code. Please reopen if you find that not to be the case.