zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

BLE scanning - BT RX thread hangs on. #49373

Closed fomst closed 1 year ago

fomst commented 2 years ago

Describe the bug: During a BLE scan, the BT RX thread hangs and the BLE scan callback is no longer fired. This happens at a random time after the scan starts.

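To Reproduce
Example: bluetooth/scan_adv
west build -b nucleo_wb55rg .

scan parameters:

BT_HCI_LE_SCAN_ACTIVE
BT_LE_SCAN_OPT_NONE
interval=128 (tried with other values too)
window=64 (tried with other values too)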

Expected behavior: Scanning continues until it is manually stopped.

Impact: A hard reset is required to start scanning again.

Logs and console output: As the thread analysis below shows, the number of CPU ticks of the BT RX thread does not change between consecutive samples (time interval = 30 s).

[images: aurora_issue_threads_info_cpu, aurora_issue_threads_info]
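The observation above (a cycle counter that stays the same across consecutive samples) can be turned into a simple stall check. The following is a minimal sketch, not code from this issue: `struct stall_monitor` and its functions are hypothetical names introduced for illustration. In Zephyr the counter could presumably come from something like `k_thread_runtime_stats_get()` (an assumption here, and it would need `CONFIG_THREAD_RUNTIME_STATS`). An unchanged counter only suggests a hang if the thread is expected to be busy, as the RX thread is while scanning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical monitor state; not part of Zephyr. */
struct stall_monitor {
    uint64_t last_cycles; /* cycle count seen at the previous sample */
    uint32_t unchanged;   /* consecutive samples with no progress */
    uint32_t threshold;   /* samples without progress before flagging */
    bool primed;          /* false until the first sample is recorded */
};

/* Reset the monitor and set how many unchanged samples count as a stall. */
static void stall_monitor_init(struct stall_monitor *m, uint32_t threshold)
{
    m->last_cycles = 0;
    m->unchanged = 0;
    m->threshold = threshold;
    m->primed = false;
}

/*
 * Feed one cycle-count sample. Returns true once the counter has stayed
 * unchanged for `threshold` consecutive samples after the first one.
 */
static bool stall_monitor_sample(struct stall_monitor *m, uint64_t cycles)
{
    if (!m->primed || cycles != m->last_cycles) {
        /* First sample, or the thread made progress: start over. */
        m->primed = true;
        m->last_cycles = cycles;
        m->unchanged = 0;
        return false;
    }
    if (m->unchanged < m->threshold) {
        m->unchanged++;
    }
    return m->unchanged >= m->threshold;
}
```

With a 30 s sampling period and a threshold of 2, the thread would be flagged after its counter has not moved for about a minute. The threshold is a tuning choice and should be longer than any legitimate idle period.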

Environment (please complete the following information):

Additional context This issue occurs on our own hardware based on stm32wb55 mcu and on stm32wb55 nucleo board. There are different ble stacks used as mentioned in Environment section.

This issue occurs at a random time. From my observations, the time depends on the scan parameters: the larger the scan interval and window, the harder the issue is to reproduce.
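For reference, Bluetooth LE scan interval and window values are expressed in units of 0.625 ms, so the interval=128 and window=64 quoted in this thread correspond to 80 ms and 40 ms. The helper names below are hypothetical and only illustrate the conversion; they are not Zephyr APIs.

```c
#include <stdint.h>

/* Bluetooth LE scan interval and window are given in units of 0.625 ms. */
#define SCAN_UNIT_US 625u

/* Convert a scan interval or window from 0.625 ms units to microseconds. */
static uint32_t scan_units_to_us(uint16_t units)
{
    return (uint32_t)units * SCAN_UNIT_US;
}

/*
 * Percentage of each interval spent listening (window / interval).
 * Returns 0 for invalid input (zero interval, or window > interval).
 */
static uint32_t scan_duty_cycle_percent(uint16_t interval, uint16_t window)
{
    if (interval == 0u || window > interval) {
        return 0u;
    }
    return ((uint32_t)window * 100u) / interval;
}
```

With the parameters from this report, the radio listens for 40 ms out of every 80 ms, a 50% duty cycle.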

When testing with the scan_adv example, I commented out the code responsible for advertising, so only scanning is performed.

I also tried a workaround: resetting the BT stack once I noticed that the issue had occurred. That leads to another problem. When I do (pseudocode):

bt_le_scan_disable();
hci_command->BT_HCI_OP_RESET
bt_disable();
bt_enable();

I get an assertion: [image: assertion_bt_enable]

Adding this line to bt_enable() in hci_core.c resolves the problem with the assertion:

k_work_queue_init(&bt_workq);

[image: assertion_bt_enable_resolv]

carlescufi commented 2 years ago

@erwango @fomst could you please try to reproduce this with the Zephyr Link Layer using Nordic nRF hardware?

erwango commented 2 years ago

> @erwango @fomst could you please try to reproduce this with the Zephyr Link Layer using Nordic nRF hardware?

@fomst, can you also give an idea of how often this occurs?

fomst commented 2 years ago

@carlescufi I'm not able to do that. I don't have any Nordic hardware.

@erwango As I wrote, this occurs randomly, but it happens most often with a low scan interval and scan window. With the parameters from my first post it happens relatively often, within a few minutes at most.

erwango commented 2 years ago

@fomst Would you be able to provide an HCI trace using btmon? This would speed up the investigation.

erwango commented 2 years ago

@fomst Would you be able to reproduce the same configuration on an STM32Cube-based application?

fomst commented 2 years ago

> @fomst Would you be able to provide an HCI trace using btmon? This would speed up the investigation.

@erwango Here are the logs from btmon: ble_logs.txt

> @fomst Would you be able to reproduce the same configuration on an STM32Cube-based application?

Do you mean reproducing this without the Zephyr environment, using only ST's code base?

erwango commented 2 years ago

> @erwango Here are the logs from btmon: ble_logs.txt

Great, thanks.

> @fomst Would you be able to reproduce the same configuration on an STM32Cube-based application?
>
> Do you mean reproducing this without the Zephyr environment, using only ST's code base?

Indeed.

erwango commented 2 years ago

@fomst Can you confirm the status of CONFIG_BT_RECV_BLOCKING?

fomst commented 2 years ago

@erwango CONFIG_BT_RECV_BLOCKING=n as the BT_RECV_WORKQ_BT is selected.

As for reproducing the issue with ST's own BLE stack: I'll try to do it as soon as possible, but I have other, higher-priority tasks, so it may take some time before I'm back with results.

erwango commented 2 years ago

@fomst Zephyr SDK 3.1.99 is the current development branch, which is continuously updated. Can you give a specific SHA1?

erwango commented 2 years ago

@fomst I have been trying to reproduce the issue under the conditions you describe for several hours now using Zephyr (the only unknown is the SHA1 you're using). To no avail so far.

fomst commented 2 years ago

@erwango Sorry for the late answer. Somehow I missed the notification. Here is the SHA1: ae7f349367312dbf3f92e6aea028c0a89bb409a7.

Did you try to reproduce this issue on STM32 hardware, or on something else? I'm still working on reproducing this issue with ST's SDK, but I haven't had much time. I hope to be back with results next week.

erwango commented 2 years ago

> Did you try to reproduce this issue on STM32 hardware, or on something else?

I was using nucleo_wb55rg.

erwango commented 2 years ago

@fomst Have you been able to make some progress in your investigation?

fomst commented 2 years ago

Hi @erwango,

Sorry for the late answer, I had other things to do in the meantime. Unfortunately there is no progress from my side. I have not been able to reproduce this issue on ST's own stack. If I find anything new, I'll let you know as soon as possible.

carlescufi commented 1 year ago

@jori-nordic @alwa-nordic maybe we could try to reproduce on nRF:

To Reproduce
Example: bluetooth/scan_adv
west build -b nucleo_wb55rg .

scan parameters:

BT_HCI_LE_SCAN_ACTIVE
BT_LE_SCAN_OPT_NONE
interval=128 (tried with other values too)
window=64 (tried with other values too)

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.