zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Bluetooth: Controller: Restarting BLE scanning not always working and sometimes crashes together with periodic. adv. #50432

Closed jakkra closed 2 years ago

jakkra commented 2 years ago

Describe the bug: There are 3 sub-issues, all related to the same scenario, hence I made one ticket. Write if you want it split.

  1. Restarting scanning with bt_le_scan_stop followed by bt_le_scan_start sometimes fails: bt_le_scan_start returns success, but scanning either does not start or only a few events are received before the controller stops reporting anything (maybe similar to https://github.com/zephyrproject-rtos/zephyr/issues/49915). Doing an additional disable/enable when this happens makes it work again. (A sketch of the restart sequence follows this list.)
  2. There is sometimes a crash, Scan disable...ASSERTION FAIL [!aux->parent] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/ull_scan.c:647, when calling bt_le_scan_stop.
  3. Sometimes bt_le_scan_stop returns -5 (I/O error) in the sample application; I don't think it should.
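
For illustration, a minimal sketch of the stop/start sequence these sub-issues refer to, assuming the standard Zephyr host API; restart_scan() and the scan parameters are illustrative, not copied from the attached application:

#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/sys/printk.h>

static int restart_scan(void)
{
	/* Illustrative parameters; the attached application defines its own. */
	struct bt_le_scan_param param = {
		.type     = BT_LE_SCAN_TYPE_ACTIVE,
		.interval = 0x10, /* 16 * 0.625 ms = 10 ms */
		.window   = 0x10, /* window == interval: continuous scanning */
	};
	int err;

	err = bt_le_scan_stop();
	if (err) {
		/* Sub-issue 3: this sometimes returns -EIO (-5). */
		printk("Scan stop failed (err %d)\n", err);
		return err;
	}

	/* Sub-issue 1: may return 0 yet deliver no further reports. */
	return bt_le_scan_start(&param, NULL);
}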

    • What target platform are you using? nRF52833, Zephyr
    • Is this a regression? Not tested this scenario before.

To Reproduce: Have 10+ tags sending periodic advertising with a 7.5 ms interval and CTE enabled. Build and flash the application below. Then wait...

Application: aoa_receiver_multiple-gh-50432.zip

Relevant configs (see attached application for more info):

CONFIG_BT_CTLR_ADVANCED_FEATURES=y
CONFIG_BT_CTLR_SCAN_AUX_SET=2

CONFIG_BT_CTLR_SYNC_PERIODIC_SKIP_ON_SCAN_AUX=y

CONFIG_BT_CTLR_SCAN_AUX_SYNC_RESERVE_MIN=n

CONFIG_BT_TICKER_LOW_LAT=n
CONFIG_BT_TICKER_PRIORITY_SET=n
CONFIG_BT_CTLR_SCAN_UNRESERVED=n # Scan only when radio is free.

Expected behavior: Scanning enable always works, and there is no crash when scanning is turned off.

Impact: An application cannot know whether there are no devices in the area or whether bt_le_scan_start did not work.

Logs and console output

  1. Log for scan enable not working:
    ...
    Queued: 7 Synced: 4 Num err: 0/62
    [00:00:03.413,879] <wrn> main: RESTARTING SCAN AND SYNC...
    Scan disable...Success.
    Start scanning...success
    [00:00:03.617,706] <wrn> main: RESTARTED!
    Scan is running...
    Running...
    TAGS:sync...,26,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 7 Synced: 1 Num err: 0/26
    [00:00:04.618,286] <wrn> main: RESTARTING SCAN AND SYNC...
    Scan disable...Success.
    Scan is running...
    Start scanning...success
    [00:00:04.821,289] <wrn> main: RESTARTED!
    Running...
    TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 7 Synced: 0 Num err: 0/0
    Running...
    TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 7 Synced: 0 Num err: 0/0
    ... Nothing happens
  2. Log for the crash:
    ...
    Queued: 11 Synced: 6 Num err: 0/79
    Running...
    TAGS:18,sync...,sync...,0,20,sync...,12,19,20,sync...,20,sync...,12,14,11,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 10 Synced: 10 Num err: 0/146
    [00:01:33.819,915] <wrn> main: RESTARTING SCAN AND SYNC...
    Scan disable...Success.
    Start scanning...success
    [00:01:34.024,810] <wrn> main: RESTARTED!
    Scan is running...
    Running...
    TAGS:sync...,sync...,sync...,sync...,28,sync...,sync...,sync...,1,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 9 Synced: 2 Num err: 0/29
    Scan is running...
    Running...
    TAGS:9,18,sync...,7,54,sync...,20,sync...,13,sync...,sync...,1,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 6 Synced: 7 Num err: 0/122
    Running...
    TAGS:16,16,14,10,17,sync...,15,10,8,sync...,1,16,2,sync...,sync...,4,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
    Queued: 5 Synced: 12 Num err: 0/131
    [00:01:37.027,923] <wrn> main: RESTARTING SCAN AND SYNC...
    Scan disable...ASSERTION FAIL [!aux->parent] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/ull_scan.c:647
    [00:01:37.028,411] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000005  r2/a3:  0x00000004
    [00:01:37.028,411] <err> os: r3/a4:  0x00000287 r12/ip:  0x200027ac r14/lr:  0x0000da55
    [00:01:37.028,442] <err> os:  xpsr:  0x41000000
    [00:01:37.028,442] <err> os: Faulting instruction address (r15/pc): 0x0000da60
    [00:01:37.028,472] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:01:37.028,533] <err> os: Current thread: 0x20001c40 (BT TX)
    [00:01:37.107,849] <err> os: Halting system


cvinayak commented 2 years ago

@jakkra Please check if #50454 fixes the assertion.

jakkra commented 2 years ago

@cvinayak I'm pretty sure https://github.com/zephyrproject-rtos/zephyr/pull/50454 fixed the crash; it has been running for 30 min now without the crash.

cvinayak commented 2 years ago

@jakkra Thank you, will proceed to get #50454 merged upstream.

Observations:

CONFIG_BT_CTLR_SCAN_UNRESERVED=n # Scan only when radio is free.

and

scanning either not started or just a few events are received before it stops reporting anything from controller

You have 10+ tags at a 7.5 ms interval. This is a very stressful scenario, with radio events overlapping each and every 7.5 ms duration in the timeline: the scan window may not get to use the radio for reception a sufficient number of times, and thereafter the reception of the auxiliary PDU would overlap with existing periodic sync radio events.

If the periodic advertising interval were 100 ms, would the stall in receiving the advertising reports be as evident?

jakkra commented 2 years ago

@cvinayak

jakkra commented 2 years ago

Here is one log with CONFIG_BT_CTLR_SCAN_UNRESERVED=y. In this log the behaviour is a bit different: here you can see the Scan is running... print, which means I get scan responses, but no tags manage to sync at all. When I power off the tags you can see them all getting terminated, but it does not recover afterwards. Also note there is no Scan is running... at the end.

I have another log, same case as above but after I reset the tags and powered them up again everything recovered.

Edit: I got the same with CONFIG_BT_CTLR_SCAN_UNRESERVED=n, so that probably makes no difference.

See my comments within >>>>> comment <<<<<<< markers in the log below.

Running...
TAGS:sync...,sync...,0,sync...,1,0,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 6 Synced: 3 Num err: 0/1
[00:03:08.847,656] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:03:09.050,292] <wrn> main: RESTARTED!
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 6 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 7 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 9 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 9 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 9 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
>>>>> Continues for a minute or so <<<<<<<
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,

>>>>>>>>>> Here I powered off all tags and the below happened <<<<<<<<<
Queued: 11 Synced: 0 Num err: 0/0
(251530) PER_ADV_SYNC[1]: [DEVICE]: 33:37:05:68:90:83 (random) sync terminated reason: 31
[00:04:11.530,364] <wrn> main: (251530)___Device not synced cancel sync tmo

[00:04:11.530,364] <wrn> main:
                               info->reason: 31

[00:04:11.530,395] <inf> main: Creating Sync for tags in list
(251730) PER_ADV_SYNC[22]: [DEVICE]: 33:37:05:68:90:83 (random) sync terminated reason: 31
[00:04:11.730,346] <wrn> main: (251730)___Device not synced cancel sync tmo

[00:04:11.730,346] <wrn> main:
                               info->reason: 31

[00:04:11.730,377] <inf> main: Creating Sync for tags in list
[00:04:11.730,407] <wrn> main: Per syncing already enabled!
(251742) PER_ADV_SYNC[17]: [DEVICE]: 29:3C:C2:87:D7:22 (random) sync terminated reason: 31
[00:04:11.742,706] <wrn> main: (251742)___Device not synced cancel sync tmo

[00:04:11.742,736] <wrn> main:
                               info->reason: 31

[00:04:11.742,767] <inf> main: Creating Sync for tags in list
[00:04:11.742,767] <wrn> main: Per syncing already enabled!
(251808) PER_ADV_SYNC[6]: [DEVICE]: 12:3E:93:75:55:72 (random) sync terminated reason: 31
[00:04:11.808,349] <wrn> main: (251808)___Device not synced cancel sync tmo

[00:04:11.808,380] <wrn> main:
                               info->reason: 31

[00:04:11.808,380] <inf> main: Creating Sync for tags in list
[00:04:11.808,410] <wrn> main: Per syncing already enabled!
(251830) PER_ADV_SYNC[7]: [DEVICE]: 33:37:05:68:90:83 (random) sync terminated reason: 31
[00:04:11.830,352] <wrn> main: (251830)___Device not synced cancel sync tmo

--- 2 messages dropped ---
[00:04:11.842,651] <inf> main: Creating Sync for tags in list
--- 4 messages dropped ---
[00:04:11.842,681] <wrn> main: Per syncing already enabled!
(251866) PER_ADV_SYNC[12]: [DEVICE]: 0E:63:CC:6E:05:39 (random) sync terminated reason: 31
--- 11 messages dropped ---
[00:04:11.890,380] <wrn> main:
                               info->reason: 31

--- 5 messages dropped ---
[00:04:11.901,702] <inf> main: Creating Sync for tags in list
[00:04:11.901,733] <wrn> main: Per syncing already enabled!
(251908) PER_ADV_SYNC[20]: [DEVICE]: 12:3E:93:75:55:72 (random) sync terminated reason: 31
[00:04:11.908,386] <wrn> main: (251908)___Device not synced cancel sync tmo

[00:04:11.908,416] <wrn> main:
                               info->reason: 31

[00:04:11.908,416] <inf> main: Creating Sync for tags in list
[00:04:11.908,447] <wrn> main: Per syncing already enabled!
--- 1 messages dropped ---
[00:04:11.930,358] <wrn> main: (251930)___Device not synced cancel sync tmo

[00:04:11.930,389] <wrn> main:
                               info->reason: 31

[00:04:11.930,419] <inf> main: Creating Sync for tags in list
[00:04:11.930,419] <wrn> main: Per syncing already enabled!
(251935) PER_ADV_SYNC[11]: [DEVICE]: 1C:33:3E:49:20:A6 (random) sync terminated reason: 31
--- 3 messages dropped ---
[00:04:11.935,302] <wrn> main: Per syncing already enabled!
(251943) PER_ADV_SYNC[2]: [DEVICE]: 3A:C3:4F:63:21:26 (random) sync terminated reason: 31
[00:04:11.943,298] <wrn> main: (251943)___Device not synced cancel sync tmo

[00:04:11.943,298] <wrn> main:
                               info->reason: 31

[00:04:11.943,328] <inf> main: Creating Sync for tags in list
[00:04:11.943,359] <wrn> main: Per syncing already enabled!
(251990) PER_ADV_SYNC[9]: [DEVICE]: 39:16:79:37:2C:8E (random) sync terminated reason: 31
--- 10 messages dropped ---
[00:04:12.042,572] <inf> main: Creating Sync for tags in list
--- 2 messages dropped ---
[00:04:12.042,572] <wrn> main: Per syncing already enabled!
(252090) PER_ADV_SYNC[10]: [DEVICE]: 39:16:79:37:2C:8E (random) sync terminated reason: 31
[00:04:12.090,576] <wrn> main: (252090)___Device not synced cancel sync tmo

[00:04:12.090,576] <wrn> main:
                               info->reason: 31

--- 2 messages dropped ---
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
(252102) PER_ADV_SYNC[5]: [DEVICE]: 1E:BF:31:F3:02:90 (random) sync terminated reason: 31
[00:04:12.102,203] <wrn> main: (252102)___Device not synced cancel sync tmo

[00:04:12.102,233] <wrn> main:
                               info->reason: 31

[00:04:12.102,233] <inf> main: Creating Sync for tags in list
[00:04:12.102,264] <wrn> main: Per syncing already enabled!
(252142) PER_ADV_SYNC[4]: [DEVICE]: 3F:E8:2C:69:54:A5 (random) sync terminated reason: 31
[00:04:12.142,944] <wrn> main: (252142)___Device not synced cancel sync tmo

[00:04:12.142,974] <wrn> main:
                               info->reason: 31

[00:04:12.143,005] <inf> main: Creating Sync for tags in list
[00:04:12.143,035] <wrn> main: Per syncing already enabled!
(252242) PER_ADV_SYNC[16]: [DEVICE]: 3F:E8:2C:69:54:A5 (random) sync terminated reason: 31
[00:04:12.242,980] <wrn> main: (252242)___Device not synced cancel sync tmo

[00:04:12.242,980] <wrn> main:
                               info->reason: 31

[00:04:12.243,011] <inf> main: Creating Sync for tags in list
[00:04:12.243,011] <wrn> main: Per syncing already enabled!
(252342) PER_ADV_SYNC[19]: [DEVICE]: 29:3C:C2:87:D7:22 (random) sync terminated reason: 31
[00:04:12.342,498] <wrn> main: (252342)___Device not synced cancel sync tmo

[00:04:12.342,529] <wrn> main:
                               info->reason: 31

[00:04:12.342,559] <inf> main: Creating Sync for tags in list
[00:04:12.342,559] <wrn> main: Per syncing already enabled!
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
>>>>>>>>>> Here I powered ON all tags again but nothing happened <<<<<<<<<
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
.....
>>>> Continues forever, I think scanning got disabled <<<<<
cvinayak commented 2 years ago

@jakkra ok, I will do more testing using your attached sample to try to get the same behavior as you see.

If you are using CONFIG_BT_CTLR_SCAN_UNRESERVED=y, you are using continuous scanning (same interval and window), right? (Otherwise, this option will make no difference).

jakkra commented 2 years ago

@jakkra ok, I will do more testing using your attached sample to try to get the same behavior as you see.

If you are using CONFIG_BT_CTLR_SCAN_UNRESERVED=y, you are using continuous scanning (same interval and window), right? (Otherwise, this option will make no difference).

Great, thank you! Yes in the attached sample I use:

        .interval   = 0x10, /* 16 * 0.625 ms = 10 ms */
        .window     = 0x10, /* window == interval: continuous scanning */
jakkra commented 2 years ago

Two tags only, 50ms interval:

CONFIG_BT_TICKER_LOW_LAT=n
CONFIG_BT_TICKER_PRIORITY_SET=n
CONFIG_BT_CTLR_SCAN_UNRESERVED=n
...
TAGS:19,19,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 2 Num err: 0/38
[00:00:18.860,321] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:19.063,568] <wrn> main: RESTARTED!
(19320) PER_ADV_SYNC[2]: [DEVICE]: 1A:AD:EA:17:91:83 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:19.320,892] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(19321) success. CTE receive enabled.
(19553) PER_ADV_SYNC[0]: [DEVICE]: 23:BC:90:96:61:E5 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:19.553,649] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(19554) success. CTE receive enabled.
Running...
TAGS:12,11,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 2 Num err: 0/23
[00:00:20.064,086] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:20.267,486] <wrn> main: RESTARTED!
Running...
TAGS:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 0 Num err: 0/0
Running...
TAGS:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 0 Num err: 0/0
Running...
TAGS:-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 0 Num err: 0/0
Running...
.... continues forever even if I restart tags.
cvinayak commented 2 years ago

Thank you. As always, I need to reproduce this on my side; I hope I get to identify the issue today.

cvinayak commented 2 years ago

I was able to reproduce the below (please confirm whether this is what you see too):

[2022-09-23 09:44:15.366] TAGS:3,4,3,3,1,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:15.375] Queued: 1 Synced: 5 Num err: 0/14
[2022-09-23 09:44:15.380] [00:18:25.953,094] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 09:44:15.386] Scan disable...Success.
[2022-09-23 09:44:15.388] Start scanning...success
[2022-09-23 09:44:15.391] [00:18:26.157,287] <wrn> main: RESTARTED!
[2022-09-23 09:44:16.569] Running...
[2022-09-23 09:44:16.571] TAGS:3,0,sync...,3,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:16.583] Queued: 2 Synced: 3 Num err: 0/6
[2022-09-23 09:44:16.584] [00:18:27.158,081] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 09:44:16.591] Scan disable...Success.
[2022-09-23 09:44:16.593] Start scanning...success
[2022-09-23 09:44:16.595] [00:18:27.363,525] <wrn> main: RESTARTED!
[2022-09-23 09:44:17.776] Running...
[2022-09-23 09:44:17.778] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:17.791] Queued: 6 Synced: 0 Num err: 0/0
[2022-09-23 09:44:17.794] Running...
[2022-09-23 09:44:17.796] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:17.849] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:19.780] Running...
[2022-09-23 09:44:19.782] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:19.792] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:19.812] Running...
[2022-09-23 09:44:19.813] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:19.814] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:21.779] Running...
[2022-09-23 09:44:21.791] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:21.795] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:21.797] Running...
[2022-09-23 09:44:21.799] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:21.823] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:23.780] Running...
[2022-09-23 09:44:23.781] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:23.794] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:23.798] Running...
[2022-09-23 09:44:23.799] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:23.817] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:25.782] Running...
[2022-09-23 09:44:25.784] TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 09:44:25.794] Queued: 7 Synced: 0 Num err: 0/0
[2022-09-23 09:44:25.798] Scan is running...
[2022-09-23 09:44:25.820] Running...

Then I used #50569 to check if there is any change, and I got an assertion from your sample:

[2022-09-23 10:04:38.016] [00:12:40.191,223] <wrn> main: RESTARTED!
[2022-09-23 10:04:39.178] Running...
[2022-09-23 10:04:39.178] TAGS:3,6,69,6,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 10:04:39.208] Queued: 1 Synced: 4 Num err: 0/84
[2022-09-23 10:04:39.208] [00:12:41.192,504] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 10:04:39.209] Scan disable...Success.
[2022-09-23 10:04:39.210] Start scanning...success
[2022-09-23 10:04:39.210] [00:12:41.396,331] <wrn> main: RESTARTED!
[2022-09-23 10:04:39.503] Running...
[2022-09-23 10:04:39.505] TAGS:0,sync...,sync...,sync...,sync...,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 10:04:39.518] Queued: 4 Synced: 2 Num err: 0/0
[2022-09-23 10:04:39.519] [00:12:42.397,003] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 10:04:39.544] Scan disable...Success.
[2022-09-23 10:04:39.544] [00:12:42.399,383] <wrn> bt_hci_core: opcode 0x2048 status 0x42
[2022-09-23 10:04:39.545] [00:12:42.399,414] <err> bt_scan: Failed to remove device from periodic advertiser list
[2022-09-23 10:04:39.545] [00:12:42.399,444] <err> main: Expected tag to be in controller adv_list. Is there a race?
[2022-09-23 10:04:39.554] [00:12:42.473,876] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.562] [00:12:42.473,907] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:39.611] [00:12:42.518,890] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.611] [00:12:42.518,920] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:39.759] [00:12:42.601,379] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.773] [00:12:42.601,409] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:39.773] Start scanning...success
[2022-09-23 10:04:39.775] [00:12:42.603,637] <wrn> main: RESTARTED!
[2022-09-23 10:04:39.781] [00:12:42.653,869] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.792] [00:12:42.653,900] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:39.844] [00:12:42.721,374] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.845] [00:12:42.721,405] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:39.845] [00:12:42.773,864] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:39.845] [00:12:42.773,895] <wr!
[2022-09-23 10:04:40.230] [00:12:42.908,966] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.239] [00:12:42.908,996] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.246] [00:12:42.983,917] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.257] [00:12:42.983,947] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.262] [00:12:43.051,452] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.272] [00:12:43.051,483] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.304] [00:12:43.111,480] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.304] [00:12:43.111,480] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.304] [00:12:43.178,955] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.306] [00:12:43.178,985] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.314] [00:12:43.246,459] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.350] [00:12:43.246,520] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.351] [00:12:43.313,964] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.352] [00:12:43.313,964] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.592] Scan is running...
[2022-09-23 10:04:40.593] [00:12:43.396,453] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.602] [00:12:43.396,484] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.607] [00:12:43.463,958] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.624] [00:12:43.463,989] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.625] [00:12:43.531,463] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.630] [00:12:43.531,463] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.638] Running...
[2022-09-23 10:04:40.639] TAGS:sync...,84,11,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
[2022-09-23 10:04:40.695] Queued: 3 Synced: 2 Num err: 0/112
[2022-09-23 10:04:40.695] [00:12:43.604,339] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 10:04:40.695] Scan disable...Success.
[2022-09-23 10:04:40.695] [00:12:43.606,323] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.696] [00:1243.696,319] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.701] [00:12:43.696,350] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.920] [00:12:43.741,302] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.923] [00:12:43.741,333] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.924] [00:12:43.786,315] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.925] [00:12:43.786,346] <wrn> main: Per syncing already ::43.838,806] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.928] [00:12:43.838,836] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.936] [00:12:43.898,803] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.950] [00:12:43.898,834] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:40.950] [00:12:43.951,293] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:40.959] [00:12:43.951,324] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.288] [00:12:44.011,291] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.296] [00:12:44.011,322] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.304] [00:12:44.071,289] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.311] [00:12:44.071,319] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.344] [00:12:44.116,271] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.344] [00:12:44.116,302] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.344] [00:12:44.191,284] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.344] [00:12:44.191,314] <wrain: Per syncing already enabled!
[2022-09-23 10:04:41.348] [00:12:44.243,774] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.357] [00:12:44.243,804] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.364] [00:12:44.303,771] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.372] [00:12:44.303,802] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.378] [00:12:44.356,262] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.443] [00:12:44.356,292] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.443] [00:12:44.401,275] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.443] [00:12:44.401,275] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.532] [00:12:44.461,273] <err> main: Got cte_recv_cb from none queued tag!
[2022-09-23 10:04:41.533] [00:12:44.461,303] <wrn> main: Per syncing already enabled!
[2022-09-23 10:04:41.533] ASSERTION FAIL [sync_devices[device_index].is_synced && sync_devices[device_index].is_used] @ WEST_TOPDIR/zephyr/samples/bluetooth/rm tag which is not synced!
[2022-09-23 10:04:41.536] [00:12:44.521,331] <err> os: r0/a1:  0x00000004  r1/a2:  0x0000017d  r2/a3:  0x00000003
[2022-09-23 10:04:41.545] [00:12:44.521,331] <err> os: r3/a4:  0x0000017d r12/ip:  0x00000006 r14/lr:  0x0000198f
[2022-09-23 10:04:41.601] [00:12:44.521,362] <err> os:  xpsr:  0x41000000
[2022-09-23 10:04:41.602] [00:12:44.521,392] <err> os: Faulting instruction address (r15/pc): 0x0001c2c0
[2022-09-23 10:04:41.603] [00:12:44.521,423] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[2022-09-23 10:04:41.604] [00:12:44.521,453] <err> o

@jakkra Please use #50454 and #50569, and let me know your observations.

jakkra commented 2 years ago

Hi, the first log looks like the issue: no tags get synced and nothing happens.

I have seen the second log a few times as well, but it could be a bug in the sample: it does not handle the case where it gets an error from the periodic adv list, as you can see in the log. So for now you can ignore that case in your last log with the assert in the application. I did not investigate why I get the error from the per adv list; it could be a bug in the controller where some sync is in a state where it is not deleted, hence CTE keeps being reported from that device.

cvinayak commented 2 years ago
[2022-09-23 10:04:39.519] [00:12:42.397,003] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 10:04:39.544] Scan disable...Success.
[2022-09-23 10:04:39.544] [00:12:42.399,383] <wrn> bt_hci_core: opcode 0x2048 status 0x42
[2022-09-23 10:04:39.545] [00:12:42.399,414] <err> bt_scan: Failed to remove device from periodic advertiser list
[2022-09-23 10:04:39.545] [00:12:42.399,444] <err> main: Expected tag to be in controller adv_list. Is there a race?
[2022-09-23 10:04:39.554] [00:12:42.473,876] <err> main: Got cte_recv_cb from none queued tag!

This could be a race in the application between the two lines below in check_restart_scanning_syncing():

ret = bt_le_per_adv_list_clear();
cancel_all_synced();

Maybe cancel/delete all syncs before clearing the list, so that there is no race between clearing the list and sync_cb being called.
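
A sketch of that suggested ordering, paraphrasing the shape of the sample's check_restart_scanning_syncing() (cancel_all_synced() is the sample's own helper, assumed here only to make the sketch self-contained):

#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/sys/printk.h>

/* The sample's helper that cancels/deletes all active syncs. */
extern void cancel_all_synced(void);

static void check_restart_scanning_syncing(void)
{
	int err;

	/* Cancel/delete all syncs first, so no sync_cb can race with the
	 * periodic advertiser list being cleared.
	 */
	cancel_all_synced();

	err = bt_le_per_adv_list_clear();
	if (err) {
		printk("Failed to clear per adv list (err %d)\n", err);
	}
}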

jakkra commented 2 years ago
[2022-09-23 10:04:39.519] [00:12:42.397,003] <wrn> main: RESTARTING SCAN AND SYNC...
[2022-09-23 10:04:39.544] Scan disable...Success.
[2022-09-23 10:04:39.544] [00:12:42.399,383] <wrn> bt_hci_core: opcode 0x2048 status 0x42
[2022-09-23 10:04:39.545] [00:12:42.399,414] <err> bt_scan: Failed to remove device from periodic advertiser list
[2022-09-23 10:04:39.545] [00:12:42.399,444] <err> main: Expected tag to be in controller adv_list. Is there a race?
[2022-09-23 10:04:39.554] [00:12:42.473,876] <err> main: Got cte_recv_cb from none queued tag!

This could be a race in the application between the two lines below in check_restart_scanning_syncing():

ret = bt_le_per_adv_list_clear();
cancel_all_synced();

Maybe cancel/delete all syncs before clearing the list, so that there is no race between clearing the list and sync_cb being called.

Yeah, I think this is just in the sample; I did not have this issue in the "real" application, where more precautions are in place.

@jakkra Please use https://github.com/zephyrproject-rtos/zephyr/pull/50454 and https://github.com/zephyrproject-rtos/zephyr/pull/50569, and let me know your observations.

It doesn't seem to make a difference; 4 tags, 50 ms periodic adv. interval.

TAGS:4,10,8,5,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 4 Num err: 0/27
[00:00:25.086,975] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:25.290,252] <wrn> main: RESTARTED!
(25498) PER_ADV_SYNC[4]: [DEVICE]: 2E:E3:9A:94:7F:5C (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:25.499,237] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(25499) success. CTE receive enabled.
(25682) PER_ADV_SYNC[0]: [DEVICE]: 10:D3:53:04:E7:43 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:25.682,464] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(25683) success. CTE receive enabled.
(25774) PER_ADV_SYNC[1]: [DEVICE]: 24:24:1B:74:34:FC (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:25.774,536] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(25775) success. CTE receive enabled.
(25995) PER_ADV_SYNC[2]: [DEVICE]: 2F:E9:A5:F0:79:E3 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:25.996,276] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(25996) success. CTE receive enabled.
Running...
TAGS:14,12,10,5,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 4 Num err: 0/41
[00:00:26.291,473] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:26.495,147] <wrn> main: RESTARTED!
Scan is running...
(26845) PER_ADV_SYNC[3]: [DEVICE]: 2F:E9:A5:F0:79:E3 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:26.846,282] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(26846) success. CTE receive enabled.
(26982) PER_ADV_SYNC[0]: [DEVICE]: 10:D3:53:04:E7:43 (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:26.982,513] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(26983) success. CTE receive enabled.
(27398) PER_ADV_SYNC[1]: [DEVICE]: 2E:E3:9A:94:7F:5C (random) synced, Interval 0x0028 (50 ms), PHY LE 2M
[00:00:27.399,322] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(27399) success. CTE receive enabled.
Running...
TAGS:10,10,1,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 1 Synced: 3 Num err: 0/23
[00:00:27.496,032] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:27.700,103] <wrn> main: RESTARTED!
Running...
TAGS:sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 0 Num err: 0/0
Running...

Continues....
cvinayak commented 2 years ago

@jakkra This issue was closed because the fix for the assertion was merged.

Scan disable...Success.
Start scanning...success
[00:00:27.700,103] <wrn> main: RESTARTED!
Running...
TAGS:sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 0 Num err: 0/0
Running...

I am trying to understand this: does Queued: 3 not indicate that 3 periodic advertisers were discovered fresh after Start scanning...success? This would indicate that the scanner is working, right?

I reset one of the periodic advertisers, and the Queued: value increased, hence there is no issue with the scanning in the controller.

[image attachment]

Please check whether the sample might somehow not be trying to create a new sync.

cvinayak commented 2 years ago

Also, when I turned off my advertisers, I saw the following happen:

[image attachment]

cvinayak commented 2 years ago

@jakkra if you still think the scanning is not always started, you may re-open the issue.

jakkra commented 2 years ago

Correct, 3 were discovered and added to the periodic_adv_list successfully (queued), but the 4th was never found, and the controller did not manage to sync to any of the 3 added to the per_adv_list. Also note in the log that after

Queued: 3 Synced: 0 Num err: 0/0
Running...

Note there is no print of Scan is running..., which means the scanner is not reporting anything. Also, I reset my tags and started new ones, but they were not found and Queued did not increase.

There are two scenarios that happen:

Can you please check if you use:

CONFIG_BT_TICKER_LOW_LAT=n
CONFIG_BT_TICKER_PRIORITY_SET=n
CONFIG_BT_CTLR_SCAN_UNRESERVED=n

For your last log, isn't it a problem that the controller does not sync to any of the tags added to the queue?

cvinayak commented 2 years ago

For your last log, isn't it a problem that the controller does not sync to any of the tags added to the queue?

I believe it is the sample application that has not registered the synchronizations; the messages below give me that impression: [image attachment]

The info->reason: 31 is printed from term_cb; the sample prior to this has Synced: 0 but immediately thereafter has Synced: 2 at the logged timestamp. I believe that the term_cb call has initiated restart_per_adv_list_syncing().
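
For context, a sketch of how such a term_cb is typically wired up with the Zephyr host API; the callback and registration shape here are assumed, mirroring the log messages above rather than the sample's exact source:

#include <zephyr/bluetooth/bluetooth.h>
#include <zephyr/sys/printk.h>

static void term_cb(struct bt_le_per_adv_sync *sync,
		    const struct bt_le_per_adv_sync_term_info *info)
{
	/* reason 31 is what the logs above show when tags are powered off
	 * or a pending sync is torn down.
	 */
	printk("sync terminated reason: %u\n", info->reason);
}

static struct bt_le_per_adv_sync_cb sync_callbacks = {
	.term = term_cb,
};

/* Registered once at startup: bt_le_per_adv_sync_cb_register(&sync_callbacks); */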

Can you please check if you use:

CONFIG_BT_TICKER_LOW_LAT=n
CONFIG_BT_TICKER_PRIORITY_SET=n
CONFIG_BT_CTLR_SCAN_UNRESERVED=n

I am using the sample from the zip file provided in this issue. These 3 are not set in my zephyr/.config file.

cvinayak commented 2 years ago

I am continuing to test. I do get into the scenario where no new periodic advertising is printed by the sample, but I was able to terminate the syncs by powering off the 7 synced devices (Synced: 0 was logged). There sure seems to be something out-of-sync in the sample too. I have now added a printk in scan_recv to print any periodic advertising received even when is_syncing may be true. I do get the Scan is running... while only Queued: 0 Synced: 0 is being printed.

cvinayak commented 2 years ago

Reopening, as only the assertion has been fixed. The scanner seems to call scan_recv, but there is no periodic advertising reported.

See log: 20220923_10.txt

jakkra commented 2 years ago

@cvinayak good that you were able to reproduce that case. As you say, I agree there is some issue in the sample, and I'll see if I can fix it.

About is_syncing: that should always be false when USE_PER_ADV_LIST is defined, as that variable is only used when syncing to one tag at a time.

cvinayak commented 2 years ago

@jakkra Could you please try this additional fix #50617

jakkra commented 2 years ago

@cvinayak So far I have not gotten it stuck using the sample. However, I still have the same problem using the real application: I restart scanning, it finds zero or only a few tags, and then scanning stops.

Resetting the tags, adding new tags, etc. does nothing; I'm not getting any normal scan or ext. scan responses. If in this state I try again with bt_le_scan_start, I get -120 back (EALREADY). Doing bt_le_scan_stop and then bt_le_scan_start recovers it, as sketched below.
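
A hypothetical helper sketching that recovery, assuming only the behaviour described above:

#include <errno.h>
#include <zephyr/bluetooth/bluetooth.h>

static int recover_scan(const struct bt_le_scan_param *param)
{
	int err = bt_le_scan_start(param, NULL);

	if (err == -EALREADY) {
		/* The host believes scanning is still enabled; cycle it. */
		err = bt_le_scan_stop();
		if (!err) {
			err = bt_le_scan_start(param, NULL);
		}
	}

	return err;
}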

I'm trying to figure out what's different to be able to reproduce with a sample.

00> [00:09:44.521,636] <wrn> u_df_scan: Scan running...
00> 16  15  10  2  1  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 5, numSynced: 5 numInAdvList: 0
00> 
00> ------------------------------------------------------------------------------------
00> [00:09:44.597,717] <inf> u_df_scan: Stop scanning...
00> [00:09:44.599,517] <inf> u_df_scan: All tags terminated
00> [00:09:45.106,933] <wrn> u_df_scan: Start scanning...
00> [00:09:45.302,429] <wrn> u_df_scan: Scan running...
00> [00:09:45.325,500] <dbg> u_df_scan: scan_recv: 791: Post EVENT_PROCESS_TAG_LIST
00> [00:09:45.325,866] <inf> u_df_scan: placeTagInPerAdvList: 1B:3B:5C:87:5D:2A (random)
00> [00:09:45.326,019] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 1
00> [00:09:45.326,049] <inf> u_df_scan: Creating Sync for tags in list bt_le_per_adv_sync_create
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 1, numSynced: 0 numInAdvList: 1
00> 
00> ------------------------------------------------------------------------------------
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 1, numSynced: 0 numInAdvList: 1
00> 
00> ------------------------------------------------------------------------------------
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 1, numSynced: 0 numInAdvList: 1
... forever

Another log with no tags at all found:

[00:04:14.586,273] <dbg> u_df_scan: sync_cb: 904: Post EVENT_PROCESS_TAG_LIST
00> [00:04:14.586,730] <inf> u_df_scan: Creating Sync for tags in list bt_le_per_adv_sync_create
00> [00:04:14.620,391] <wrn> u_df_scan: Scan running...
00> [00:04:14.920,654] <wrn> u_df_scan: Scan running...
00> [00:04:15.229,644] <wrn> u_df_scan: Scan running...
00> 16  14  15  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 5, numSynced: 3 numInAdvList: 2
00> 
00> ------------------------------------------------------------------------------------
00> [00:04:15.333,709] <inf> u_df_scan: Stop scanning...
00> [00:04:15.336,120] <inf> u_df_scan: All tags terminated
00> [00:04:15.844,604] <wrn> u_df_scan: Start scanning...
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 0, numSynced: 0 numInAdvList: 0
00> 
00> ------------------------------------------------------------------------------------
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 0, numSynced: 0 numInAdvList: 0
00> 
00> ------------------------------------------------------------------------------------
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 0, numSynced: 0 numInAdvList: 0
00> 
00> ------------------------------------------------------------------------------------
00> -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
00> Queue size: 0, numSynced: 0 numInAdvList: 0
... forever
cvinayak commented 2 years ago

So far I have not gotten it stuck using the sample.

Please confirm that you are now using #50617 in the sample, and that without it you can reproduce the issue with the sample, right?

Resetting the tags, adding new tags etc. does nothing. I'm not getting any normal scan or ext. scan responses. If I in this state try again with bt_le_scan_start I get -120 back (EALREADY). Doing bt_le_scan_stop and then bt_le_scan_start it recovers.

This could be a Host stack issue; if a Controller command were failing, there would be a BT_ERR log with an HCI error code.

Probable summary for now (correct me):

If you think I can assist you in any way, do feel free to ask.

jakkra commented 2 years ago

Please confirm that you are now using https://github.com/zephyrproject-rtos/zephyr/pull/50617 in the sample, and that without it you can reproduce the issue with the sample, right?

I'm going to run some tests over night and see. Will come back.

However, even with that patch I still quite often end up in the state where all tags are in the periodic adv. list but no sync is created; resetting any of the tags results in:

(197999) PER_ADV_SYNC[10]: [DEVICE]: 13:7E:26:0A:26:3B (random) sync terminated reason: 31
[00:03:17.999,938] <wrn> main: (197999)___Device not synced cancel sync tmo
[00:03:17.999,969] <wrn> main:  info->reason: 31

I'm trying to understand if this is expected or not (I think not). It could be that the sample ends up in a scenario where it misses enabling syncing with bt_le_per_adv_sync_create with .options = BT_LE_PER_ADV_SYNC_OPT_USE_PER_ADV_LIST, hence the tags are in the periodic adv. list but no sync is created. However, when a tag is reset, the controller notices this and the sync_term with reason: 31 is sent (there was no sync_cb for this tag or any other before). Wouldn't this indicate that the controller actually is in some way synced, or in the syncing state, to those tags? Otherwise it shouldn't be able to detect that I reset any of the tags.
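
For context, a minimal sketch of the sync creation being discussed, using the periodic advertiser list option; the skip/timeout values here are illustrative, not taken from the attached application:

#include <zephyr/bluetooth/bluetooth.h>

static struct bt_le_per_adv_sync *sync_for_list;

static int create_sync_from_list(void)
{
	struct bt_le_per_adv_sync_param param = {
		/* Sync to any device on the controller's periodic advertiser
		 * list, rather than to a single address/SID pair.
		 */
		.options = BT_LE_PER_ADV_SYNC_OPT_USE_PER_ADV_LIST,
		.skip    = 0,
		.timeout = 400, /* 4 s, in units of 10 ms; illustrative */
	};

	return bt_le_per_adv_sync_create(&param, &sync_for_list);
}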

[00:04:14.372,833] <wrn> main: Max synced tags reached!
Enable receiving of CTE...
(254373) success. CTE receive enabled.
Running...
TAGS:sync...,3,sync...,sync...,sync...,12,sync...,1,sync...,1,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 7 Synced: 4 Num err: 0/17
[00:04:14.461,975] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:04:14.664,703] <wrn> main: RESTARTED!
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 8 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 11 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 12 Synced: 0 Num err: 0/0
Scan is running...
Running...
TAGS:sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 13 Synced: 0 Num err: 0/0
Scan is running...
Running...
... continues until tags are reset:
(xxx) PER_ADV_SYNC[10]: [DEVICE]: MAC (random) sync terminated reason: 31 <-- Sent for each of the tags we are trying to sync to.

And when I power off all tags I get the sync_term reason 31 for all of them.

This could be a Host stack issue; if a Controller command were failing, there would be a BT_ERR log with an HCI error code.

Note this is not happening in the sample, only in the "real" application. I removed this check, which is what returns -EALREADY in the host, to see what the controller responds with; surprisingly it did not return any error, and everything started working again, see the log:

[00:01:28.196,502] <inf> u_df_scan: Stop scanning... (bt_le_scan_stop, bt_le_per_adv_list_clear, bt_le_per_adv_sync_delete)
[00:01:28.199,310] <inf> u_df_scan: All tags terminated
[00:01:28.314,300] <wrn> u_df_scan: Start scanning... ( calling bt_le_scan_start)
[00:01:28.314,971] <wrn> u_df_scan: Creating Sync for tags in list
[00:01:28.360,900] <wrn> u_df_scan: Scan running...
[00:01:28.534,820] <wrn> u_df_scan: Scan running...
[00:01:28.540,527] <inf> u_df_scan: placeTagInPerAdvList: 18:1A:0E:6E:1B:91 (random)
[00:01:28.540,740] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 1
[00:01:28.570,800] <inf> u_df_scan: placeTagInPerAdvList: 3D:A4:C3:7E:AB:7A (random)
[00:01:28.571,014] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 2
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
31704 free bytes, 8124 allocated bytes
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
31704 free bytes, 8124 allocated bytes
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2
...
...
...
Continues, now we are not getting any scan rsp, legacy or ext. 2 devices in per. adv. list.
...
...
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
[00:02:02.134,521] <wrn> u_df_scan: Start scanning... <------ manual call to bt_le_scan_start resolves it. This should not be needed.
-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  
Queue size: 2, numSynced: 0 numInAdvList: 2

------------------------------------------------------------------------------------
[00:02:02.208,190] <inf> u_df_scan: placeTagInPerAdvList: 22:91:47:27:C7:B8 (random)
[00:02:02.208,374] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 1
[00:02:02.262,817] <inf> u_df_scan: placeTagInPerAdvList: 3D:EB:68:69:95:96 (random)
[00:02:02.263,000] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 2
[00:02:02.283,538] <inf> u_df_scan: placeTagInPerAdvList: 0E:67:CE:05:03:FD (random)
[00:02:02.283,752] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 3
[00:02:02.301,086] <wrn> u_df_scan: Scan running...
[00:02:02.351,074] <inf> u_df_scan: placeTagInPerAdvList: 18:1A:0E:6E:1B:91 (random)
[00:02:02.351,257] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 4
[00:02:02.389,892] <inf> u_df_scan: placeTagInPerAdvList: 32:04:C1:CB:7A:EB (random)
[00:02:02.390,075] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 5
[00:02:02.501,342] <inf> u_df_scan: placeTagInPerAdvList: 0B:95:9C:AA:8D:B4 (random)
[00:02:02.501,556] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 6
[00:02:02.514,404] <inf> u_df_scan: placeTagInPerAdvList: 2D:7E:60:0F:30:AA (random)
[00:02:02.514,587] <dbg> u_df_scan: tryFillPeriodicAdvList: Added 1 queued tags to Periodic Adv List => total 7
[00:02:02.554,351] <wrn> u_df_scan: Scan running...
[00:02:02.812,988] <inf> u_df_scan: Got sync from: 18:1A:0E:6E:1B:91 (random) Interval 0x0028 (50 ms), PHY 2M SID: 0, Index: 1 => total: 
... All working again

This last log indicates a problem in the controller, doesn't it? bt_le_scan_start is called, but for some reason scanning is stopped by the controller and not started again, even though it's a continuous scan.
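
For reference, the host-side check removed above is, roughly, the explicit-scan guard at the top of bt_le_scan_start(); a paraphrase (not the exact source) of the shape of that check in subsys/bluetooth/host/scan.c:

/* The host refuses to re-issue the scan-enable command while it believes
 * scanning is already active; this is where the -EALREADY comes from.
 */
if (atomic_test_bit(bt_dev.flags, BT_DEV_EXPLICIT_SCAN)) {
	return -EALREADY;
}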

cvinayak commented 2 years ago

when a tag is reset, the controller notices this and the sync_term with reason: 31 is sent (there was no sync_cb for this tag or any other before). Wouldn't this indicate that the controller actually is in some way synced, or in the syncing state, to those tags? Otherwise it shouldn't be able to detect that I reset any of the tags.

If the controller did not generate the LE Periodic Advertising Sync Established event, then the LE Periodic Advertising Sync Lost event from the controller should produce an error message log from the host: https://github.com/zephyrproject-rtos/zephyr/blob/605d8c85d88d0341df649e029801bfcd2ff54350/subsys/bluetooth/host/scan.c#L988-L992

There would not be any sync_term callback from the host.

bt_le_scan_start is called, but for some reason scanning is stopped by the controller and not started again, even though it's a continuous scan.

To rule out any Host-initiated scan stop, I would enable CONFIG_BT_DEBUG_LOG=y and CONFIG_BT_DEBUG_HCI_CORE=y (these are very verbose; I hope they do not mask any race conditions).

I get the feeling we should start a new GitHub issue with a new sample provided by you, so that I can use that as a fresh start to try to reproduce the scan stall issue exactly as you see it.

I'm going to run some tests over night and see. Will come back.

Will wait for your inputs before proceeding with the review/merge of #50617. As I see a difference in the original sample with this PR, wherein extended scan reports resume after periodic adv devices are reset, I would like this PR to be confirmed by you and merged for the Zephyr v3.2.0 release. Awaiting your response on this PR.

jakkra commented 2 years ago

If the controller did not generate the LE Periodic Advertising Sync Established event then the LE Periodic Advertising Sync Lost event from the controller should give an error message log from the host:

No error is printed when scanning is stopped.

To rule out any Host-initiated scan stop, I would enable CONFIG_BT_DEBUG_LOG=y and CONFIG_BT_DEBUG_HCI_CORE=y (these are very verbose; I hope they do not mask any race conditions).

I'm getting a lot of dropped messages even with a 1 Mbit baud rate. I think many events are legacy scan responses, as there are many BT devices in the area. Any idea what more I can do?

--- 5 messages dropped ---
[00:02:39.430,969] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
[00:02:39.430,999] <dbg> bt_hci_core: hci_event: event 0x3e
[00:02:39.431,030] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
[00:02:39.431,518] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
[00:02:39.431,518] <dbg> bt_hci_core: hci_event: event 0x3e
[00:02:39.431,549] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
[00:02:39.432,037] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
[00:02:39.432,067] <dbg> bt_hci_core: hci_event: event 0x3e
[00:02:39.432,098] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
--- 2 messages dropped ---
[00:02:39.433,197] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
--- 3 messages dropped ---
[00:02:39.442,932] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
--- 8 messages dropped ---
[00:02:39.458,251] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
--- 6 messages dropped ---
[00:02:39.458,282] <dbg> bt_hci_core: hci_event: event 0x3e
--- 8 messages dropped ---
[00:02:39.463,500] <dbg> bt_hci_core: hci_event: event 0x3e
--- 3 messages dropped ---
[00:02:39.463,531] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
[00:02:39.464,019] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
[00:02:39.464,050] <dbg> bt_hci_core: hci_event: event 0x3e
[00:02:39.464,050] <dbg> bt_hci_core: hci_le_meta_event: subevent 0x0f
[00:02:39.464,538] <dbg> bt_hci_core: bt_recv: buf 0x20015370 len 10
[00:02:39.464,569] <dbg> bt_hci_core: hci_event: event 0x3e

Will wait for your inputs before proceeding with the review/merge of https://github.com/zephyrproject-rtos/zephyr/pull/50617. As I see a difference in the original sample with this PR, wherein extended scan reports resume after periodic adv. devices are reset, I would like this PR to be confirmed by you and merged for the Zephyr v3.2.0 release. Awaiting your response on this PR.

I can confirm that https://github.com/zephyrproject-rtos/zephyr/pull/50617 does NOT solve any of the issues; I got the sample into the state multiple times where resetting the tags does nothing.

I get the feeling we should start a new GitHub issue with a new sample provided by you, so that I can use it as a fresh start to try to reproduce the scan stall issue exactly as you see it.

I can do that; however, the problems are still the ones I reported originally in this issue (except the fixed crash). Let me summarize:

  1. Scanning is started, but after a short while it stops. The host thinks scanning is enabled, but if I comment out that check in the host and send a scan_enable to the controller, scanning starts again and everything works (see the sketch after this list).
  2. All tags are added to the periodic adv. list, but the controller does not sync to any of them (you have reproduced this); if a tag is reset, then sync terminated with reason: 31 is sent from the controller.

Both of the above I can reproduce with the provided sample.
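For 1., the restart sequence is essentially the below (a minimal sketch, not the sample's exact code; the scan parameters are illustrative):

#include <zephyr/bluetooth/bluetooth.h>

/* Minimal sketch of the stop/start sequence under discussion; the
 * scan parameters are illustrative, not the sample's exact values. */
static int restart_scan(void)
{
	struct bt_le_scan_param param = {
		.type = BT_LE_SCAN_TYPE_PASSIVE,
		.options = BT_LE_SCAN_OPT_NONE,
		.interval = BT_GAP_SCAN_FAST_INTERVAL,
		.window = BT_GAP_SCAN_FAST_WINDOW,
	};
	int err;

	err = bt_le_scan_stop();
	if (err) {
		return err; /* stop can fail; bail out so the caller can retry */
	}

	/* Can return 0 even though the controller stops reporting
	 * shortly after (the stall described in 1. above).
	 */
	return bt_le_scan_start(&param, NULL);
}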

Extra: For 1. above, I tried with the config below, as suggested at https://docs.zephyrproject.org/3.0.0/reference/kconfig/CONFIG_BT_DEBUG_MONITOR_UART.html#std-kconfig-CONFIG_BT_DEBUG_MONITOR_UART:

CONFIG_BT_DEBUG_MONITOR_UART=y
CONFIG_UART_CONSOLE=n

I could not decode it using btmon (not sure why), but maybe this log can show you something: putty_scan_stopped.log
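For reference, this is roughly the btmon invocation I tried (the device path and speed are just from my setup):

btmon --tty /dev/ttyACM0 --tty-speed 1000000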

cvinayak commented 2 years ago

@jakkra I do not see any reference to bt_le_per_adv_sync_create_cancel() in your sample; I suspect a race in the sample that would make the host disable scanning when a new sync is created right after the sample has called scan_enable(). Please use ~bt_le_per_adv_sync_create_cancel()~ bt_le_per_adv_sync_delete(syncForList) before scan_disable(). You have to use the syncForList variable to cancel the syncing in progress.
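That is, roughly the following (a sketch only; syncForList and scan_disable() are the sample's own names, and I assume syncForList points at the pending sync object):

/* Delete the pending sync before disabling scanning. */
if (syncForList != NULL) {
	int err = bt_le_per_adv_sync_delete(syncForList);

	if (err) {
		printk("Failed to delete pending sync (err %d)\n", err);
	}
	syncForList = NULL;
}
scan_disable();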

Do let me know if you manage to see any change.

jakkra commented 2 years ago

@cvinayak good catch, I missed that in the sample, but unfortunately no difference. In the "real" application I had the call to bt_le_per_adv_sync_delete, but it was after and not before bt_le_scan_stop; I moved it, but it's the same there too.

TAGS:11,2,1,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 3 Num err: 0/14
[00:00:53.555,755] <wrn> main: RESTARTING SCAN AND SYNC...
Scan disable...Success.
Start scanning...success
[00:00:53.758,514] <inf> main: Creating Sync for tags in list
[00:00:53.758,728] <wrn> main: Per syncing enabled!
[00:00:53.758,728] <wrn> main: RESTARTED!
[00:00:53.982,788] <dbg> main: scan_recv: Added tag from per adv list
[00:00:54.009,338] <dbg> main: scan_recv: Added tag from per adv list
Running...
TAGS:sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 0 Num err: 0/0
Running...
TAGS:sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 0 Num err: 0/0

(screenshot attachment: Capture-per-scan-stop)

cvinayak commented 2 years ago

@jakkra You still have [00:00:53.758,728] <wrn> main: Per syncing enabled!

I have your updated sample, plus some commits I am using to debug, here: https://github.com/zephyrproject-rtos/zephyr/pull/50696. Please use this and let me know.

I have had the sample running for close to an hour now (I only have 7 advertisers, though).

jakkra commented 2 years ago

You still have [00:00:53.758,728] main: Per syncing enabled!

That is printed as part of the restart_per_adv_list_syncing after scan_enable, so that's expected.

@cvinayak I'll try with yours.

jakkra commented 2 years ago

@cvinayak I ran the sample and code in https://github.com/zephyrproject-rtos/zephyr/pull/50696 with 7 tags, and I stopped getting scan rsp. after 1 min.

So the only difference now is our tags (and the environment); let me compile the exact same tag SW I use for you. Are you using an nRF52833 DK?

jakkra commented 2 years ago

@cvinayak Use this SW on your tags. I made two versions: one with a single chain, and one that sends 16 per. adv. chain packets if you need to stress more. The one with only one chain is enough to reproduce. build_tag_nrf52833dk.zip

Using the sample you have, I can even reproduce easily with only one tag, with #define NUM_SYNC_TO_RESTART_SCAN 1.

Also, in your sample I saw this assert many times, for example:

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:475
        lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 762
and

ASSERTION FAIL [0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c:475
        lll_preempt_calc: Actual EVENT_OVERHEAD_START_US = 1251

cvinayak commented 2 years ago

You may remove the other commits that add the assertion code; the assertions indicate that the scheduling is O(n) and that not all radio events are getting scheduled due to overlapping periodic syncs.

(I will take your tags once I have done more investigation using the current advertisers I have.)

I was able to reproduce, log attached. 20220927_1.log

I have to look at bt_le_scan_update() used in the host to see if there is any race that could keep scanning disabled.

jakkra commented 2 years ago

I was able to reproduce, log attached.

Great, hopefully with my tag SW you will be able to reproduce faster.

It often does not take long for me to reproduce; let me know if you need me to test anything. Here I added a scan start every 10 s, and it recovers: scan-try-enable-periodically.log
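The workaround looks roughly like this (a sketch with my own names, not code from the sample; note that the host returns -EALREADY if it still believes scanning is on, which ties in with the host-side check mentioned above):

#include <errno.h>
#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>
#include <zephyr/bluetooth/bluetooth.h>

/* Workaround sketch: periodically re-issue scan start so a stalled
 * scan recovers. All names here are illustrative. */
static void scan_retry_handler(struct k_work *work)
{
	struct k_work_delayable *dwork = k_work_delayable_from_work(work);
	int err = bt_le_scan_start(BT_LE_SCAN_PASSIVE, NULL);

	if (err && err != -EALREADY) {
		printk("Periodic scan start failed (err %d)\n", err);
	}
	k_work_reschedule(dwork, K_SECONDS(10));
}

static K_WORK_DELAYABLE_DEFINE(scan_retry_work, scan_retry_handler);

/* Kicked off once at startup: k_work_schedule(&scan_retry_work, K_SECONDS(10)); */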

cvinayak commented 2 years ago

@jakkra I have updated the sample in #50696 (commit hash 6ff6a4c39a2d6b1ea6d1c69b37217b69fa223e5b), removed the commit that adds the assertion checks, and removed the possible race in calls to bt_le_scan_update() in the Host.

Please try the sample now and let me know if it helps.

jakkra commented 2 years ago

@cvinayak No difference unfortunately.

cvinayak commented 2 years ago

@cvinayak No difference unfortunately.

Are you using my sample in #50696 unmodified?

jakkra commented 2 years ago

@cvinayak Yes (screenshot attachment: Capture-build-log1)

Correct Zephyr version used. Toolchain from v2.1.0.

-- west build: generating a build system
Loading Zephyr default modules (Zephyr base).
-- Application: /home/jkra/ncs/zephyr-main/zephyr/samples/bluetooth/aoa_receiver_multiple
-- Found Python3: /home/jkra/ncs/toolchains/v2.1.0-rc2/usr/local/bin/python3.8 (found suitable exact version "3.8.2") found components: Interpreter 
-- Cache files will be written to: /home/jkra/.cache/zephyr
-- Zephyr version: 3.2.0-rc3 (/home/jkra/ncs/zephyr-main/zephyr) <-------------
-- Found west (found suitable version "0.14.0", minimum required is "0.7.1")
-- Board: ubx_evkninab4_nrf52833
....

cvinayak commented 2 years ago

@jakkra OK, then it must be a controller bug... I just have to wait longer to reproduce it. If I am able to identify the cause and have a fix tonight, I will let you know the PR.

cvinayak commented 2 years ago

@jakkra I have been doing some investigation on race conditions; my sample with debug aids added is in commit 8403f5013a3dbdf56a1dc39854cccbddb9d3be72 of #50696. It takes over an hour of running for me to discover the scan issue; could you please try the sample at your end and provide me with the console log (preferably from boot until the stall, plus another 10 mins of logging to check whether Scan is running is printed a few times)?

jakkra commented 2 years ago

@cvinayak On it!

jakkra commented 2 years ago

@cvinayak Here is a log where it happened 3 times in a row. For the first two I did not wait 10 min after scanning stopped, but after the last reset I waited about 5 min. I also changed to print every 100 scan_recv instead of every 500, in case more prints give more info. gh-50432-22-09-28-09.txt

cvinayak commented 2 years ago

@jakkra I have been running for over an hour now... maybe my advertisers, the default samples/bluetooth/direction_finding_connectionless_tx, are not exercising the problem on the receiver... I will switch to using your advertiser on my 7 boards. Will keep you posted.

20220928_2.log

cvinayak commented 2 years ago

@jakkra Still hard for me to reproduce using your 1-chain advertiser hex... my receiver hex is zephyr-aoa-receiver-multiple-202209281019.zip

Log of non-reproduction with 7 advertisers using your hex: 20220928_3.log

I guess I just have to wait until reproduction at my end :-(

Could you use my hex and see if there is any difference in how we build, maybe?

jakkra commented 2 years ago

@cvinayak I can reproduce with no problem using your .hex. You could try the 16-chain .hex instead, as that may help to reproduce faster. I think the difference for us is the environment; I experimented a bit, and with the antenna removed on my receiver (it then receives fewer scan_recv from legacy advertisers around), it seems to take longer until the problem occurs.

cvinayak commented 2 years ago

@jakkra The 16-chain hex also does not easily reproduce the issue with the sample I have in PR #50696. The log is ~30 mins, and the receiver is still running.

This is now getting hard for me to debug. 20220928_5.log

jakkra commented 2 years ago

@cvinayak I see. I'll grab a radio-shielded box, play around with the transmitters and receiver, and see if I can find a way to reproduce faster.

jakkra commented 2 years ago

@cvinayak Good news, I can now reproduce inside a shielded box with 7 tags. I made some changes. In the attached .zip you have the patch in aoa_receiver_multiple, the .hex for your nrf52833-dk tags, and a log.

What I changed:

  • Tags advertise an extra legacy adv. at 7.5ms.
  • Tags advertise ext. adv. at 7.5ms.

gh-50432-22-09-28-15-08_no_sync.zip

With this setup I'm able to get scanning to stop often within 1 minute.

cvinayak commented 2 years ago
  • Tags advertise an extra legacy adv. at 7.5ms.
  • Tags advertise ext. adv. at 7.5ms.

Hmm, the minimum legacy and ext. advertising interval per the BT Spec is 20 ms; is the 7.5 ms a typo?
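For reference (values illustrative; assuming the current Zephyr advertising API), the primary advertising interval is in 0.625 ms units with a spec floor of 0x0020 (20 ms), while the periodic advertising interval is a separate parameter in 1.25 ms units with a floor of 0x0006 (7.5 ms):

/* Illustration only: the primary (legacy/ext. adv.) interval is in
 * 0.625 ms units and may not go below 0x0020 (= 20 ms); only the
 * periodic advertising interval (1.25 ms units) may go down to
 * 0x0006 (= 7.5 ms).
 */
static const struct bt_le_adv_param *adv_param =
	BT_LE_ADV_PARAM(BT_LE_ADV_OPT_EXT_ADV,
			0x0020, /* interval min: 20 ms */
			0x0020, /* interval max: 20 ms */
			NULL);  /* undirected: no peer address */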