zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.69k stars 6.53k forks source link

Bluetooth: Controller: Syncing with devices with per. adv. int. < ~10ms eventually causes events from BT controller stop arriving #49915

Closed jakkra closed 2 years ago

jakkra commented 2 years ago

Describe the bug When using periodic adv. intervals <~10ms then sometimes BT Controller stops reporting any events, including sync termination, CTEs, scan events etc.

To Reproduce Have 10 tags sending per. adv. with 7.5ms interval with adv. train of 5 and CTE enabled. I have not found replicatable way to reproduce, sometimes it's enough to just let all 10 tags sync and then wait a minute and it happens. Sometimes it's easier to reproduce by randomly resetting some tags. I have reproduced without CTE sampling enabled, however may or may not be easier to reproduce with it on. I have also reproduced with only 5-6 tags but I think it's easier with more.

Application: aoa_receiver_multiple_49915.zip

Expected behavior BT controller keeps reporting events.

Impact BT Controller stops generating events, application will be in strange state.

Logs and console output The application prints main: Interval: 6 whenever a device sending per. adv. is seen, as can be seen it's stopped reporting at the end.

*** Booting Zephyr OS build zephyr-v3.1.0-4244-g1995d349db37  ***
Starting Connectionless Locator Demo
Bluetooth initialization...success
Scan callbacks register...success.
Periodic Advertising callbacks register...success.
Start scanning...success
[00:00:00.004,638] <inf> main: Creating Sync for tags in list
Waiting for periodic advertising...
Scan is running...
[00:00:00.328,277] <wrn> main: Interval: 6
[00:00:00.328,521] <dbg> main: scan_recv: Added tag from per adv list
[00:00:00.470,764] <wrn> main: Interval: 6
[00:00:00.470,977] <dbg> main: scan_recv: Added tag from per adv list
[00:00:00.498,992] <wrn> main: Interval: 6
[00:00:00.499,206] <dbg> main: scan_recv: Added tag from per adv list
Scan is running...
[00:00:00.709,167] <wrn> main: Interval: 6
(709) PER_ADV_SYNC[0]: [DEVICE]: 36:DC:42:90:4D:0E (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:00.710,174] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:00.710,174] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(710) success. CTE receive enabled.
[00:00:00.741,027] <wrn> main: Interval: 6
[00:00:00.741,241] <dbg> main: scan_recv: Added tag from per adv list
[00:00:00.762,207] <wrn> main: Interval: 6
[00:00:00.762,420] <dbg> main: scan_recv: Added tag from per adv list
[00:00:00.883,422] <wrn> main: Interval: 6
[00:00:00.883,666] <dbg> main: scan_recv: Added tag from per adv list
[00:00:00.972,381] <wrn> main: Interval: 6
(988) PER_ADV_SYNC[1]: [DEVICE]: 3B:66:BD:7F:64:67 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:00.988,983] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:00.989,013] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(989) success. CTE receive enabled.
Running...
TAGS:sync...,sync...,33,sync...,2,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 4 Synced: 2 Num err: 0/35
[00:00:01.096,862] <wrn> main: Interval: 6
[00:00:01.097,106] <dbg> main: scan_recv: Added tag from per adv list
Scan is running...
[00:00:01.310,974] <wrn> main: Interval: 6
(1312) PER_ADV_SYNC[2]: [DEVICE]: 08:F7:34:24:2B:28 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:01.312,622] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:01.312,622] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(1313) success. CTE receive enabled.
[00:00:01.730,895] <wrn> main: Interval: 6
Scan is running...
[00:00:01.944,915] <wrn> main: Interval: 6
[00:00:01.945,129] <dbg> main: scan_recv: Added tag from per adv list
Running...
TAGS:sync...,55,56,sync...,66,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 5 Synced: 3 Num err: 0/178
[00:00:02.179,016] <wrn> main: Interval: 6
[00:00:02.442,382] <wrn> main: Interval: 6
Scan is running...
[00:00:02.598,999] <wrn> main: Interval: 6
[00:00:02.862,396] <wrn> main: Interval: 6
Running...
TAGS:sync...,79,49,sync...,59,sync...,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 5 Synced: 3 Num err: 0/191
[00:00:03.018,981] <wrn> main: Interval: 6
[00:00:03.035,156] <wrn> main: Interval: 6
[00:00:03.035,369] <dbg> main: scan_recv: Added tag from per adv list
[00:00:03.197,052] <wrn> main: Interval: 6
(3197) PER_ADV_SYNC[3]: [DEVICE]: 28:B5:CF:0A:86:88 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:03.198,150] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:03.198,181] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(3198) success. CTE receive enabled.
[00:00:03.268,554] <wrn> main: Interval: 6
(3276) PER_ADV_SYNC[4]: [DEVICE]: 1D:29:0E:6A:E8:65 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:03.277,221] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:03.277,221] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(3277) success. CTE receive enabled.
Scan is running...
[00:00:03.613,586] <wrn> main: Interval: 6
(3621) PER_ADV_SYNC[5]: [DEVICE]: 17:50:92:34:97:F7 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:03.622,253] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:03.622,253] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(3622) success. CTE receive enabled.
[00:00:03.912,384] <wrn> main: Interval: 6
Running...
TAGS:19,82,44,sync...,53,13,144,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 6 Num err: 0/358
[00:00:04.050,933] <wrn> main: Interval: 40
[00:00:04.069,000] <wrn> main: Interval: 6
[00:00:04.453,491] <wrn> main: Interval: 6
[00:00:04.671,020] <wrn> main: Interval: 6
[00:00:04.948,516] <wrn> main: Interval: 6
Running...
TAGS:24,64,21,sync...,35,30,204,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 6 Num err: 0/384
Scan is running...
[00:00:05.368,530] <wrn> main: Interval: 6
Running...
TAGS:24,56,30,sync...,32,35,196,sync...,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 3 Synced: 6 Num err: 0/381
[00:00:06.150,970] <wrn> main: Interval: 40
[00:00:06.185,180] <wrn> main: Interval: 6
(6207) PER_ADV_SYNC[6]: [DEVICE]: 2B:D6:79:55:8E:32 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:06.207,916] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:06.207,916] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(6208) success. CTE receive enabled.
[00:00:06.418,548] <wrn> main: Interval: 6
[00:00:06.770,996] <wrn> main: Interval: 6
Scan is running...
Running...
TAGS:27,37,23,sync...,27,35,191,sync...,16,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 7 Num err: 0/367
[00:00:07.191,009] <wrn> main: Interval: 6
[00:00:07.200,988] <wrn> main: Interval: 40
[00:00:07.654,937] <wrn> main: Interval: 6
[00:00:07.888,549] <wrn> main: Interval: 6
Running...
TAGS:27,43,36,sync...,37,44,170,sync...,20,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 7 Num err: 0/380
[00:00:08.034,912] <wrn> main: Interval: 6
(8088) PER_ADV_SYNC[7]: [DEVICE]: 17:BB:4D:10:4D:39 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:08.089,447] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:08.089,447] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(8090) success. CTE receive enabled.
Scan is running...
[00:00:08.867,034] <wrn> main: Interval: 6
Running...
TAGS:40,55,21,sync...,25,32,17,14,18,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 1 Synced: 8 Num err: 0/223
[00:00:09.287,017] <wrn> main: Interval: 6
[00:00:09.560,943] <wrn> main: Interval: 6
(9570) PER_ADV_SYNC[8]: [DEVICE]: 3A:9E:4A:1C:6F:D1 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:09.570,434] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:09.570,465] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(9570) success. CTE receive enabled.
Scan is running...
[00:00:09.710,906] <wrn> main: Interval: 6
[00:00:09.783,935] <err> bt_scan: Prepare CTE conn IQ report failed -22
Running...
TAGS:33,53,30,15,31,38,18,21,21,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/263
[00:00:10.134,857] <wrn> main: Interval: 6
Running...
TAGS:37,50,27,44,31,34,21,23,21,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/289
[00:00:11.030,792] <wrn> main: Interval: 6
[00:00:11.038,604] <wrn> main: Interval: 6
Scan is running...
[00:00:12.008,056] <err> bt_scan: Prepare CTE conn IQ report failed -22
Running...
TAGS:39,49,22,43,34,40,11,18,17,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/281
[00:00:12.049,133] <wrn> main: Interval: 6
[00:00:12.436,981] <wrn> main: Interval: 6
[00:00:12.445,037] <wrn> main: Interval: 6
Running...
TAGS:39,55,24,39,24,28,16,16,19,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/262
[00:00:13.740,722] <err> bt_scan: Prepare CTE conn IQ report failed -22
Running...
TAGS:30,44,27,39,32,34,19,19,20,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/277
[00:00:14.116,912] <wrn> main: Interval: 6
Scan is running...
Running...
TAGS:23,49,26,24,32,31,14,18,17,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/244
[00:00:15.251,892] <wrn> main: Interval: 6
[00:00:15.586,822] <wrn> main: Interval: 6
[00:00:16.003,265] <wrn> main: Interval: 6
Running...
TAGS:28,46,23,29,32,33,18,19,20,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 9 Num err: 0/260
[00:00:16.054,779] <wrn> main: Interval: 6
[00:00:16.424,407] <wrn> main: Interval: 6
[00:00:16.424,652] <dbg> main: scan_recv: Added tag from per adv list
[00:00:16.435,058] <wrn> main: Interval: 6
Scan is running...
Running...
TAGS:23,52,20,39,28,34,12,13,14,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 1 Synced: 9 Num err: 0/235
[00:00:17.270,812] <wrn> main: Interval: 6
[00:00:17.900,756] <wrn> main: Interval: 6
Running...
TAGS:25,64,23,41,31,36,14,20,21,sync...,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 1 Synced: 9 Num err: 0/280
[00:00:18.734,436] <wrn> main: Interval: 6
(18773) PER_ADV_SYNC[9]: [DEVICE]: 2B:DA:D3:2E:AE:89 (random) synced, Interval 0x0006 (7 ms), PHY LE 2M
[00:00:18.774,322] <dbg> main: sync_cb: Removed tag from per adv list
[00:00:18.774,353] <inf> main: Creating Sync for tags in list
Enable receiving of CTE...
(18775) success. CTE receive enabled.
[00:00:18.961,120] <wrn> main: Interval: 40
Running...
TAGS:23,45,27,31,26,31,14,14,17,2,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/237
Running...
----->After here no events arrive from controller, even powering of tags etc. won't generate sync term<-----
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0
Running...
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0
Running...
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0
Running...
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0
Running...
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0
Running...
TAGS:0,0,0,0,0,0,0,0,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 0 Synced: 10 Num err: 0/0

Environment (please complete the following information):

cvinayak commented 2 years ago

@jakkra Please check if #50022 fixes the event stall problem?

jakkra commented 2 years ago

@cvinayak unfortunately still same issue.

cvinayak commented 2 years ago

@cvinayak unfortunately still same issue.

Ok, my fault that I used simulation and an own sample (https://github.com/zephyrproject-rtos/zephyr/pull/42721) to check whether my changes were covered by trying to sync to 5 other periodic advertisers at 7.5ms interval. I will use your sample instead and try reproducing the issue. Sorry for taking your time on the verification.

cvinayak commented 2 years ago

@jakkra I was able to reproduce the stall using my own sample today when I had 7 advertisers synchronized. What I observed is, the system recovers after 8 minutes which corresponds to the 24-bit RTC rollover.

I was not able to build your attached sample, it overflowed RAM.

Could you please confirm that in your application, after the stall, the system resume after ~8 mins?

jakkra commented 2 years ago

@cvinayak I'm not in office today but will test on monday. Regarding overflow strange, but just decrease to like 10 CONFIG_BT_PER_SYNC_MAX =10 and CONFIG_BT_CTRL_DF_PER_SCAN_CTE_NUM=1

jakkra commented 2 years ago

@cvinayak yes after about 8 minutes I get some events, see log:

Queued: 2 Synced: 8 Num err: 0/0
Running...
TAGS:0,0,0,0,0,sync...,0,sync...,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 8 Num err: 0/0
Running...
TAGS:0,0,0,0,0,sync...,0,sync...,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 8 Num err: 0/0
Running...
TAGS:0,0,0,0,0,sync...,0,sync...,0,0,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,
Queued: 2 Synced: 8 Num err: 0/0
----------------> 8 minutes passed around here <-------------------
--- 4 messages dropped ---
[00:08:17.301,879] <dbg> main: term_cb: Device was synced:
--- 3 messages dropped ---
(497305) PER_ADV_SYNC[5]: [DEVICE]: 0F:D9:D7:94:29:59 (random) sync terminated reason: 31
[00:08:17.306,030] <dbg> main: term_cb: Device was synced:
[00:08:17.306,243] <dbg> main: term_cb: Removed tag from per adv list
(497309) PER_ADV_SYNC[0]: [DEVICE]: 27:FD:B7:9F:15:AC (random) sync terminated reason: 31
[00:08:17.309,356] <dbg> main: term_cb: Device was synced:
[00:08:17.309,570] <dbg> main: term_cb: Removed tag from per adv list
(497311) PER_ADV_SYNC[2]: [DEVICE]: 3C:D3:E5:89:52:D1 (random) sync terminated reason: 31
[00:08:17.311,981] <dbg> main: term_cb: Device was synced:
[00:08:17.312,225] <dbg> main: term_cb: Removed tag from per adv list
(497319) PER_ADV_SYNC[7]: [DEVICE]: 18:DC:19:27:F2:37 (random) sync terminated reason: 31
[00:08:17.319,488] <dbg> main: term_cb: Device was synced:
[00:08:17.319,854] <dbg> main: term_cb: Removed tag from per adv list
(497326) PER_ADV_SYNC[8]: [DEVICE]: 11:B4:6D:C3:AA:D8 (random) sync terminated reason: 62
[00:08:17.326,995] <wrn> main: (497326)___Device not synced cancel sync tmo

[00:08:17.326,995] <wrn> main:
                               info->reason: 62

[00:08:17.327,026] <inf> main: Creating Sync for tags in list
Scan is running...
Running...
cvinayak commented 2 years ago

@jakkra Thank you. From GPIO pin debugging of the scheduling, the timeouts of each periodic sync and the scanning seem to be running in the ULL context but I do not see any LLL code be invoked. I am continuing to inspect the implementation, so far it has been difficult in not getting to a fix quickly.

cvinayak commented 2 years ago

@jakkra ~I believe to have nailed down the cause of the repeated skip of colliding events in #50231, please verify and revert back so that the fix can make it to the latest release versions.~ Please ignore, the issue still exists even after the change.

cvinayak commented 2 years ago

@jakkra @mtpr-ot

I have concluded that an issue in the implementation of the ticker_resolve_collision() function is causing continuous skipping of all active radio events due to missing information to calculate the age of the radio events since their last successful use of the radio.

https://github.com/zephyrproject-rtos/zephyr/blob/b27c5d73ef09278186dc8858dfbed2a33dff6031/subsys/bluetooth/controller/ticker/ticker.c#L719-L743

@jakkra I will ask you to use CONFIG_BT_TICKER_LOW_LAT=y alongwith a minor fix to enable its use. You will need this PR #50252

jakkra commented 2 years ago

@cvinayak thanks, this seems to resolve the issue, will do a bit more testing but so far so good!

cvinayak commented 2 years ago

CONFIG_BT_TICKER_LOW_LAT=y is the original controller contributions scheduling implementation addressing slow CPUs like nRF51x series. There are limitations of reduced radio utilization due to look ahead based scheduling, but lower CPU use, lower CPU duty cycle and lower CPU latencies on lower priority IRQs.

CONFIG_BT_TICKER_LOW_LAT=n is newer controller scheduling implementation with just-in-time resolution of overlapping radio events, which provided better radio utilization by being able to use free radio time that was un-utilized by the previously done radio events. Drawback of higher CPU use, CPU duty cycle and CPU latencies on lower priority IRQs

In ticker_resolve_collision(), the problem I am trying to understand is (2 tickers of 7.5ms period and 7.5 ms reservation, ticker1 to ticker2 has 5ms to start with):

  1. ticker1 current age = 7.5 ms + (1 lazy x 7.5 ms) = 15 ms, force = 0; ticker2 next age = (7.5 - 5) + (1 lazy x 7.5 ms ) = 10 ms, force = 1, next_force = 1. ticker1 looses.
  2. ticker2 current age = 7.5 + (1 x 7.5) = 15 ms, force = 1; ticker1 next age = (7.5 - 2.5) + (2 x 7.5 ms) = 20 ms, force = 0, next_force = 0, ticker 2 looses.
  3. ticker1 current age = 7.5 ms + (2 x 7.5 ms) = 22.5 ms, force = 0; ticker2 next age = (7.5 - 5) + (2 x 7.5 ms ) = 17.5 ms, force = 1, next_force = 1. ticker1 looses.
  4. ...

am I, right? that ticker1 and ticker2 never get to run...

Reply from @mtpr-ot :

Yes, "force" and "age" are breaking normal round-robin. That's how it's always been. Age priority is a Nordic requirement that I reluctantly put in, and force is prevented as much as possible I think the logic in the ticker is OK (?), but the external usage of force may break it We have a condition in ULL to prevent random use of force flag

The CONFIG_BT_TICKER_LOW_LAT=y implementation recovers easily from most stressful multiple simultaneous roles. Just round-robin scheduling would not handle supervision/sync lost cases, age was required to avoid smaller interval roles from starving larger interval roles, and force was required to break the round-robin. Will see if I can have a solution for the repeated current ticker loosing to next ticker in ticker_resolve_collision().

cvinayak commented 2 years ago

@jakkra Do let me know if you were able to do more testing and if any issues faced while using CONFIG_BT_TICKER_LOW_LAT=y?

Meanwhile, I have this PR for CONFIG_BT_TICKER_LOW_LAT=n #50311, do switch back and test with this PR (i.e. upstream default new implementation should give higher radio utilization, is the future implementation for new features in Controller).

jakkra commented 2 years ago

@cvinayak

  1. CONFIG_BT_TICKER_LOW_LAT=y works fine.
  2. With: #50311
    • CONFIG_BT_TICKER_LOW_LAT=n and CONFIG_BT_TICKER_PRIORITY_SET=y => Does not work same as before ("hangs")
    • CONFIG_BT_TICKER_LOW_LAT=n and CONFIG_BT_TICKER_PRIORITY_SET=n => Works.

There is a difference between 1. and 2. With 1. 25 tags synced I get an average of ~8 CTE's per second (uneven distributed over the tags, some have 5 some 20) Typical angles per second:

11  13  15  11  5  16  11  16  6  5  11  9  13  6  12  12  10  10  11  11  12  15  15  12  6
------------------------------------------------------------------------------------
10  11  16  12  10  14  12  15  5  4  11  12  12  5  14  12  9  10  12  13  10  13  9  13  4
------------------------------------------------------------------------------------
11  10  15  13  13  11  8  13  5  4  11  11  14  4  12  9  9  11  10  10  9  15  9  14  6
------------------------------------------------------------------------------------
10  11  12  11  10  11  8  14  6  7  8  17  9  4  12  13  10  10  11  13  11  11  12  14  4
------------------------------------------------------------------------------------
10  11  12  12  10  20  10  14  5  6  9  9  11  5  15  13  12  11  11  14  12  12  11  15  4

With 2. 25 tags synced I get an average of ~4 CTE's per second (however most tags are at ~4 CTE's per second so even dsitribution of CTE's) Typical angles per second:

------------------------------------------------------------------------------------
5  6  6  5  5  6  6  4  4  5  6  6  5  5  5  4  4  5  5  5  6  5  5  6  5  
------------------------------------------------------------------------------------
4  4  3  5  3  4  4  5  4  5  4  4  4  5  5  5  5  5  5  5  4  5  5  4  4  
------------------------------------------------------------------------------------
5  5  6  4  5  5  6  4  4  4  6  4  6  5  5  5  4  5  5  4  5  5  5  6  6  
------------------------------------------------------------------------------------
5  4  4  5  4  5  4  4  5  5  4  5  5  5  5  3  5  5  4  5  3  5  5  4  5  
------------------------------------------------------------------------------------
4  5  5  3  5  5  5  4  4  4  5  5  4  5  5  5  4  4  1  5  5  4  4  5  4  
------------------------------------------------------------------------------------
4  4  4  4  4  4  4  3  4  4  3  4  4  4  3  4  4  4  4  4  4  4  4  4  4  
------------------------------------------------------------------------------------
5  5  5  5  4  5  5  4  5  5  5  5  4  5  5  4  5  5  5  5  5  5  5  5  5  
cvinayak commented 2 years ago

@jakkra Thank you and appreciate your thoroughness in verifying the fix.

Regarding the rate of CTE, I would think of it to depend on both, how much of the advertisers are overlapping over the 7.5ms interval, and how the observer listen window is scheduled. I am not very familiar with the direction finding CTE feature, and would leave it you to decide which 1 or 2 is best for your application.

A suggestion, but you do not have to spend your immediate time to try it, to use continuous scanning say 10 ms interval 10ms window and have CONFIG_BT_CTLR_SCAN_UNRESERVED=y. This will provide most radio time to periodic sync roles and scanning get to use spare time between the periodic sync event.

Thank you again for contributing towards a stable Controller.

jakkra commented 2 years ago

@jakkra Thank you and appreciate your thoroughness in verifying the fix.

Regarding the rate of CTE, I would think of it to depend on both, how much of the advertisers are overlapping over the 7.5ms interval, and how the observer listen window is scheduled. I am not very familiar with the direction finding CTE feature, and would leave it you to decide which 1 or 2 is best for your application.

Less CTE's is pretty much same thing as missed periodic adv. packets (AUX_SYNC_IND), if the radio finds the per. adv. packet, it will also sample the CTE (assuming there is RAM). For us it's a tradeoff around as many CTE's as possible, but also being able to find new tags and re-sync to tags we lost sync to.

The more AUX_SYNC_IND/CTE's the better for us. So the behaviour with CONFIG_BT_TICKER_LOW_LAT=y (more CTEs) is preferred (so with 1.)

Using CONFIG_BT_TICKER_LOW_LAT=y together with https://github.com/zephyrproject-rtos/zephyr/pull/50311 I get this crash pretty much immediately. ASSERTION FAIL [(ticker_status == 0) || (ticker_status == 2)] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/ull_scan_aux.c:699

A suggestion, but you do not have to spend your immediate time to try it, to use continuous scanning say 10 ms interval 10ms window and have CONFIG_BT_CTLR_SCAN_UNRESERVED=y. This will provide most radio time to periodic sync roles and scanning get to use spare time between the periodic sync event.

Thank you again for contributing towards a stable Controller.

CONFIG_BT_CTLR_SCAN_UNRESERVED=y This sounds interesting and I'll try it out. I guess this make it harder to sync to new devices also if already many are synced?

cvinayak commented 2 years ago

Using CONFIG_BT_TICKER_LOW_LAT=y together with #50311 I get this crash pretty much immediately. ASSERTION FAIL [(ticker_status == 0) || (ticker_status == 2)] @ WEST_TOPDIR/zephyr/subsys/bluetooth/controller/ll_sw/ull_scan_aux.c:699

You need this #50252 in addition.