Closed erwango closed 3 years ago
@andyross FYI. It's hard for me to make a direct link with your change but it seems that the schedule for this test is a bit tight on this board (cf different status when run with west vs twister) and #35276 may had some small timing changes that impacts the final result.
@andyross Can you please take a look here
Only under twister?! How would it even know? Surely there's something different about the generated binary?
As far as what's going on, this shouldn't have affected much with that test. The only spot in the code where we were hitting the oddball "configure zero preemptible priorities" situation was in one of the philosophers sub-samples. I'll take a look and diff the resulting .config files, maybe that has a clue. I do at least have one of these boards. Do any others fail too, or is it just this one?
Also, maybe it's just a phantom bisect result? Did you try reverting this one change in isolation (which should work) to see if that was the issue?
Test is failed only when run with twister but is passed when run manually. This is the only board impacted, as far as I know
@nashif commented that the difference here maybe due to ASSERTs being enabled or not.
CONFIG_ASSERT=y always in tests. I know twister will enable some non-default warnings, it's not supposed to affect generated code AFAIK.
I'll have a new round of test in case this was a friday evening bug. But I confirm that I tried just before and after the commit a certain number of time to validate the bisect result. I'll also compare twister and west binaries, while I'm at it.
One mystery solved:
kernel.scheduler.multiq
requires use of -DCONF_FILE=prj_multiq.conf
when run with west.
Using this, I get the same status in twister and manual.
Then I confirm that test is failing on commit 851d14a and failed on the SHA1 before. I've been able to confirm this a number for times.
BUT, I'm not able to reproduce the issue anymore on top of master, and I can see same status on our bench. So I'm closing this point (anyone interested in digging into this a bit more is free to reopen).
issues appears again on the SHA1 a02704b SHA1 95e359d
It should be noticed that forcing the LL_FLASH_SetLatency(LL_FLASH_LATENCY_2);
in the drivers/clock_control/clock_stm32_ll_common.c
(end of the stm32_clock_control_init function)
is a workaround to pass the test.
Hi All,
I have done some debugs, and here are my findings.
As per the analysis, I can confirm there was some synchronization issue. We observed
int64_t tdelta = k_uptime_delta(&elapsed_slice);
was going above 200 for say 201. and at next thread
int64_t tdelta = k_uptime_delta(&elapsed_slice);
was getting less than 200 for say 199. And after that, all goes to assert. And error occurs.
Currently, this issue was not observed with the latest main.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
Analysis: Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/sched/schedule_api/src/test_slice_scheduling.c:61: thread_tslice: ((tdelta >= expected_slice_min) && (tdelta <= expected_slice_max) && (idx == thread_idx)) is false
Time slice is programmed at 200ms
#define SLICE_SIZE 200
Expected time slice is computed
int64_t expected_slice_min = k_ticks_to_ms_floor64(k_ms_to_ticks_ceil32(SLICE_SIZE));
int64_t expected_slice_max = k_ticks_to_ms_floor64(k_ms_to_ticks_ceil32(SLICE_SIZE) + 1);
expected_slice_min = 200 expected_slice_max = 200 (CONFIG_SYS_CLOCK_TICKS_PER_SEC=10000)
but measurements show sometimes tdelta =199
so test is failed because 199 (measured time slice) is lower than 200 (expected_slice_min)
Similarly measurements show sometimes tdelta =201
Note: compiling with debug option or adding some printf() may lead to test passed, so it seems to me like a timing issue. which could explain the instability observed from one commit to another.
I think this -1 compare to theoretical time slice could be explain because of : test imperfection (measuring time slice thanks to k_uptime_delta() from different threads) or conversion from ms to tick and reverse, or kernel TICK clamping to lower value (https://github.com/zephyrproject-rtos/zephyr/blob/0e1ecb373a1b5b639ea09ebc1cba7a1c44ea5101/drivers/timer/cortex_m_systick.c#L187)
Thus my proposal would be to enlarge time slice criterion PR #38967. With proposed changes, (with disco_l475_iot1) expected_slice_min = 199 expected_slice_max = 201
Describe the bug Test tests/kernel/sched/schedule_api/kernel.scheduler.multiq is failed on disco_l475_iot1 since commit 851d14afc8941313a6f3faeb74f84ed73a33429a
Test is failed only when run with twister but is passed when run manually. This is the only board impacted, as far as I know
To Reproduce Steps to reproduce the behavior:
twister --test tests/kernel/sched/schedule_api/kernel.scheduler.multiq -p disco_l475_iot1 --device-testing --device-serial /dev/ttyACM0
Expected behavior Test should be passed
Logs and console output
Environment (please complete the following information):
Additional context Add any other context about the problem here.