Closed nashif closed 7 years ago
by Youvedeep Singh:
Hi Andrew Boie Can you please help me to test this quickly to see if this is same issue as GH-2054 or not. Currently I do not have access to ARM Board.
What need to be done :- File Name : "kernel/sched.c" In function : - _update_time_slice_before_swap" Just comment out following condition ;- "// if (!remaining || (_time_slice_duration < remaining)) {" (I am writing this in this format as formatting of patch is getting messed up) :)
After commenting out can you please retest this once,
by Youvedeep Singh:
With Power Management Disabled ==> Test Pass With Tickless Idle Enabled ==> Test Pass With Tickless Kernel Enabled ==> Test Fail
PASS - test_sleep_wakeup_preemptible. tc_start() - test_time_slicing_preemptible
Assertion failed at /home/ysingh/IOT/Zephyr/Source_Code/github/test2/zephyr/tests/kernel/threads_scheduling/schedule_api/src/test_sched_timeslice_and_lock.c:15e
PASS - test_time_slicing_disable_preemptible. tc_start() - test_lock_preemptible USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting. USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting. USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting. USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting. USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting. USAGE FAULT Executing thread ID (thread): 0x20070220 Faulting instruction address: 0x82680 No coprocessor instructions Fatal fault in thread 0x20070220! Aborting.
I am debugging his issue further
by Youvedeep Singh:
Further i put some debug prints, looks like crash seems to be comming from void " test_time_slicing_preemptible" @ zassert_true(tdata[1].executed == 1, NULL).
I put 2 prints one just before this and another just after this. The print ahead of this is getting printed but print after this is not getting printed when TICKLESS_KERNEL is enabled. Not sure if this is the issue, I will dig this further.
by Youvedeep Singh:
Further Debug on this issue. Here are 2 issues :-
test_time_slicing_preemptible :- If i comment "zassert_true(tdata[1].executed == 1, NULL);", I am not hitting the issue. As of now I am not focusing this issue.
Failure at "test_slice_reset". In this case time slice given to each thread of quual priority is not same,
While debugging identified that time slide is not set properly, even though time slide set should be 200, but inside "_nano_sys_clock_tick_announce" number of ticks passed is 199 due to which condition is not getting satisfied for setting the time (remaining and next_to both are staying 0).
Reason for next_to become 0 ?
Inside "handle_time_slicing"
_time_slice_elapsed = 199
and _time_slice_duration = 200
due to which , _time_slice_elapsed >= _time_slice_duration is not getting satisfied and and next_ts becomes 0.
This looks like issue inside "cortex_m_systick.c", i will continue debug this tomorrow.
by Youvedeep Singh:
Further It looks like "_sys_clock_tick_count" is not getting updated correctly, which is leading to may crazy issues. I dumped the value of systick to look
Loaded Value of ticks 199 (Max ticks), So we were expecting that value of ticks will upltimately on timer expire should be at least 199, but it is coming out to be ~32. I am suspecting that "Loaded Value f" is not getting set when timer is rolling back and again stats counting from Load value. So when we are reaching at interrupt, difference between load value and Current value is not significant and it is leading to low value of clock count.
This is evident from following dump.
**7 Loaded Value ff10e0 Current 7a807
***7 Loaded Value f ff10e0 Current 4ccc2
**7 Loaded Value f ff10e0### _sys_clock_tick_count 32 // TIll now ~195 ticks have expired. _nano_sys_clock_tick_announce*** //Timer got expired here and it generated and interrupt _set_time====> idle_original_ticks 1 time 1 //request to prog for one tick. =======_get_elapsed_clock_time===
***4 Loaded Value ff10e0 Current ff10c0 //It can be seen that timer has expired and Current value is Loaded with previous and it counts downward 1f182
by Andrew Boie:
When we resolve this issue, we need to be sure to add an additional test case to schedule_api's testcase.yaml that does this test with tickless kernel enabled.
by Youvedeep Singh:
Further I was able to root cause why "_sys_clock_tick_count" was not getting updated. The reason was "SysTick_CTRL_COUNTFLAG_Msk" is clear on read, So this was getting cleared when it was read while reading it from sched.c (while updating tmp_sys_clock_tick_count) _set_time====> idle_original_ticks 199 time 200 overflow 0 =======updated 0===_sys_clock_tick_count 0 =======updated 0===_sys_clock_tick_count 199 =======updated 0===_sys_clock_tick_count 397
After this, still I am facing another issues related to clock looks like k_cycle_get32 is getting expired sooner ztest is getting failed. I will continue looking into this.
by Youvedeep Singh:
sure Andrew i will add this test in "testcase.yaml " on getting proper fix for this test.
by Youvedeep Singh:
There were few issues around tickless kernel related to systick Cortex m timer and a scheduler issue which was impacting all the timers.
Timer Systick : Handle Systick timer rollback in tickless kernel Case **Issues related to cortex_m_systick **
PASS - test_sleep_wakeup_preemptible. tc_start() - test_time_slicing_preemptible
PASS - test_unlock_preemptible.
Patch : Scheduler : Account kernel time keeping when thread releases cpu befo *Issue related to cortex_m_systick /local apic/ hpet timers ***** When multiple pre-emptive threads of same priority are queued together, then these threads should execute in round robin fashion with equal priority, but in certain scnerios this scheduling is not fair and one thread gets lower time slide than expected. For example if threads A, B and C are of equal priority and are queued in order A -> B ->C. Let us assume Thread A is running with time slide as 200mSec. So timer is programmed for 200 ticks and thread A starts running. Let us assume thread A has run for 100 ticks and it gets semtake and not at this stage A is put at end of queue and Thread B is in top (B -> C ->A). Now thread B starts running and on timer expire after 100 ticks. At the end of 200 ticks (100 ticks spent by A + 100 ticks by B), timer expired. now elapsed ticks updated = 200. and which results in elapsed_timeslice for B too to 200 (but actually B has taken only 100 ticks).
So this patch sets the time out to remaining time again and accounts the spend time. So that after 100 ticks expired for A it accounts only 100 ticks and B gets fair chance.
Before Case :- PASS - test_sched_is_preempt_thread. tc_start() - test_slice_reset thread[0] elapsed slice 100, expected 100 thread[1] elapsed slice 65, expected 200 thread[2] elapsed slice 199, expected 200 thread[0] elapsed slice 100, expected 100 thread[1] elapsed slice 77, expected 200 thread[2] elapsed slice 199, expected 200
Also in certain corner cases still i observed that we are hitting the issue (at low frequency) but the above patch fix most of issues, I will continue looking into corner cases when we are getting issue and will update.
by Youvedeep Singh:
Attached is the pull request.
by Youvedeep Singh:
This patch got merged.
by Youvedeep Singh:
Patch merged so closing the issue.
Reported by Andrew Boie:
Steps to reproduce:
Add the following lines to prj.conf:
Run the test for qemu_cortex_m3:
(Imported from Jira ZEP-2217)