zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

remaining timeslice not getting reset when context switching among threads with same priority #3548

Closed: zephyrbot closed this issue 7 years ago

zephyrbot commented 7 years ago

Reported by Sharron LIU:

GH-2455 implemented per-thread timeslicing. With this feature, the remaining timeslice is supposed to be reset when context switching among threads. The test results show the timeslice was not reset in one specific case.

Please review the test case https://github.com/zephyrproject-rtos/zephyr/pull/80. The test runs with four preemptive threads at the same priority, with timeslicing enabled:

a ztest native thread (takes half a timeslice, then relinquishes the CPU)

thread_1 (expected to take a full timeslice; however, the test result shows it took half a timeslice, which appears to be the remainder of the previous slice)

thread_2 (expected to take a full timeslice)

thread_3
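
For context, timeslicing in Zephyr round-robins same-priority preemptible threads. A minimal sketch of enabling it at runtime, assuming CONFIG_TIMESLICING=y in the build (the function name is illustrative; as noted later in the thread, this test sets the slice size at runtime rather than via Kconfig):

```c
#include <zephyr/kernel.h>

void enable_round_robin(void)
{
    /* 200 ms slices for preemptible threads at priority 0 or lower
     * in importance (illustrative values matching this test) */
    k_sched_time_slice_set(200, K_PRIO_PREEMPT(0));
}
```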

How to reproduce:

pull the test case https://github.com/zephyrproject-rtos/zephyr/pull/80

source zephyr-env.sh

cd tests/kernel/threads_scheduling/schedule_api

echo "CONFIG_DEBUG=y" >> prj.conf

make pristine

make qemu

verify from the UART output

expected output:

SLICE SIZE 200
thread[0] elapsed slice 100, expected 100
thread[1] elapsed slice 200, expected 200
thread[2] elapsed slice 200, expected 200

actual output:

SLICE SIZE 200
thread[0] elapsed slice 100, expected 100
thread[1] elapsed slice 100, expected 200
thread[2] elapsed slice 200, expected 200

(Imported from Jira ZEP-2107)

zephyrbot commented 7 years ago

by Ramesh Thomas:

I don't see a bug here. Time slicing only guarantees the thread does not exceed the time slice duration.

> The test results show the timeslice was not reset in one specific case.

The output data you posted does not indicate that this is the case. You are assuming that, because one thread was interrupted before using its full time slice, it must have started with a partially consumed slice. In fact, it would simply have been interrupted before it could use all of its time slice. The next thread then correctly starts from zero and runs for the full expected time slice duration.

zephyrbot commented 7 years ago

by Sharron LIU:

Ramesh Thomas,

thread[0], the native ztest thread, was waiting on a semaphore at 100 ms and was expected to take a 100 ms timeslice >> OK.
thread[1], the next thread scheduled in, was supposed to take a full 200 ms timeslice; however, it actually took 100 ms >> FAIL.

zephyrbot commented 7 years ago

by Ramesh Thomas:

Sharron LIU, what was thread[2] doing while thread[1] was running? thread[2] will be scheduled if it wakes up due to a scheduled event, e.g. its sleep time expires or the semaphore it was waiting on gets released.

Bottom line: any thread getting interrupted before using its full time slice is not a bug but expected behavior. If a thread exceeds its time slice, then that is a bug. Your test should check whether threads run past their allotted time slice.
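
A minimal ztest-style sketch of the check Ramesh is suggesting (check_slice_bound, SLICE_MS, and elapsed_ms are assumptions for illustration, not taken from the actual test):

```c
#include <zephyr/ztest.h>

#define SLICE_MS 200

/* The guarantee under discussion is an upper bound, not an exact
 * value: a thread may be switched out early, but must never run past
 * its allotted slice. elapsed_ms would come from k_uptime_delta()
 * in the real test. */
static void check_slice_bound(int64_t elapsed_ms)
{
    zassert_true(elapsed_ms <= SLICE_MS,
                 "thread ran past its allotted time slice");
}
```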

zephyrbot commented 7 years ago

by Sharron LIU:

test case here: https://github.com/zephyrproject-rtos/zephyr/pull/80/commits/222c8692635563e158f2474ebf59b1c48a62084a

zephyrbot commented 7 years ago

by Andrew Boie:

Disregard, I see the slice size is set at runtime.

By the way, why did you disable the scheduling_api test for Nios II? It's true that we don't have tickless idle for Nios II, but that should only cause the tickless tests to be turned off, not the tests that show that the scheduler works. I ran the test and it passes on Nios II; I'm going to turn it back on.

zephyrbot commented 7 years ago

by Andrew Boie:

Ramesh Thomas

I think this is a bug, but I'm still looking.

This is how the test works (a sketch follows the lists below):

Main thread:

System boots with time slice set to 0

Main thread creates three child threads, thread[0..2], of the same priority as the main thread. They do not run immediately.

Main sets the time slice size to 200 ms

Main calls k_uptime_delta(&elapsed_slice).

Main busy-waits for 100 ms

Main then calls k_sem_take() on a semaphore owned by each of the threads; it blocks there until thread[0..2] have all finished.

Each child thread:

Calls k_uptime_delta(&elapsed_slice), which measures the interval since the last call, made either by the main thread or by another child thread. This is how the intervals are calculated.

Busy-waits for 220 ms, enough time to get scheduled out.

Releases its semaphore
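
A hedged sketch of that flow using current Zephyr APIs (the 2017 test used the same calls with older spellings; names such as child() and SLICE_MS are illustrative):

```c
#include <zephyr/kernel.h>
#include <zephyr/sys/printk.h>

#define NUM_THREADS 3
#define STACK_SIZE  1024
#define SLICE_MS    200

static K_THREAD_STACK_ARRAY_DEFINE(stacks, NUM_THREADS, STACK_SIZE);
static struct k_thread threads[NUM_THREADS];
static struct k_sem sems[NUM_THREADS];
static int64_t elapsed_slice; /* shared reference timestamp */

static void child(void *p1, void *p2, void *p3)
{
    /* Interval since the previous k_uptime_delta() call, made either
     * by the main thread or by the previous child thread. */
    printk("elapsed slice %lld\n",
           (long long)k_uptime_delta(&elapsed_slice));

    k_busy_wait(220 * USEC_PER_MSEC); /* long enough to be sliced out */
    k_sem_give(p1);                   /* let main collect this thread */
}

int main(void)
{
    for (int i = 0; i < NUM_THREADS; i++) {
        k_sem_init(&sems[i], 0, 1);
        /* Same priority as main (default prio 0 assumed), so the
         * children do not run until main gives up the CPU. */
        k_thread_create(&threads[i], stacks[i],
                        K_THREAD_STACK_SIZEOF(stacks[i]), child,
                        &sems[i], NULL, NULL,
                        K_PRIO_PREEMPT(0), 0, K_NO_WAIT);
    }

    k_sched_time_slice_set(SLICE_MS, K_PRIO_PREEMPT(0));
    (void)k_uptime_delta(&elapsed_slice); /* start the shared clock */
    k_busy_wait(100 * USEC_PER_MSEC);     /* burn half a slice */

    for (int i = 0; i < NUM_THREADS; i++) {
        /* Blocks; on the buggy kernel, thread[0] inherits the 100 ms
         * remainder of main's slice here. */
        k_sem_take(&sems[i], K_FOREVER);
    }
    return 0;
}
```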

What we are seeing:

thread[0] elapsed slice 100, expected 100
thread[1] elapsed slice 100, expected 200
thread[2] elapsed slice 200, expected 200

thread[0] should run 100 ms after the main thread, since the main thread busy-waited for 100 ms before giving up the CPU with its k_sem_take() calls. thread[1] should then run 200 ms after thread[0]. But for some reason, thread[1] ran 100 ms after thread[0]: thread[0] did not get a full time slice like it ought to.

> You are assuming that since one thread was interrupted before the full time slice, it actually started from a non-zero time slice.

I don't follow your logic here. thread[0] had never been scheduled before; it seems to me it ought to be allowed to run for 200 ms before being preempted. However, I need to talk to Benjamin Walsh about what the intended policy is. Studying the scheduling code now.

zephyrbot commented 7 years ago

by Andrew Boie:

Sharron LIU Benjamin Walsh Ramesh Thomas

I think I see what is going on. This test runs with CONFIG_TICKLESS_IDLE=n, by the way.

The kernel tracks time slice usage with the _time_slice_elapsed global. Every time the timer interrupt goes off and the timer driver calls _nano_sys_clock_tick_announce() with the elapsed time, this is added to _time_slice_elapsed. If it exceeds the total time slice, the thread is moved to the back of the queue for that priority level and _time_slice_elapsed is reset to zero.

This is the only time _time_slice_elapsed is reset. If a thread uses up a partial time slice, and then cooperatively switches to another thread, the next thread will inherit the remaining time slice.

I notice there is some code, _update_time_slice_before_swap(), that is called before _Swap() runs, but it is ONLY built if CONFIG_TICKLESS_KERNEL=y.

I worked up a small patch which implements _update_time_slice_before_swap() when CONFIG_TIMESLICING=y and CONFIG_TICKLESS_KERNEL=n; it just resets _time_slice_elapsed to 0. With it, the measured values match the expected values in this case.
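
A hedged sketch of the mechanism and the patch just described; the identifiers are taken from this comment and are not verified against the kernel tree of that era:

```c
#include <stdint.h>

static int32_t _time_slice_elapsed;        /* ms used of current slice */
static int32_t _time_slice_duration = 200; /* configured slice, in ms */

/* The accounting described above: the timer driver announces elapsed
 * time, and once a full slice is consumed the current thread is
 * rotated to the back of its priority queue. */
static void slice_tick_announce(int32_t elapsed_ms)
{
    _time_slice_elapsed += elapsed_ms;
    if (_time_slice_elapsed >= _time_slice_duration) {
        _time_slice_elapsed = 0;
        /* move the current thread to the back of its priority queue */
    }
}

/* The patch: also reset the accounting right before _Swap(), so a
 * thread that gives up the CPU early does not pass its leftover
 * slice on to the next thread. */
static void _update_time_slice_before_swap(void)
{
    _time_slice_elapsed = 0;
}
```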

Now, if I do enable the tickless kernel, I get different behavior: thread 1's timeslice is incorrect, but only on the second round:

tc_start() - test_slice_reset
thread[0] elapsed slice 100, expected 100
thread[1] elapsed slice 200, expected 200
thread[2] elapsed slice 200, expected 200
thread[0] elapsed slice 100, expected 100
thread[1] elapsed slice 80, expected 200
thread[2] elapsed slice 200, expected 200

I'm not sure yet what's going on here; in the tickless case the thread's elapsed time should be reset on any context switch. I'm going to root-cause this before I send patches out. Update: this short elapsed count only seems to reproduce on x86; my suspicions are with the HPET driver.

zephyrbot commented 7 years ago

by Andrew Boie:

Sharron LIU Ramesh Thomas I think I have found a gap in the test coverage, and in the implementation.

So far, we are testing two scenarios:

Thread cooperatively yields after consuming a partial slice; the next thread should get a full slice (failing: the next thread is getting a short time slice)

Thread uses up its time slice; the next thread is scheduled (working as intended)

But there is a third case. Let's say there are two groups of timeslicing threads, T[0...N] and S[0...N]. The S group runs at a higher priority, but all of its threads are currently blocked. Now say thread T[0] is running and has consumed half a time slice when an interrupt arrives, and the ISR gives a semaphore (or something similar) that enables thread S[0] to run. S[0] is expected to get a full time slice.

Unfortunately, this is only implemented on ARM and x86, where there is a hook in the interrupt code, before __swap() runs, that calls _update_time_slice_before_swap(). This has not been done properly on the other arches, so in this scenario S[0] will get a short timeslice (see the sketch below). I'm going to file a separate bug for this issue.
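
A hedged sketch of that third scenario (s0_sem and sample_isr are hypothetical names; the ISR body is the same everywhere, it is the interrupt-exit path that differs per arch):

```c
#include <zephyr/kernel.h>

extern struct k_sem s0_sem; /* the semaphore S[0] is pending on */

/* T[0] is mid-slice when this fires. Giving the semaphore readies
 * S[0], which runs at higher priority, so the interrupt-exit path
 * swaps to it. Per the comment above, only x86 and ARM reset the
 * slice accounting on that path; elsewhere S[0] inherits T[0]'s
 * partial slice. */
void sample_isr(const void *arg)
{
    ARG_UNUSED(arg);
    k_sem_give(&s0_sem); /* swap to S[0] happens on ISR exit */
}
```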

zephyrbot commented 7 years ago

by Ramesh Thomas:

> This test runs with CONFIG_TICKLESS_IDLE=n

The feature of resetting a thread's time slice at swap was implemented only in the tickless kernel (an unintentional byproduct). It was mentioned in the original Jira story: GH-2455?focusedCommentId=17899&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1789

> This has not been properly done on other arches

Tickless kernel support was added only to x86, ARM, and ARC while I was working on Zephyr. Also, the tickless kernel builds on top of tickless idle, and I believe only those arches had timers with tickless idle support. It looks like a call to _update_time_slice_before_swap() is missing in the ARC ISR wrapper. Please add it in _rirq_exit and _firq_exit, like it is done in intstub.S on x86.
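
A conceptual C rendering of that request (this is not the actual ARC assembly; must_switch_threads() is a hypothetical stand-in for the arch's reschedule check):

```c
#include <stdbool.h>

extern bool must_switch_threads(void);            /* hypothetical */
extern void _update_time_slice_before_swap(void); /* from the patch above */

/* Mirrors what x86's intstub.S does on interrupt exit: if the ISR
 * made a different thread runnable, refresh the slice accounting
 * before the context switch. */
void irq_exit_sketch(void)
{
    if (must_switch_threads()) {
        _update_time_slice_before_swap();
        /* arch-specific swap to the new thread follows */
    }
}
```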

> 1. Thread cooperatively yields after consuming a partial slice, next thread should get a full slice

Time slicing gets updated in _update_time_slice_before_swap(), which currently gets called before a swap. If there are places where a thread switch happens without a call to swap, then it needs to be called from those places as well.

k_yield() calls _Swap(), so _update_time_slice_before_swap() would get called. I'm not sure why that would fail.

> thread[1] elapsed slice 80, expected 200

I'm not exactly sure. Based on your description of the test case logic, it looks like thread[0] ran for 80 ms before thread[1] got scheduled. Where exactly does the second run of the test start from? Can you try turning off time slicing at the end of each run, to make sure there is no time slicing active at the beginning of each iteration?

zephyrbot commented 7 years ago

by Andrew Boie:

Ramesh Thomas,

> thread[1] elapsed slice 80, expected 200
>
> I'm not exactly sure. Based on your description of the test case logic, it looks like thread[0] ran for 80 ms before thread[1] got scheduled. Where exactly does the second run of the test start from? Can you try turning off time slicing at the end of each run, to make sure there is no time slicing active at the beginning of each iteration?

I opened GH-3654 with more details. This is reproducible on multiple arches. On ARM it's completely messed up with tickless idle turned on; see GH-3653. For discussion purposes, I think it would be helpful if you briefly looked at tests/kernel/threads_scheduling/schedule_api/src/test_sched_timeslice_reset.c. These threads collect a timestamp as the very first thing they do.

zephyrbot commented 7 years ago

by Sharron LIU:

Verified that the original failure is fixed: thread[1] now takes a full timeslice. Closing this.

I appreciate Andrew Boie spending time digging into this. I had previously thought of using the kernel event logger to record the timestamps of context switches, but dropped that idea, as it looked too heavyweight.

zephyrbot commented 7 years ago

Related to GH-3653

zephyrbot commented 7 years ago

Related to GH-3654