rtic-rs / rtic

Real-Time Interrupt-driven Concurrency (RTIC) framework for ARM Cortex-M microcontrollers
https://rtic.rs
Apache License 2.0
1.8k stars 208 forks source link

Monotonic / timer queue doesn't seem to support clock wrapping #659

Closed Dridus closed 2 years ago

Dridus commented 2 years ago

We have some projects where we're using a fast-running (48MHz) clock underlying the monotonic timer, and while I haven't fully verified by test it appears that RTIC's scheduling behavior does not deal well with wrapping. It probably doesn't come up that often if one is timing by a slow clock (e.g. 32kHz) but if one is timing by a fast clock the counter wraps around often.

Scheduling seems to always end up as a particular value of the Monotonic::now() clock. spawn_after is implemented in terms of spawn_at with the particular time being calculated as now() + delay. For dwt-systick-monotonic and the atsamd_hal Rtc<Count32Mode> implementations of Monotonic which both use Fugit, that wraps (even though it should be a checked add by contract, another bug?):

The spawn_at implementation doesn't seem to do anything interesting with instant, just passes it along to TimerQueue::enqueue_unchecked, here: https://github.com/rtic-rs/cortex-m-rtic/blob/master/src/tq.rs#L24

which compares the given instant to the earliest scheduled thing in the timer queue but makes no special handling for wraparound, which has already silently occurred anyway. I think it'll just put the new task with a very early instant at the beginning of the TQ and then pend whatever interrupt backs the monotonic.

once that ISR fires, it'll call TimerQueue::dequeue: https://github.com/rtic-rs/cortex-m-rtic/blob/master/macros/src/codegen/timer_queue.rs#L142

which then will (most likely) immediately compare the too-early time with the current time and immediately fire it.

I don't think it's any better in the DWT SYSTICK case as now is implemented in terms of CYCCNT, a 32 bit wrapping timer running at the CPU clock rate. SYSTICK itself is a count-down timer, but all the clock comparisons seem to be in terms of CYCCNT.

I did find this internals book reference which talks about wrapping as an issue and that it's handled by signed comparisons, but I'm not sure that really addressed the issue then, and it certainly doesn't now as I'm pretty sure it was written a while and at least one major rewrite ago: https://github.com/rtic-rs/cortex-m-rtic/blob/master/book/en/src/internals/timer-queue.md#resolution-and-range-of-cyccntinstant-and-cyccntduration

In any case, would love someone more familiar to comment on it :-)

cc @sakian

korken89 commented 2 years ago

Hi, thanks for reporting.

This does not look like a bug to me. RTIC simply requires some operations to be supported on Monotonic::Instant and Monotonic::Duration for it to be valid. In this case you are using fugit as the underlying time library, whose Instant represents a wrap-able time. So any Instant is valid MAX/2 - 1 ticks around it. This means that your example is correct behavior, as the the new instant is within the correct span for a valid instant:

fugit::Instant::<u32, 1, 1000>::from_ticks(0xffff_ffff) + fugit::Duration::<u32, 1, 1000>::from_ticks(10)
Instant { ticks: 9 } // Correct

On the old book chapter you found, this is not valid anymore. It was for RTIC 0.5 and the Monotonic trait replaces the old method.

Is there anything else you'd like to ask or that I should clarify?

Dridus commented 2 years ago

to clarify, it's correct behavior that tasks might be executed either immediately (if now + delay wraps) or at the intended delay ± jitter, not delay ± jitter?

korken89 commented 2 years ago

Sort of, it does not depend on wrapping - just if the instant is in the future or in the past. So if an instant is in the past the task will be executed immediately, else it will be scheduled. With fugit this means that you can represent time that is MAX/2 - 1 ticks into the future or the past compared to a specific Instant. The library does the needed math to make this happen.

perlindgren commented 2 years ago

In principle jitter cannot be negative (besides the jitter of the HW timer). There is also a difference between a task being "released" (ready to run), and the actual execution. This may cause accumulated drift if using "now" as the timebase.

Dridus commented 2 years ago

Sort of, it does not depend on wrapping - just if the instant is in the future or in the past. So if an instant is in the past the task will be executed immediately, else it will be scheduled. With fugit this means that you can represent time that is MAX/2 - 1 ticks into the future or the past compared to a specific Instant. The library does the needed math to make this happen.

But you can't represent a time up to MAX/2-1 ticks into the future relative to any specific Instant though correct? Only relative to Instant 0, otherwise it's ambiguous whether it was intended to be in the past. I think it's surprising that spawn_after / reschedule_after might end up being basically the same as spawn, depending on the current value of the clock and what duration, since the addition with wrapping is hidden in the implementation of spawn_after / reschedule_after.

In principle jitter cannot be negative (besides the jitter of the HW timer). There is also a difference between a task being "released" (ready to run), and the actual execution. This may cause accumulated drift if using "now" as the timebase.

Ah yes good clarifications, I wasn't accurate.


Overall the behavior makes sense once one knows of all the details and limitations. I do think it'd be nice to be less surprising, either by documentation or better yet some indication from the API that wrapping/as-soon-as-possible execution might occur rather than duration + jitter.

As it stands it also makes for potentially complicated calling code if one really does need the proper wait. I can't judge whether it's important often enough to justify updating the implementation to support it automatically though.

perlindgren commented 2 years ago

By extending the Monotonic timer to a u64 store, duration will be large enough for (all) practical use cases, and solves the original issue. Please re-open this issue if further clarifications are needed.