zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

What is the proper timer "now" convention for late interrupts? #12247

Closed · andyross closed 5 years ago

andyross commented 5 years ago

Opening this new issue to get the giant (and IMHO mostly irrelevant) bike shed out of #11722 and #12204.

With the advent of pervasive tickless, we have a new API convention to argue about: What happens when a multi-tick timer interrupt is delayed?

Inside delayed timer callbacks, code may (and usually does) set more callbacks for future events. Should the "current" time used as the reference for these callbacks be (1) the "true" current time or (2) the time the tick was expected to be delivered?

Right now the code implements choice (2). The reasoning is that you often have periodic processing where a timeout callback always sets a new timeout N ticks in the future (this is exactly the way k_timer works, for example). Under choice (1), an interrupt delayed by one tick will add that tick to the sequence, and all further ticks will slowly fall behind the intended pace. With choice (2), we can recover the lost time and keep the timekeeping steady even in the face of giant latency problems.

With choice (1), the delay to the "next" interrupt can be held more constant[1], making it a better fit for code trying to do "realtime" control using the tick timer[2]. It's also worth pointing out that choice (2) results in a simpler implementation, and that the "saved expired timers list" implementation used in the old code had multiple bugs fixed during the rewrite.
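To make the two conventions concrete, here's a hand-wavy sketch of a periodic callback rescheduling itself; set_timeout(), read_hw_ticks(), and N are made-up names, not Zephyr API:

/* A periodic callback reschedules itself; pick one base: */
void periodic_cb(uint32_t expected_tick)
{
    /* Choice (1): base on the true current time.  A one-tick-late
     * interrupt adds a tick to the sequence, so the pace drifts:
     *     set_timeout(periodic_cb, read_hw_ticks() + N);
     *
     * Choice (2): base on the time the tick was expected.  The lost
     * time is recovered and the long-run pace stays exact: */
    set_timeout(periodic_cb, expected_tick + N);
}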

Basically, the question is this: do we want a separate API variant (or maybe a Kconfig option) available to allow code to use the convention it wants, or can we achieve what we want with the existing code?

[1] But IMHO this is of marginal value as we're by definition experiencing catastrophic latency problems already!

[2] IMHO a poor API fit if the code or interrupt environment can cause latencies like this. Such apps should be using separate hardware timers (most SoCs have a few available) or a higher priority tick timer.

pabigot commented 5 years ago

In the context of the current clock architecture where timeouts are tightly coupled to clock management, I think the current implementation is the best alternative. However, I do not like that the timeout is invoked without information about the true passage of time.

If Zephyr were to transition to a tickless system clock driven by a free-running monotonic fixed-rate counter that will not overflow in the life of the device (which I hope would be a design goal for #12068), I would prefer an alarm (timeout) infrastructure based on deadlines as absolute system times. Alarm callbacks should be invoked with access to the true system time and the deadline at which they were scheduled. Repeating alarms that are not cancelled before the callback returns should be re-scheduled, nominally at the interval after the original deadline, but at a different deadline if one is set before the callback returns.
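As a rough sketch of what I have in mind (every name below is invented for illustration, not existing Zephyr API):

typedef uint64_t systime_t;   /* free-running, monotonic, never wraps */

struct alarm;
typedef void (*alarm_fn)(struct alarm *a, systime_t deadline, systime_t now);

/* Schedule cb at an absolute deadline on the system timescale. */
void alarm_schedule(struct alarm *a, alarm_fn cb, systime_t deadline);

/* A repeating alarm re-arms itself relative to its original deadline,
 * not the (possibly late) time at which the callback actually ran: */
static void every_second(struct alarm *a, systime_t deadline, systime_t now)
{
    alarm_schedule(a, every_second, deadline + TICKS_PER_SEC);
}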

See how systemd does this as an example.

andyross commented 5 years ago

I would prefer an alarm (timeout) infrastructure based on deadlines as absolute system times

So would we all, and that's what the API exposes. The problem is that we can't have that here because an interrupt was delayed. The question is how to recover from the failure.

pabigot commented 5 years ago

I would prefer an alarm (timeout) infrastructure based on deadlines as absolute system times.

So would we all, and that's what the API exposes.

I don't see that. I see both k_timer and _timeout using deadlines as a relative offset from "now", where "now" means "whatever value the kernel last managed to write into the curr_tick variable". If deadlines were given as absolute system times, there would be no reason for k_timer_start and _add_timeout to take a signed duration (ticks) parameter to define the deadline.
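For reference, the relative-offset signatures I'm referring to look roughly like this (from memory; check the tree for the exact form):

/* both take a relative offset from "now", not an absolute deadline */
void k_timer_start(struct k_timer *timer, s32_t duration, s32_t period);
void _add_timeout(struct _timeout *to, _timeout_func_t fn, s32_t ticks);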

The problem is that we can't have that here because an interrupt was delayed.

Yes, because we're relying on a software-maintained tick timer. curr_tick is updated only when the kernel gets around to it; timeout deadlines are evaluated relative to and during curr_tick processing; and timeout callbacks are invoked during curr_tick processing at a point where curr_tick is frozen. Delays are unavoidable.

For systems where (a) the system time is driven by a free-running live monotonic counter that can't be delayed by interrupt processing, (b) deadlines are absolute values that can be compared with the live counter, and (c) the callback can compare the live counter against the time it was supposed to be invoked, I believe the problem largely just goes away (or is delegated to the callback, which is the same thing).

For systems where some intervention is required (e.g. to extend a 24-bit RTC counter by using an overflow counter to provide more bits) there'd be some overhead, but that can still be decoupled from processing timeouts.
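For example, a minimal sketch of that kind of extension, where an overflow interrupt maintains a software high word (the hardware accessor is hypothetical):

#include <stdint.h>

#define RTC_BITS 24

static volatile uint32_t rtc_hi;   /* incremented by the overflow ISR */

extern uint32_t rtc_read(void);    /* hypothetical 24-bit counter read */

uint64_t systime_now(void)
{
    uint32_t hi, lo;

    /* retry if an overflow was announced between the two reads;
     * assumes the overflow ISR isn't held off for a full wrap */
    do {
        hi = rtc_hi;
        lo = rtc_read();
    } while (hi != rtc_hi);

    return ((uint64_t)hi << RTC_BITS) | lo;
}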

"Which of the two times should we expose to the callback?" is the wrong question. A better question is "how can we give the callback both times?" An answer is to make system time track real-world elapsed time more accurately, then provide the users with both the "scheduled at" time and access to the (constantly updating) "true" time. #12068 is an opportunity to pursue that approach.

andyross commented 5 years ago

Yes, because we're relying on a software-maintained tick timer. curr_tick is updated only when the kernel gets around to it; timeout deadlines are evaluated relative to and during curr_tick processing; and timeout callbacks are invoked during curr_tick processing at a point where curr_tick is frozen. Delays are unavoidable.

You kind of lost me here. You have a mechanism in mind on how to invoke a timeout callback in a context other than a timer interrupt handler?

I mean, I get your API criticism, and it's valid (I don't see any reason not to have a global monotonic time_t kind of thing at the timeout API level). I just don't see how it amounts to anything but shuffling of proverbial deck chairs. The problem still exists: you have a callback that was supposed to have been run earlier but is actually being run late, and someone needs to make the decision as to which time to use for "six ticks later".

Here's some code to focus the discussion:

/* tick once per second for an hour */
for (int seconds = 0; seconds < 3600; seconds++) {
    k_sleep(1000);
    printk("tick\n");
}

Under the current scheme (and assuming no pressure from other threads in the system that would preempt it), this code works correctly and ticks for one hour even if some driver or hypervisor or qemu environment (yes, qemu has exactly this behavior) acts to block interrupts or halt the system for some amount of time greater than one tick.

If the callback that was being used by k_sleep() to wake up the calling thread instead used "current real time" as the base, then it would be late.

And an API change like you propose can't fix that, because this code isn't using that API; it's using a more primitive one, and you'd have to know whether the calling code was trying to implement a "1 second or greater delay" (in which case late/skewed delivery is fine) or the one-hour counter it actually implements.

I'm fine with having an alternative API available so that timeout callbacks which understand the difference can discriminate between the two times (which is what this whole issue is about). But it's not an API problem.

andyross commented 5 years ago

(Actually no, k_sleep() won't have that behavior because obviously the calling thread won't be awake after a late interrupt until after all ticks have been announced and current time updated. I was too cute with the example. But in fact k_timer does work exactly like this, naively scheduling another iteration in N ticks every time the callback fires. It's the natural way to implement a clock, and it breaks if you try to use realtime as the base for scheduled callbacks.)

pabigot commented 5 years ago

I don't seem to be getting my thoughts across clearly.

The problem still exists: you have a callback that was supposed to have been run earlier but is actually being run late, and someone needs to make the decision as to which time to use for "six ticks later".

No, this is exactly what I'm saying should not be decided. I envision a system where I can do:

static int timeout_handler(struct timeout *to,
                           systime_t deadline,
                           void *userdata)
{
   ssystime_t delta = k_systemtime() - deadline;

   printk("I was invoked %d time units after the deadline\n", (int)delta);
   if (delta > INTERVAL) {
      // skip over a bunch of missed invocations
      _reset_timeout(to, deadline + NEXT_INTERVAL_AFTER(delta));
   }
   delta = k_systemtime() - deadline; // k_systemtime() can increase during the callback
   printk("I finished %d time units after the deadline\n", (int)delta);
   return 0;
}

where that code waves its hands wildly in hopes it conveys the idea.

From what I see, it's the coupling of the clock increment with alarm processing in z_clock_announce(), called from the system clock ISR, that appears to make it difficult to provide a callback with both the (fixed) time at which it was supposed to be called and an API to read the (true, possibly changing) time that is "now".

Or maybe I'm wrong and that's not how it works.

With respect to z_clock_announce() as implemented today, I agree the time that should be visible to a callback is the time the alarm was set for, rather than the time the clock will be set to when all callbacks are processed, for the reasons you described in the opening comment.

I never intended to imply that the API I proposed could work with the current implementation (although, looking at the example above, maybe it could...). What I propose is that by changing how the system clock is maintained, which is perhaps on the table in #12068, that API could become feasible. And I propose that API would eliminate the problem because even when a callback is late it has enough information to detect its lateness and handle it in the way it feels is necessary: i.e., it gets both times.

In any case I'm going to try to shut up now and let others express a view.

pizi-nordic commented 5 years ago

I would prefer an alarm (timeout) infrastructure based on deadlines as absolute system times.

I fully support that 👍

pizi-nordic commented 5 years ago

I am still not fully satisfied with the current implementation. What I am afraid of is this dticks correction:

/* in z_clock_announce(), after expired timeouts have run: the
 * still-unannounced ticks are subtracted from the first (soonest)
 * timeout remaining on the list */
if (first() != NULL) {
    first()->dticks -= announce_remaining;
}

Here is why:

  1. If the adjusted timeout was on the list before the call to z_clock_announce(), we have to do such a correction (it is obvious).

  2. If the adjusted timeout belongs to a periodic k_timer which was handled during the current z_clock_announce() execution, we should do such a correction if we would like to keep the current scheme of periodic timer handling (and I think that in this case the current implementation behaves correctly, avoiding error accumulation).

  3. If the adjusted timeout was added during the current z_clock_announce() execution but is not related to any periodic timer, we IMHO have to skip the adjustment. If we perform the adjustment, the timeout will fire earlier than expected, which must be avoided. As an example of such a situation, consider a timeout added by something asynchronous to the current time (an external interrupt?) which gets executed during z_clock_announce().

andyross commented 5 years ago

I'm not really sure I buy that logic. I mean, we know that k_timer relies on the behavior specified. Yet you're asserting that all other users... don't?

Remember again that there are no "normal" circumstances where this even matters. You only hit it if there is a delayed interrupt (or software bug) that causes the tick announcement to fall more than a tick behind where it should have been. So, sure, it may seem "unexpected", but everything is going to be unexpected when that happens.

As far as changing the API to use absolute time units, I'm not at all opposed. I just don't think it's going to have the benefits advertised; someone still needs to worry about this distinction. It also has the potential to start making more elaborate demands on the driver layer to support that kind of realtime counter, which is IMHO a bad place to be putting complexity. Not all hardware has such a realtime counter, the counters have funny interactions with idle, and driver code is just generally a rough place to be working. SysTick lacks a proper free-running counter, the APIC timer is only loosely coupled to the TSC, stuff like that.

pabigot commented 5 years ago

We've been touching at least three distinct questions here.

1: Behavior when z_clock_announce is passed ticks > 1

Most recently summarized above as a potential reduction in the requested delay for a timer when something delays tick processing for more than one tick duration. I think you (@andyross) are acknowledging that this is undesirable behavior (aka a bug), but suggesting it only occurs in situations where the system is already unstable so we shouldn't worry about it:

everything is going to be unexpected when that happens.

I don't think that the fact the system is slightly unstable removes the responsibility of Zephyr to take low-cost steps to avoid destabilizing it further. If the cost of maintaining the requested interval correctly is "too high" then it shouldn't be applied, but we can't judge that until we see a working solution.

This should be recorded as a separate issue, since it's behavior that can be demonstrated in the current implementation.

2: Should alarm deadlines be relative or absolute

This shouldn't matter if it's done right. Which representation is more maintainable and usable depends on implementation specifics of how alarms are managed internally and configured externally. Let's move that elsewhere too, once there's context where it can be discussed (e.g. a proposal to support absolute deadlines).

3: "Which now is now" in an alarm callback

I believe this is what we're supposed to be addressing here.

I propose two general statements that should be interpreted without involving Zephyr or any other system:

I hope there's no disagreement with either statement. If so, let's pause and clean that up first; I've labeled them so they can be discussed independently.

Moving on, much of the discussion here is because:

Again, pause for consensus on this statement.

An observation, hopefully not controversial.

Here consensus doesn't matter: that's a subjective statement, true at a minimum for me, and I believe also for @pizi-nordic.

Follows from [3] and [4].

Summary

I don't claim the alarm management scheme (a), currently implemented, has no use.

I don't even claim that the alarm management scheme (a) should not be the default.

But any suggestion (intentional, inadvertent, or perceived) that we are mistaken in our belief that we have a use for the other management scheme doesn't contribute helpfully to the discussion.

This issue appears to be based on the assumption that we have to pick one management scheme or the other. I reject that assumption, and believe we need a proposal for an alarm infrastructure that supports both schemes as well as additional capabilities. At some point I will create such a proposal, assuming nobody does it first.

pizi-nordic commented 5 years ago

@pabigot: Nice summary 👍 @andyross: I will try to support my point of view using the following use-case:

Let's assume that we have an external temperature sensor and we would like to measure the ambient temperature. A measurement will be taken every minute, and a slight delay is not a big problem for us, as we just want to display the current temperature. The sensor needs 40 ms to perform a conversion. The temperature can be read after that time.

What is the most obvious method to implement that?

  1. Create periodic timer_1 with a 1 min period.
  2. Create single-shot timer_2 with a 40 ms timeout.
  3. In the expiration handler of timer_1:
    • Trigger the conversion.
    • Start timer_2.
  4. In the expiration handler of timer_2:
    • Fetch the conversion result.
    • Wake up the main thread to perform the display update.
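A minimal sketch of that design using k_timer (the sensor and display functions are placeholders, and a real driver would defer the bus transaction out of the expiry handlers, which run in interrupt context):

#include <zephyr.h>

static struct k_timer timer_1;   /* periodic, 1 min */
static struct k_timer timer_2;   /* single-shot, 40 ms */
static struct k_sem data_ready;

/* placeholders for the hypothetical sensor/display driver */
extern void trigger_conversion(void);
extern void fetch_conversion_result(void);
extern void update_display(void);

static void timer_2_expired(struct k_timer *t)
{
    fetch_conversion_result();
    k_sem_give(&data_ready);     /* wake the main thread */
}

static void timer_1_expired(struct k_timer *t)
{
    trigger_conversion();
    k_timer_start(&timer_2, K_MSEC(40), 0);   /* one shot: period 0 */
}

void main(void)
{
    k_sem_init(&data_ready, 0, 1);
    k_timer_init(&timer_1, timer_1_expired, NULL);
    k_timer_init(&timer_2, timer_2_expired, NULL);
    k_timer_start(&timer_1, K_MINUTES(1), K_MINUTES(1));

    for (;;) {
        k_sem_take(&data_ready, K_FOREVER);
        update_display();
    }
}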

For me such a solution looks good. Rule [1] guarantees that we always fetch valid data from the sensor. For the periodic timer, the delay introduced by [1] is not a problem, as we know that the ambient temperature will not change in dozens of µs (we do not have a real-time requirement here). Also, both periodic timer management schemes (a) and (b) work for us (the user will not notice accumulated error on a display update).

Where is the problem: if z_clock_announce() is delayed, it will be called with ticks > 1. As a result, the timer_2 timeout will be adjusted (it is first on the list after the call to the timer_1 expiration handler). Due to this adjustment, the timeout will be shortened (and if z_clock_announce() was delayed by more than 40 ms, the handler of timer_2 will be called right after the handler of timer_1). As a result we will get undefined data from the sensor.

This is why I see the "Behavior when z_clock_announce is passed ticks > 1" issue as a critical one. Everything else is either a feature request (different periodic timer handling schemes, passing information about now() to handlers, etc.) or affects maintenance cost and resource usage (absolute/relative timeouts).

pabigot commented 5 years ago

@pabigot: Nice summary

Thanks.

This is why I see the "Behavior when z_clock_announce is passed ticks > 1" issue as a critical one.

I don't see an issue specifically for this, though it was also noted in #11722. Could you open one?

I agree it's important[*], but I think that discussion should be separate from this one.

[*]: I'm working with SHT21/HTU21D/Si7021 which has exactly the behavior you describe and drives several of my alarm-related expectations that Zephyr can't currently support.

pizi-nordic commented 5 years ago

I don't see an issue specifically for this, though it was also noted in #11722. Could you open one?

Here it is: https://github.com/zephyrproject-rtos/zephyr/issues/12332

andyross commented 5 years ago

Please stop submitting bugs, guys. Submitting a bug is aggressive: it's an attempt to get my management to force me to do what you want (yes, we track bug counts, and the people who pay me care whether or not I'm fixing the ones I'm assigned) when we have a genuine design disagreement here. Discuss, don't escalate.

It sounds like @pizi-nordic's complaint is addressable in the immediate term with an API to get the "timeout shortfall" or whatever. Or via a more elaborate API change per @pabigot that swaps the time units to absolute times (though again I warn there are edge cases and driver complexity issues with doing that). Or just by building without tickless enabled, which eliminates the ambiguity (not the late interrupts, of course!). I'm more than happy to see code submitted along either of those lines.
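For the first option, something as small as this might do (the helper name, and N, are invented here, not existing API):

/* Hypothetical: ticks by which the announcement currently being
 * processed is running late; 0 when we're on time. */
s32_t z_timeout_shortfall(void);

static void my_handler(struct _timeout *to)
{
    /* opt back into "true now" (choice 1) by adding the shortfall */
    _add_timeout(to, my_handler, N + z_timeout_shortfall());
}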

But please stop going behind my back, especially when I deliberately carved out a space for this argument and am genuinely trying to hear your opinions.

andyross commented 5 years ago

Closing this one. I think at this point everyone seems to agree that the issue I describe way above and the one in #12332 are, in fact, exactly the same problem. And in any case all current discussion is there and not here.