Improving resolution and usability of controlled delays in kernel APIs

pabigot commented 5 years ago

This issue outlines the current approach to controlled delays in Zephyr, identifies some weaknesses and functional gaps, and serves as a point of discussion of goals and requirements for changes to address those weaknesses.

Existing Approach

Several kernel system APIs allow an application to specify a delay before some operation is initiated or terminated. These APIs include (delay parameters are in bold type):

k_timer_start(timer, duration, period)
k_timer_remaining_get(timer) => remaining (nb: unsigned)
k_queue_getqueue, timeout) (underlying k_fifo_get, k_lifo_get)
k_futex_wait(futex, expected, timeout)
k_stack_pop(stack, data, timeout)
k_delayed_work_submit_to_queue(queue, work, delay)
k_delayed_work_remaining_get(work) => remaining (nb: signed)
k_mutex_lock(mutex, timeout)
k_sem_take(sem, timeout)
k_{msgq,mbox,pipe}_[block_]{put,get}(store, data, timeout)
k_mem_{slab,pool}_alloc(slab, mem, timeout)
k_poll(events, num, timeout)
k_sleep(ms)
k_usleep(us)
k_busy_wait(us)
k_thread_deadline_set(thread, deadline)
k_uptime_get() and k_uptime_delta(&ms)

In current Zephyr most controlled delays are specified as a signed 32-bit integer counting milliseconds, with helper macros like K_MINUTES(d) to convert from more coarse-grained measures. Exceptions are:

k_usleep and k_busywait which operate in microseconds;
k_thread_deadline_set which operates in cycles of the hardware clock;
k_uptime_get which operates in s64_t milliseconds clamped to tick increments.

Internally most delays are implemented through struct _timeout which operates on ticks as defined by SYS_CLOCK_TICKS_PER_SEC. The requested delay is converted to the smallest span of ticks that is not less than the requested delay, except that a duration of zero may be converted to a single tick in some cases. An exception is k_busy_wait under the influence of ARCH_HAS_CUSTOM_BUSY_WAIT, currently used in-tree only by Nordic.

Functional Gaps

For all APIs but specifically k_timer interrupts between the point the application supplies the relative delay and the point the timer infrastructure inserts it into a processing queue introduces complexity in precise delay maintenance. To reduce these complexities it is desirable in some cases to specify delays as deadlines rather than relative offsets (#2811).

Use of milliseconds was tolerable as the tick duration has historically been 10 ms. With the upcoming merge of #16782 decreasing tick duration to 100 us finer grained specification will soon be needed for most if not all APIs. Arguments have been made to go as fine as nanoseconds (#6498).

Base Requirements and Questions

In a recent telecon @andyross proposed addressing these gaps by changing the way delays are specified, from signed 32-bit millisecond counts to another representation.

The following are positions and questions which I (@pabigot) have summarized and extended based on previous related discussions and experience. All are open for debate.

Existing code like k_sem_take(&sem, K_MSEC(5))--code that uses helper macros to translate delay durations to a timeout value--must be unaffected by any underlying API changes.
It must be possible to specify timer delays as either relative or absolute delays.
It is TBD whether other delays such as k_poll or k_sleep should allow for absolute deadlines.
It is desirable for the application to be able to retrieve the absolute deadline assigned by the infrastructure when given a relative delay. Note that returning time remaining does not satisfy this desire.
Precise handling of the order of completion for delays with the same deadline should be defined. The current implementation is inconsistent when timeouts are scheduled during callbacks or with deadlines that have passed (#12332). The problem will be exacerbated with the ability to schedule at absolute deadlines (which may have passed).
We should revisit the semantics of passing a relative deadline that has a non-positive value: in some cases converting it to an absolute deadline in the past may be preferable to the existing practice of treating it the same as zero in all cases.
We need to determine the finest resolution that must be supported. Microseconds would handle many cases, but going to nanoseconds may be worth doing for future-proofing and because it has been requested in the past.
We need to determine the maximum relative delay that must be expressible. With s32_t milliseconds that's currently 2147483.647 s (about 3.5 weeks). As resolution is increased this delay is reduced significantly unless larger data types are used.
We need to review the clock domain used to control these delays, in particular its resolution and span. Currently this is the 64-bit system tick clock. At this time the only public API to read this clock is k_uptime_get* which converts its scale to milliseconds. New API should be added to access the full precision for use in maintaining absolute deadlines.
We might consider using POSIX clock_gettime() as a model for reading clocks, and whether there's value in specifying delays as being expressed for a specific clock.

pabigot commented 5 years ago

cc @andyross @nashif @andrewboie @MaureenHelm @carlescufi @galak @pizi-nordic

pizi-nordic commented 5 years ago

One more thing to consider: We should specify clock domains for k_busy_wait() and k_cycleget32(). and other k* API. Currently we assume that all are clocked from the same source, but for at least Nordic, this is not true.

leapslabs commented 5 years ago

Thanks Peter for the detailed summary!

I think the requirements contain all the things which I would need for an IoT application with precise TDMA scheme. I don't have a deep understanding of Zephyr so here are just a few comments from my side:

Resolution - I cannot see yet at the moment the use of nanoseconds resolution. When devices need to synchronize at a such level it would require probably to use a different approach. Compensation techniques for clock drift, temperature, vibration etc. would need to take in account. I am not sure if the overhead and efforts would worth it. It would be also difficult to evaluate and maintain that across platforms.
Absolute time in the past - I think the user should take care of not setting the absolute time in the past.
Delay range - For an IoT application with micro-seconds TDMA a practical update rate would be maximum in tens of minutes range. A longer period is difficult to achieve taking into account the hardware clock sources used on those devices, temperature compensation, etc. So using s32_t or u32_t is probably still fine for most of applications. For longer delays the accuracy of micro-seconds is probably difficult to achieve on medium/low cost devices. For devices at higher cost or where power consumption is not a constrain there are more options to achieve timing accuracy and that can be done outside of the kernel.

pabigot commented 5 years ago

@leapslabs Thanks for your comments. Although synchronization is important, the primary use of this feature is internal to a single device. System cycle clocks are always above 1 MHz so nanoseconds are necessary to measure deltas. There are also use cases for setting alarms that will fire months in the future (e.g. preparing for a time zone offset or TAI-UTC delta change).

My tests on boards from four different vendors show the system clock is generally accurate to about 25 ppm, which aggregates 1 ms error in 40 s runtime, or 1 s in about eleven hours, which is consistent with your 10s-of-minutes estimate to coordinate time between devices at the application level, e.g. to synchronize scheduled operations like turning lights on or off.

pizi-nordic commented 5 years ago

Thank you for listing all the problems in single place!

Existing code like k_sleep(K_MSEC(5))--code that uses helper macros to translate delay durations to a known unit--must be unaffected by any underlying API changes.

It must be possible to specify timer delays as either relative or absolute delays.

It is TBD whether other delays such as k_poll or k_sleep should allow for absolute deadlines.

It is desirable for the application to be able to retrieve the absolute deadline assigned by the infrastructure when given a relative delay. Note that returning time remaining does not satisfy this desire.

I like both the idea of specifying absolute deadlines as well as fetching deadlines from any kind of kernel API. Looking to @andyross PR the following idea popped in my mind:

 /* Relative */
k_delayed_wrok_submit(..., K_MSEC(10));

/* Absolute */
k_delayed_wrok_submit(..., K_ABS(K_MSEC(10)));

/* Full controll */
struct k_timeout tmo;
k_timeout_set(&tmo, K_MSEC(10)); /* Relative */
k_delayed_wrok_submit(..., K_TMO(&tmo)));
x = k_timeout_remaining_get(&tmo);
x = k_timeout_absolute_get(&tmo);

Question if this idea is really usable is open, but IMHO this might be a step closer to satisfy @pabigot requirements. Moreover, if we are going to change kernel API, there is no better moment than the upcoming Zephyr 2.0 release. Even if not all new features will be available, we might start familiarize users with new API.

Precise handling of the order of completion for delays with the same deadline should be defined. The current implementation is inconsistent when timeouts are scheduled during callbacks or with deadlines that have passed (#12332). The problem will be exacerbated with the ability to schedule at absolute deadlines (which may have passed).

This needs discussion. We have to set clear boundaries what will be interpreted as a past, both in absolute and relative API.

We should revisit the semantics of passing a relative deadline that has a non-positive value: in some cases converting it to an absolute deadline in the past may be preferable to the existing practice of treating it the same as zero in all cases.

IMHO we should not allow for negative deltas from now() in the API. If the user would use "schedule in past" approach we should force him to use absolute timeouts and do the math in application.

We need to determine the finest resolution that must be supported. Microseconds would handle many cases, but going to nanoseconds may be worth doing for future-proofing and because it has been requested in the past.

We need to determine the maximum relative delay that must be expressible. With s32_t milliseconds that's currently 2147483.647 s (about 3.5 weeks). As resolution is increased this delay is reduced significantly unless larger data types are used.

That is hard part. I can see the same device needing sub-us precision (for example to implement a radio protocol) and long timeouts (for example run self-check of the controlled machinery each 3 months, remind user about service once per year, etc.)

I support @pabigot in one more area: we should clearly define clock domains for the kernel API: For example for Nordic k_cycle_get 32() API is using different clock than k_busy_wait() which is not expected by some of the tests.

nordic-krch commented 5 years ago

I can see the same device needing sub-us precision (for example to implement a radio protocol) and long timeouts

long timeouts can always be achieved in application in fragments so i'm not that worried about that. Regarding sub-us if we are talking about kernel infrastructure then if api's are using ticks (and macros for ms2ticks, us2ticks conversion) then nanoseconds are not a problem if nanoseconds-to-ticks macro is added. For lower frequencies that will of course make no sense.

pizi-nordic commented 5 years ago

@nordic-krch: You reminded me about one important problem: The ms2tick and tick2ms conversion is not symmetric (tick2ms(ms2tick(x)) != x) whch causes a lot of troubles in the tests as we have to take under account conversions made in the kernel. IMHO the Zephyr user will face similar problems, so we should at least consider making such conversion symmetric.

pabigot commented 5 years ago

The ms2tick and tick2ms conversion is not symmetric (tick2ms(ms2tick(x)) != x)

Conversion between clock domains is not necessarily both injective (one-to-one) and surjective (onto), which is required to create a total inverse function. I don't think this expectation can be satisfied, especially since these functions don't specify the direction of rounding, and if they both round up the composition might be a monotonically increasing function.

pizi-nordic commented 5 years ago

Conversion between clock domains is not necessarily both injective (one-to-one) and surjective (onto), which is required to create a total inverse function. I don't think this expectation can be satisfied, especially these functions don't specify the direction of rounding, and if they both round up the composition might be a monotonically increasing function.

Ahhh, I missing my math courses :). But you are right. We are working in finite field and we cannot avoid rounding, so there will be no total inversion.

However at the moment, the ms2tick rounds up while tick2ms rounds down, which results in ms2tick(tick2ms(ms2tick(x))) != ms2tick(x). Do you think that this could be fixed?

pabigot commented 5 years ago

However at the moment, the ms2tick rounds up while tick2ms rounds down, which results in ms2tick(tick2ms(ms2tick(x))) != ms2tick(x). Do you think that this could be fixed?

Technically, yes; without changing behavior, no. For the purposes of this PR what I want is something that in its fundamental form could be:

/** Convert a duration from one clock domain to another in which the
 * result duration will be the smallest representable duration that is
 * not shorter than the source duration. */
duration_type clock_duration_convert_upper(duration_type from,
                                        clock_domain_type from_domain,
                                        clock_domain_type to_domain);
/** Convert a duration from one clock domain to another in which the
 * result duration will be the largest representable duration that is
 * not longer than the source. */
duration_type clock_duration_convert_lower(duration_type from,
                                        clock_domain_type from_domain,
                                        clock_domain_type to_domain);

But we're bikeshedding solutions; the intent here is to discuss requirements. So the requirement the above satisfies is the ability to convert durations between clock domains while controlling the rounding direction.

pizi-nordic commented 5 years ago

So the requirement the above satisfies is the ability to convert durations between clock domains while controlling the rounding direction.

And adding this to the list was my goal. Thank you.

pabigot commented 5 years ago

We have to use github for design discussions as there isn't anything else available (slack ruled out by TSC), but its single-threaded nature is really inconvenient. So in a couple days what I'd like to do is take into account all comments received so far and post a new version of the Base Requirements and Questions section, including links to relevant comments. Then hide the preceding comments so people can see what the latest consensus looks like without getting bogged down in the supporting discussion.

It'll be ugly. I don't have a better solution.

pabigot commented 5 years ago

Experience in #17155 shows another point of discussion: Threads, which are initialized at both compile-time and runtime, include an init_delay attribute which is currently s32_t in memory. Threads need to be constructed by k_thread_create() at run time, and K_THREAD_DEFINE() at compile time. Encoded timeouts as suggested in that PR cannot be created at compile-time, while using integer milliseconds at runtime would be inconsistent with other timeout parameters.

It may be necessary to make an exception to use milliseconds for k_thread delays as well.

andyross commented 5 years ago

It was discussed a while back that that argument to k_thread_create() is probably not a good idea to have in the API and that a better scheme would be to simply have a "initialize but do not start" flag, allowing the user to do it at an appropriate time via whatever mechanism she likes.

eHammarstrom commented 2 years ago

Just wanted to point out that given the documentation of k_thread_start one would believe that K_THREAD_DEFINE(..., K_FOREVER); would work; which as pointed out previously it does not. But the type int32_t of init_delay leads one to believe that K_THREAD_DEFINE(..., K_TICKS_FOREVER); might work. But the internal thread struct does not account for CONFIG_TIMEOUT_64BIT, which may result in trying to fit 64-bits of ticks into the 32-bits delay.

It looks like the current way of starting a thread yourself is creating a stack using K_STACK_DEFINE and a local thread struct k_thread and then using the k_create_thread API.

ghost commented 1 year ago

@cfriedt @nordic-krch One more rather old clock-related ticket which now feels more like an RFC than an enhancement request.

@nashif RFC tag + backlog?

Maybe it would be adequate to collect the remaining open RFCs/Enhancement requests in one RFC (I'd propose #40099 as it already is a meta RFC and has the best link collection AFAICS). For someone like trying to understand for the first time what proposals are being made, this would be very handy. Think this would be a good time as the RFC API has just been introduced and @cfriedt is doing all that POSIX work which is related.

Especially in the clock/timing area the number of open stale proposals, abandoned PRs, etc. is really impressive. ;-)

cfriedt commented 1 year ago

Maybe it would be adequate to collect the remaining open RFCs/Enhancement requests in one RFC

It's being pulled in to the LTSv3 POSIX roadmap / RFC, which does include a lot of non-posix work, ironically.

zephyrbot commented 9 months ago

Hi @peter-mitsis, @andyross,

This issue, marked as an Enhancement, was opened a while ago and did not get any traction. It was just assigned to you based on the labels. If you don't consider yourself the right person to address this issue, please re-assing it to the right person.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

@pabigot you are also encouraged to help moving this issue forward by providing additional information and confirming this request/issue is still relevant to you.

Thanks!

andyross commented 9 months ago

Pretty sure the subsystem rewrite which this was opened to track and evangelize got abandoned. Let's close it as stale; certainly there's always going to be room for discussing improvements to core subsystems, but this one seems to have run its course. We can always reference it later.

zephyrproject-rtos / zephyr