Open pabigot opened 5 years ago
thread-safe: (TBD) a function is thread-safe if its behavior is correct when invoked from multiple threads at the same time
reentrant: (TBD) a function is reentrant if its behavior is correct when it is invoked by (indirect) recursion from the same thread
+1 to these.
- interrupt-safe: (TBD) a function is interrupt-safe if its behavior is not affected by concurrent access from interrupts
No objections, but can you provide an example API function where you might want to use this term? I would expect all the driver APIs I can think of offhand to be interrupt-safe, at least when called from thread context.
* block: (TBD) a call blocks if it can suspend the invoking thread while it waits for something to happen
I would suggest "might sleep" (or "may sleep") instead.
This is more familiar from Linux kernel source code (648 matches for "might sleep" in v5.3), and has additional advantages beyond that.
In particular, as discussed for clock_control_on()
at today's API meeting, some people (including me) thought "block" might simply mean "don't return until the operation is complete". This would include functions which busy-wait until the job is done (like k_spin_lock()
might want to do on SMP).
By contrast, "might sleep" makes it clearer that the caller's thread might get scheduled out, and thus further hints the caller must be running in thread context.
At the individual function level we should clearly document:
Can a call block?
So this would become e.g.:
/**
* @brief Enable a clock controlled by the device
*
* On success, the clock is enabled and ready when this function
* returns.
*
* This function :ref:`might-sleep`.
*
* @param dev Device structure whose driver controls the clock.
* @param sys Opaque data representing the clock.
* @return 0 on success, negative errno on failure.
*/
static inline int clock_control_on(struct device *dev,
clock_control_subsys_t sys)
Where the Sphinx might-sleep
ref would link directly to a common definition for the term.
Also are k_is_in_isr() and k_is_preempt_thread() sufficient to determine whether it's safe to proceed, or do we also need k_is_atomic(), #17301, or other technologies?
+1 to k_is_atomic()
instead of k_is_in_isr()
.
+1 to
k_is_atomic()
instead ofk_is_in_isr()
.
Never mind, I take it back. https://lwn.net/Articles/274695/
Once these mistakes are cleared up, there is still the question of just how kernel code should decide whether it is running in an atomic context or not. The real answer is that it just can't do that. Quoting Andrew Morton again:
The consistent pattern we use in the kernel is that callers keep track of whether they are running in a schedulable context and, if necessary, they will inform callees about that. Callees don't work it out for themselves.
- interrupt-safe: (TBD) a function is interrupt-safe if its behavior is not affected by concurrent access from interrupts
No objections, but can you provide an example API function where you might want to use this term? I would expect all the driver APIs I can think of offhand to be interrupt-safe, at least when called from thread context.
We need to be able to say succinctly "Unless otherwise specified all API functions are interrupt-safe" and expect people to know what that means. A specific example would be the GPIO API, where it is not clear to implementers that, because GPIO write functions may be invoked from ISRs, read-modify-write code like:
u32_t out = gpio->OUT;
gpio->OUT ^= (out & ~mask) | (value & mask);
must be wrapped in a spin-lock to be interrupt-safe. Most implementations don't do that, even in the ongoing GPIO rewrite, in part because we couldn't get agreement in #19252 on what "interrupt-safe" meant and whether it was important to document that expectation before people started implementing drivers.
* block: (TBD) a call blocks if it can suspend the invoking thread while it waits for something to happen
I would suggest "might sleep" (or "may sleep") instead. This is more familiar from Linux kernel source code (648 matches for "might sleep" in v5.3), and has additional advantages beyond that.
In particular, as discussed for
clock_control_on()
at today's API meeting, some people (including me) thought "block" might simply mean "don't return until the operation is complete". This would include functions which busy-wait until the job is done (likek_spin_lock()
might want to do on SMP).
My intent was to identify functions that will not transfer control from a cooperative thread. "blocking" is a poorly chosen term. Scrap it and start over.
Zephyr uses several terms including "Waiting", "Suspended", and "Ready" to identify started non-terminated non-running threads. The transitions we're interested in are "suspend" and "wait". Let's use "yield" for "suspends or waits".
There are three cases:
So we can do:
yields (thread): A thread yields when it invokes an operation that transitions it to Suspended or Waiting.
pre-empted: A thread is pre-empted when it transitions to Ready without having yielded.
yielding (function): A function is yielding when it invokes (directly or indirectly) an operation that causes the executing thread to yield.
non-yielding (function): A function is non-yielding when it is not yielding.
It's non-yielding that we care about, because those functions guarantee, when invoked by a cooperative thread, that no other cooperative thread will be allowed to run before the function returns. (Absent bugs like #20255.)
I'm not sure exactly how functions should be annotated with respect to their behavior, but I would want it to be terse and consistent.
+1 to
k_is_atomic()
instead ofk_is_in_isr()
.Never mind, I take it back. https://lwn.net/Articles/274695/
Yes, but that relies on having API to provide information about context to callees, which we probably don't have. To be considered later.
A specific example would be the GPIO API
Excellent example, thank you.
Summarizing the status:
k_is_atomic()
: no, not yet (and to clarify: in my mind, not ever, being convinced by the LWN article)There are three cases:
* The thread invoking a given function will not yield; * The thread invoking a given function may yield; * The thread invoking a given function will yield.
Did you mean "The function invoked by a given thread" instead of "The thread invoking a given function"?
- yielding (function): A function is yielding when it invokes (directly or indirectly) an operation that causes the executing thread to yield.
- non-yielding (function): A function is non-yielding when it is not yielding.
With the given definition of "yield" ("suspends or waits"), I agree these are the desired semantics.
I also agree that defining terms in accordance with documented thread states and related APIs is a great approach.
However, I'm still bullish on "might sleep" terminology-wise. I worry "yielding" is too easy to confuse with "may call k_yield()".
I've integrated the discussion so far into this document which is available for review through #20959 so people can comment on specific concerns in context. I think that'll be a bit more convenient then chained comments on this issue.
There are three cases:
* The thread invoking a given function will not yield; * The thread invoking a given function may yield; * The thread invoking a given function will yield.
Did you mean "The function invoked by a given thread" instead of "The thread invoking a given function"?
No. The operation of yielding is something that directly affects threads. The impact on functions is derived.
However, I'm still bullish on "might sleep" terminology-wise. I worry "yielding" is too easy to confuse with "may call k_yield()".
A problem is that traditionally Zephyr has used "sleep" to specifically mean "wait for a timeout to occur". I would rather that "suspend" was more general, but currently it specifically references waiting without an event.
API meeting 19th May:
Plan of action to document the expected behavior of public API functions in Zephyr:
k_pipe
and k_mutex
(TBD). @pabigot volunteers to do this and then have it reviewed by all. During the 2.4 development cycle, begin each API meeting with a status update on the function properties.
Here's my assessment of k_mutex
and k_pipe
:
function | syscall | reschedule | sleep | no-wait | isr-ok | pre-kernel-ok | async | supervisor |
---|---|---|---|---|---|---|---|---|
k_mutex_init | YES | no | no | yes | yes | |||
k_mutex_lock | YES | yes | YES | YES | NO* | |||
k_mutex_unlock | YES | YES | no | yes | NO* | |||
k_pipe_init | no | no | no | yes | yes | |||
k_pipe_cleanup | no | no | no | yes | yes | |||
k_pipe_alloc_init | YES | no | no | yes | yes | |||
k_pipe_put | YES | yes | YES | YES | yes | |||
k_pipe_get | YES | yes | YES | YES | yes | |||
k_pipe_block_put | no | yes | YES | no | no | YES | ||
k_pipe_read_avail | YES | yes | YES | no | no | |||
k_pipe_write_avail | YES | yes | YES | no | no |
Some notes on the table
reschedule
is yes
if sleep
is YES
; it is YES
only in some cases where sleep
is no
sleep
defaults to no
. Whether suspending in a function like k_sem_give()
that is reschedule
is something that should be sleep
is open for clarification.no-wait
is yes
if not sleep
. It may be YES
or no
if sleep
isr-ok
is yes
if either no-wait
or not sleep
. The two NO
are an inconsistency that may be removed via #25678async
is rare and only one of these functions has it.syscall
is just informational, not one of the standard function attributespre-kernel-ok
which is probably generally equivalent to isr-ok
, but deep inspection is required to ensure it doesn't do something that isn't available until the main thread has started.supervisor
which also requires inspection. I assume anything that's syscall
is not supervisor
, but the others may or may not be.Determining the behavior of the API is not trivial. Some of the above may be wrong. Some were surprises, like the functions to read the number of spaces available for read and write in k_pipe
, which can block for an unbounded period due to an internal call to an allocation function implemented as k_stack_pop()
with a K_FOREVER
wait.
I've identified 181 functions in kernel.h
that could be categorized. There are some interesting inconsistencies where some functions are syscalls but other seemingly parallel ones are not. Some of the convenience wrappers like k_fifo
and k_lifo
are inconsistently implemented.
This is gonna be really unpleasant.
@andyross and @andrewboie could you please review the contents of this table? The terminology is further explained here: https://docs.zephyrproject.org/latest/reference/terminology.html
@pabigot was reading through all of this and was wondering why we use "sleep" (https://docs.zephyrproject.org/latest/reference/terminology.html#sleep). Sleeping is not a documented thread state. Is this the same as _THREAD_PENDING
where a thread is waiting on an object?
@pabigot was reading through all of this and was wondering why we use "sleep" (docs.zephyrproject.org/latest/reference/terminology.html#sleep). Sleeping is not a documented thread state. Is this the same as
_THREAD_PENDING
where a thread is waiting on an object?
AFAIK yes
The terminology was introduced in #21678 based on analysis in #20969. The term "sleep" ultimately came from here.
To precisely describe what "put the caller to sleep" means to a user it became "a voluntary transition to a non-running active thread state".
The documented thread states relevant to "sleep" are transition to either "waiting" and "suspended". Technically it would also cover "ready" but AFAIK there is no API that allows voluntary direct transition from running to ready.
So the terminology does not reflect any internal kernel state names that implement the thread state machine. It's perhaps not too late to change, but I suspect "sleeps" will be more clear to people than "pends".
Shorter: "sleeps" was intended to also describe a function that invokes k_thread_suspend(_current);
which would instead be _THREAD_SUSPENDED
.
This issue is intended to cover the concerns raised in #1960, #3694, #6184, #17014, and related lack of clarity and consistency.
Proposal: We should have:
Also are
k_is_in_isr()
andk_is_preempt_thread()
sufficient to determine whether it's safe to proceed, or do we also needk_is_atomic()
, #17301, or other technologies?Example policies might be:
-EWOULDBLOCK
shall be returned in situations where the requested operation cannot be successfully completed in that context.If we can define the normal behavior at either a system or driver-specific level, only the exceptions need be documented at the individual function level.