cbiffle opened 9 months ago
Hm, on reflection, it also occurs to me that we could potentially have the closed receive operation itself drive the priority boosting.
> With the exception of certain dynamic tasks used for debugging, like hiffy's generic send support and udprpc, we can identify all potential clients of a service at compile time. So, we have the opportunity to do a more precise priority boost. That's nice.
I suppose we'd need some way to actually plumb the priority information, which is known when building the dist image, into the ... whoever ends up being responsible for boosting priority ... though?
Aye, that we would.
The simplest thing I can think of would be to (1) have the boosting triggered by something the server does, and (2) boost to either that server's priority, or that server's priority minus one. Probably the latter.
We'd need to think about whether this needs to be transitive, since servers are typically also clients, with rare exception. Generally one of the advantages of PCP over PI is that you don't need to implement transitive graph algorithms. We'd want to make sure that holds in whatever we do.
So if, say,
...I think it works out? Should probably sit down with some of the PCP papers to learn why I'm obviously wrong. :-)
In terms of the original questions I posted, the "closed receive as priority donation" approach gets us:

- **How do we know when to boost a task's priority?** We boost a task `x` if, and only if, at least one (other) task is in state `InRecv(Some(x))`.
- **How do we know when to stop boosting the priority?** We unboost the task immediately upon the waiting task resuming. This will be either in response to a message from the boosted client (implying that the client is now blocked, so its priority doesn't matter), or a notification, which allows the server to implement a timeout or service an interrupt and then decide whether to re-boost the client by re-entering a closed receive, or to open receive instead.
- **How do we know what priority to boost to?** We can compute it directly from the base (unboosted) priority of the task executing the closed-receive operation.
- **Who keeps track of which tasks are boosted?** The kernel; the boosted tasks are precisely the set of task IDs `x` that appear in states matching the pattern `InRecv(Some(x))`. If a task leaves that state for any reason (successful receive, notification, or restart by supervisor/debugger), we unboost the referenced task immediately.
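The state-derived rule above can be sketched in a few lines. This is a minimal illustration, not the actual Hubris kernel code: the `Task` and `TaskState` types are simplified stand-ins, it assumes Hubris's convention that a lower number means a more urgent priority, and it interprets "server's priority minus one" as one numeric step more urgent than the server.

```rust
// Illustrative sketch only; not the Hubris kernel's real Task type.
// Convention assumed: lower numeric value = more urgent priority.

#[derive(Clone, Copy)]
enum TaskState {
    Healthy,
    // InRecv(Some(x)) models a closed receive on task index x;
    // InRecv(None) models an open receive.
    InRecv(Option<usize>),
}

struct Task {
    base_priority: u8,
    state: TaskState,
}

/// Effective priority of task `i`: its base priority, boosted if and only
/// if some task is in `InRecv(Some(i))`. The boost target sketched here is
/// "server's base priority minus one" (one step more urgent), which is one
/// of the two options discussed above; saturating so priority 0 stays 0.
fn effective_priority(tasks: &[Task], i: usize) -> u8 {
    let mut p = tasks[i].base_priority;
    for t in tasks {
        if let TaskState::InRecv(Some(target)) = t.state {
            if target == i {
                p = p.min(t.base_priority.saturating_sub(1));
            }
        }
    }
    p
}
```

One nice property of deriving the boost from task state like this is that "who keeps track" and "when to unboost" fall out for free: the moment the server leaves `InRecv(Some(x))` for any reason, recomputing `effective_priority` yields the base priority again.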
Personal development note, I have gotten a lot better at spelling the word recieve. Erm, receive.
As of #1762, there's another potentially nice property to closed RECV acting as the "boost priority" operation:
It can be interrupted. Specifically, a server in a closed RECV can specify a non-zero notification mask to allow some subset of events, such as timers, to break it out of the closed RECV. When this happens, the boosted client would need to atomically lose the increased priority. It would regain it if, and only if, the server re-enters the closed RECV.
This lets a server cancel a client's boosted priority. It doesn't give the server any way to notify the client that this has occurred, however -- in particular, the server cannot reply-fault the client until the client eventually calls back, still assuming it has exclusive access. This possibility already existed before #1762: a server could be restarted by the supervisor or a debugger while in a closed RECV, so that when the client eventually calls back, it no longer has exclusive access.
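The "atomically lose the boost" property can be shown with a toy state machine. This is a hedged sketch with illustrative types and names, not the Hubris kernel API: because the boost is derived from the server being in a closed RECV, waking the server via a mask-matching notification removes the boost in the same state transition.

```rust
// Illustrative only; not the real kernel's scheduler state.

#[derive(Clone, Copy)]
enum ServerState {
    // Closed RECV on `target`, interruptible by notification bits in `notif_mask`.
    ClosedRecv { target: usize, notif_mask: u32 },
    Runnable,
}

/// Post notification bits to a server. If any bit is within the mask of a
/// closed RECV, the server wakes; its closed-RECV state (and therefore any
/// boost it implied) disappears atomically with this transition. Bits
/// outside the mask don't wake the server in this sketch.
fn post_notification(state: ServerState, bits: u32) -> ServerState {
    match state {
        ServerState::ClosedRecv { notif_mask, .. } if bits & notif_mask != 0 => {
            ServerState::Runnable
        }
        other => other,
    }
}

/// Whether `client` currently enjoys a boost, derived purely from the
/// server's state -- there is no separate "boosted" flag to keep in sync.
fn client_is_boosted(state: ServerState, client: usize) -> bool {
    matches!(state, ServerState::ClosedRecv { target, .. } if target == client)
}
```

A timer notification that matches the mask thus cancels the boost; the boost returns if, and only if, the server re-enters the closed RECV.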
Background/problem
We have a (known) priority inversion opportunity in the OS. It has to do with our use of closed receive to implement mutual exclusion, and is most easily demonstrated by considering the SPI task:
I knew this was going to be a risk, so some core features of Hubris are designed to make it relatively easy to mitigate -- but we ain't done it yet, and we prolly oughtta. Hence this bug report.
FWIW, we haven't seen any bugs caused by this in practice, likely because we don't use mutual exclusion patterns very much. To make this happen, I'm pretty sure you have to be using "closed receive." Closed receive is the `sys_recv` mode that only listens for a single task, instead of the highest-priority queued sender. It's how we implement mutual exclusion on the SPI driver.

Proposed fix, in the abstract
I suspect the easiest way to fix this would be by using Priority Ceiling Protocol or a derivative of it. Priority Inheritance is a popular way to fix this in realtime operating systems, but ironically priority inheritance has some features that make it hard to implement in constant time: to make unblocking an arbitrary waiter cheap, you need complex minheap structures (and thus dynamic allocation); otherwise it winds up approaching linear. While we're not as aggressively realtime as some systems, I'd sure like to avoid building load-sensitive operations into the kernel, since we've largely avoided it until now.
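The complexity point can be made concrete with a toy comparison. Both functions below are illustrative assumptions, not anyone's real implementation, and they use the lower-number-is-more-urgent convention: under inheritance the holder's effective priority depends on the live set of waiters, so removing an arbitrary waiter forces a recomputation (linear here; doing better takes heap-like structures and, in general, dynamic allocation), while under a ceiling scheme the boost target is a precomputed constant.

```rust
/// Priority-inheritance flavor: effective priority is the most urgent of
/// the holder's base priority and every current waiter. Any change to the
/// waiter set means redoing this fold -- O(n) in the number of waiters.
fn pi_effective(base: u8, waiters: &[u8]) -> u8 {
    waiters.iter().copied().fold(base, u8::min)
}

/// Ceiling flavor: the boost target is the resource's precomputed ceiling,
/// independent of who is currently waiting, so boost/unboost is O(1).
fn pcp_effective(base: u8, ceiling: u8) -> u8 {
    base.min(ceiling)
}
```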
An application of priority ceiling protocol for the SPI mutual exclusion case would change the original scenario as follows:
PCP comes in several variations, which I alluded to above. There's the axis of "when to boost:" one option is to boost immediately at `lock`. This is simple and easy to reason about, but it can also starve intermediate-priority tasks that have nothing to do with the resource.

And then there's the axis of "how to boost:"
Notes on potential implementation and open design questions
With the exception of certain dynamic tasks used for debugging, like hiffy's generic send support and udprpc, we can identify all potential clients of a service at compile time. So, we have the opportunity to do a more precise priority boost. That's nice.
To do this, there's some basic implementation work to do, but also some design work. First, the implementation work: we'd need to start adjusting task priorities. This is easy because I did most of it four years ago:
Now, the design work: somehow we need to identify the `lock` operation -- it's a normal IPC; nothing about it indicates that it needs special handling. It's also the only operation on the SPI IPC interface that should be handled this way. (My guess is that we'll want a variation on the REPLY primitive.)

...and probably other questions.