oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.

Potential priority inversion in closed receive protocols #1654

Open cbiffle opened 5 months ago

cbiffle commented 5 months ago

Background/problem

We have a (known) priority inversion opportunity in the OS. It has to do with our use of closed receive to implement mutual exclusion, and is most easily demonstrated by considering the SPI task:

  1. Unimportant task contacts SPI driver and requests to lock it. It succeeds.
  2. Important task tries to contact SPI driver, is queued. Now we have important work waiting on unimportant work, which isn't great but is hard to avoid.
  3. Medium-important task loses its mind and doesn't yield CPU to the unimportant task. Now the medium-important task is able to prevent the important task from making progress, which was not the design intent (and is also bad).

I knew this was going to be a risk, so some core features of Hubris are designed to make it relatively easy to mitigate -- but we ain't done it yet, and we prolly oughtta. Hence this bug report.

FWIW, we haven't seen any bugs caused by this in practice, likely because we don't use mutual exclusion patterns very much. To make this happen, I'm pretty sure you have to be using "closed receive." Closed receive is the sys_recv mode that only listens for a single task, instead of the highest priority queued sender. It's how we implement mutual exclusion on the SPI driver.
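
To make the pattern concrete, here is a rough sketch of a lock-style server loop. This is not the actual SPI driver, and the receive helpers are invented stand-ins for the open and closed receive modes, not the real userlib API:

```rust
// Illustrative only: invented names, not Hubris code.

#[derive(Clone, Copy)]
struct TaskId(u16);

enum Request {
    Lock,
    Release,
    Transfer, // stand-in for the actual SPI data-movement operations
}

// Stand-in for an open receive: accept the highest-priority queued sender.
fn recv_any() -> (TaskId, Request) {
    unimplemented!("stand-in for open receive")
}

// Stand-in for a closed receive: accept messages only from `holder`.
fn recv_from(holder: TaskId) -> Request {
    let _ = holder;
    unimplemented!("stand-in for closed receive")
}

fn server_loop() -> ! {
    loop {
        // While the device is unlocked, anyone may send.
        let (sender, req) = recv_any();
        if let Request::Lock = req {
            // The device is now held by `sender`: listen only to that task.
            // Every other client queues behind this closed receive, which is
            // where the inversion window described above opens up.
            loop {
                match recv_from(sender) {
                    Request::Release => break,
                    _ => { /* perform the operation on behalf of the holder */ }
                }
            }
        }
    }
}
```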

Proposed fix, in the abstract

I suspect the easiest way to fix this would be to use the Priority Ceiling Protocol (PCP) or a derivative of it. Priority Inheritance (PI) is a popular way to fix this in realtime operating systems, but ironically it has some features that make it hard to implement in constant time: to make unblocking an arbitrary waiter cheap, you need complex min-heap structures (and thus dynamic allocation); otherwise the cost of recomputing a task's priority approaches linear in the number of waiters. While we're not as aggressively realtime as some systems, I'd sure like to avoid building load-sensitive operations into the kernel, since we've largely avoided them until now.

An application of priority ceiling protocol for the SPI mutual exclusion case would change the original scenario as follows:

  1. Unimportant task contacts SPI driver and requests to lock it. It succeeds.
  2. Important task tries to contact SPI driver, is queued. Now we have important work waiting on unimportant work, which isn't great but is hard to avoid.
  3. At or before this point, the priority of the unimportant task is temporarily boosted to be at least as high as that of any client of the SPI task. (Variations described below.)
  4. Medium-important task does not have an opportunity to be scheduled because of the priority boost and can no longer interfere.
  5. Previously-unimportant-but-now-boosted task finishes its work and yields to the important task.
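
A toy rendition of that schedule, with made-up priority numbers and the (assumed here) convention that smaller numbers are more important:

```rust
fn main() {
    let spi_server = 2u8;   // SPI driver
    let important = 3u8;    // important client, blocked waiting on the driver
    let medium = 4u8;       // unrelated medium-important task
    let unimportant = 6u8;  // client currently holding the SPI lock

    // Without the boost: `important` is blocked, so the scheduler picks the
    // most important runnable task, which is `medium` (4) -- the lock holder
    // (6) never runs and `important` stays stuck behind it. That's the inversion.
    assert_eq!(most_important(&[medium, unimportant]), medium);

    // With the boost: the lock holder runs at least as high as any client of
    // the SPI task (here, `important`'s level, 3), so `medium` can no longer
    // starve it; the holder finishes its SPI work and yields to `important`.
    let boosted_holder = important;
    assert_eq!(most_important(&[medium, boosted_holder]), boosted_holder);

    let _ = (spi_server, unimportant);
}

// Smaller number wins under the assumed convention.
fn most_important(runnable: &[u8]) -> u8 {
    *runnable.iter().min().unwrap()
}
```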

PCP comes in several variations, which I alluded to above. There's the axis of "when to boost:"

And then there's the axis of "how to boost:"

Notes on potential implementation and open design questions

With the exception of certain dynamic tasks used for debugging, like hiffy's generic send support and udprpc, we can identify all potential clients of a service at compile time. So, we have the opportunity to do a more precise priority boost. That's nice.
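
For illustration, a hypothetical build-time pass could precompute a per-server ceiling from that static client list. The actual app-config and dist plumbing is not shown, and every name below is invented:

```rust
use std::collections::BTreeMap;

// Given each server's statically known clients and their priorities (smaller
// numbers assumed to mean more important), compute a per-server "ceiling":
// the most important priority any of its clients can have.
fn main() {
    let mut clients_of: BTreeMap<&str, Vec<(&str, u8)>> = BTreeMap::new();
    clients_of.insert("spi_driver", vec![("important", 3), ("unimportant", 6)]);

    let mut ceiling: BTreeMap<&str, u8> = BTreeMap::new();
    for (server, clients) in &clients_of {
        if let Some(best) = clients.iter().map(|&(_, prio)| prio).min() {
            ceiling.insert(*server, best);
        }
    }

    // Boosting whoever holds the SPI lock to the ceiling (3) makes it at least
    // as important as any other client of the SPI driver, and no higher than
    // it needs to be.
    assert_eq!(ceiling["spi_driver"], 3);
}
```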

To do this, there's some basic implementation work to do, but also some design work. First, the implementation work: we'd need to start adjusting task priorities. This is easy because I did most of it four years ago:

Now, the design work:

  1. How do we know when to boost a task's priority?
  2. How do we know when to stop boosting the priority?
  3. How do we know what priority to boost to?
  4. Who keeps track of which tasks are boosted?

...and probably other questions.

cbiffle commented 5 months ago

Hm, on reflection, it also occurs to me that we could potentially have the closed receive operation itself drive the priority boosting.

hawkw commented 5 months ago

> With the exception of certain dynamic tasks used for debugging, like hiffy's generic send support and udprpc, we can identify all potential clients of a service at compile time. So, we have the opportunity to do a more precise priority boost. That's nice.

I suppose we'd need some way to actually plumb the priority information, which is known when building the dist image, into the ... whoever ends up being responsible for boosting priority ... though?

cbiffle commented 5 months ago

Aye, that we would.

The simplest thing I can think of would be to (1) have the boosting triggered by something the server does, and (2) boost to either that server's priority, or that server's priority minus one. Probably the latter.

We'd need to think about whether this needs to be transitive, since servers are typically also clients, with rare exception. Generally one of the advantages of PCP over PI is that you don't need to implement transitive graph algorithms. We'd want to make sure that holds in whatever we do.

So if, say,

  1. The clients of a server are required to have priority lower than the server's base (unboosted) priority, even if the server is boosted for some reason.
  2. A client is boosted iff the server enters closed receive against that client, making closed receive a sort of "priority donation" operation.
  3. The boost is always to the server's base priority minus 1, so that the boosted client is at least as high priority as any other client of the server.

...I think it works out? Should probably sit down with some of the PCP papers to learn why I'm obviously wrong. :-)
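
As a sanity check on those three rules, here is a small worked example with made-up numbers (again assuming smaller numbers are more important) showing why no transitive propagation should be needed:

```rust
fn main() {
    let high_server = 1u8;  // a server that the SPI server itself calls into
    let spi_server = 2u8;   // rule 1: its clients must sit numerically above 2
    let lock_holder = 6u8;  // the client currently holding the SPI lock

    // Rule 3: boost to "one step below the server", numerically base + 1 under
    // the smaller-is-more-important convention.
    let boosted = spi_server + 1;

    // The boost reaches every legal client of the SPI server (all of them sit
    // at 3 or numerically higher)...
    assert!(boosted <= 3);
    // ...but stays strictly below the SPI server and anything above it, so
    // there is never a chain of boosts to propagate up the call graph.
    assert!(boosted > spi_server && boosted > high_server);

    let _ = lock_holder;
}
```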

cbiffle commented 5 months ago

In terms of the original questions I posted, the "closed receive as priority donation" approach gets us:

How do we know when to boost a task's priority? We boost a task x if, and only if, at least one (other) task is in state InRecv(Some(x)).

How do we know when to stop boosting the priority? We unboost the task immediately upon the waiting task resuming. This will be either in response to a message from the boosted client (implying that the client is now blocked, and so its priority doesn't matter), or a notification, which allows the server to implement a timeout or service an interrupt and then decide whether to re-boost the client by re-entering a closed receive, or to open receive instead.

How do we know what priority to boost to? We can directly compute it from the base (unboosted) priority of the task executing the closed-receive operation.

Who keeps track of which tasks are boosted? The kernel; it is precisely the set of task IDs x that appear in states matching the pattern InRecv(Some(x)). If a task leaves that state for any reason (successful receive, notification, or restart by supervisor/debugger) we unboost the referenced task immediately.
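
A self-contained model of that bookkeeping (not kernel code; type and field names are invented) might look like this:

```rust
// A task is boosted iff some other task is blocked in a closed receive naming
// it; its effective priority is then derived from the receiver's base priority.

#[derive(Clone, Copy, PartialEq)]
enum State {
    Runnable,
    InRecv(Option<usize>), // Some(x): closed receive against task index x
}

struct Task {
    base_priority: u8, // smaller = more important (assumed convention)
    state: State,
}

// Compute task `i`'s effective priority by scanning for closed receives that
// name it. Linear in the small, fixed number of tasks; no allocation.
fn effective_priority(tasks: &[Task], i: usize) -> u8 {
    let mut prio = tasks[i].base_priority;
    for t in tasks {
        if t.state == State::InRecv(Some(i)) {
            // Donate "one step below the receiver" -- the "minus one" from the
            // earlier comment -- which is numerically base + 1 here.
            prio = prio.min(t.base_priority + 1);
        }
    }
    prio
}

fn main() {
    let tasks = vec![
        Task { base_priority: 2, state: State::InRecv(Some(1)) }, // SPI server
        Task { base_priority: 6, state: State::Runnable },        // lock holder
        Task { base_priority: 4, state: State::Runnable },        // medium task
    ];
    // The lock holder is boosted to 3 and now outranks the medium task (4).
    assert_eq!(effective_priority(&tasks, 1), 3);
    assert_eq!(effective_priority(&tasks, 2), 4);
}
```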

Personal development note: I have gotten a lot better at spelling the word recieve. Erm, receive.

cbiffle commented 4 months ago

As of #1762, there's another potentially nice property to closed RECV acting as the "boost priority" operation:

It can be interrupted. Specifically, a server in a closed RECV can specify a non-zero notification mask to allow some subset of events, such as timers, to break it out of the closed RECV. When this happens, the boosted client would need to atomically lose the increased priority. It would regain it if, and only if, the server re-enters the closed RECV.

This lets a server cancel a client's boosted priority. It doesn't give the server any way to notify the client that this has occurred, however -- in particular, the server cannot reply-fault the client until the client eventually calls back, still assuming it has exclusive access. This possibility already existed before #1762: a server could be restarted by the supervisor or a debugger while in a closed RECV, so that when the client eventually calls back, it no longer has exclusive access.
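
For illustration, a server-side locked section under this scheme might look roughly like the following; the receive helper is an invented stand-in for a closed receive with a notification mask, not the real post-#1762 API:

```rust
#[derive(Clone, Copy)]
struct TaskId(u16);

enum Request {
    Release,
    Transfer, // stand-in for the lock holder's actual operations
}

enum Wake {
    Message(Request),  // the lock holder sent another request
    Notification(u32), // one of the allowed notification bits, e.g. a timer
}

// Stand-in for a closed receive with a non-zero notification mask: listen only
// to `holder`, but let the masked notifications break the server out.
fn recv_from_or_notify(holder: TaskId, mask: u32) -> Wake {
    let _ = (holder, mask);
    unimplemented!("stand-in for closed receive with a notification mask")
}

const TIMER_MASK: u32 = 1;

fn locked_section(holder: TaskId) {
    loop {
        match recv_from_or_notify(holder, TIMER_MASK) {
            Wake::Message(Request::Release) => break,
            Wake::Message(Request::Transfer) => { /* do the work */ }
            Wake::Notification(_) => {
                // Leaving the closed receive drops the holder's boost. The
                // server can now time out the lock (and reply-fault the holder
                // when it eventually calls back), or loop and re-enter the
                // closed receive, which re-boosts the holder.
                break;
            }
        }
    }
}
```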