oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.
Mozilla Public License 2.0

net: on startup, treat all tasks as waiting_to_send to avoid liveness bug #1844

Closed cbiffle closed 3 months ago

cbiffle commented 3 months ago

We currently have a potential (albeit unlikely) liveness bug in the net stack task protocol. It goes like this:

  1. A task attempts to send a packet.
  2. The net stack reports that the socket's outgoing queue is full and the task should try again later.
  3. The task begins waiting for the queue-available notification.
  4. The net stack crashes and restarts, losing its table of which tasks are owed a notification.
  5. It sees all queues as having available space and goes on about its business.

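Because the restarted net stack no longer knows that the task is owed a notification, it never sends one. The task will only be woken up when it next receives a packet.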
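The sequence above can be reproduced in a small model. This is only a sketch under stated assumptions: `NetStack`, `send`, `poll`, `SOCKET_COUNT`, and `QUEUE_DEPTH` are hypothetical names invented for illustration, not the actual Hubris net task API, and a crash is modeled by simply rebuilding the state from scratch.

```rust
// Hypothetical model of the bug; none of these names come from Hubris.
const SOCKET_COUNT: usize = 2;
const QUEUE_DEPTH: usize = 1;

#[derive(Debug, PartialEq)]
enum SendError {
    QueueFull,
}

struct NetStack {
    // Number of packets sitting in each socket's outgoing queue.
    queued: [usize; SOCKET_COUNT],
    // Which socket owners are owed a queue-available notification.
    waiting_to_send: [bool; SOCKET_COUNT],
}

impl NetStack {
    // Fresh state, as after boot or a crash: empty queues, nobody waiting.
    fn new() -> Self {
        NetStack {
            queued: [0; SOCKET_COUNT],
            waiting_to_send: [false; SOCKET_COUNT],
        }
    }

    // Steps 1 and 2: enqueue a packet, or report a full queue and
    // remember that the caller is owed a notification.
    fn send(&mut self, socket: usize) -> Result<(), SendError> {
        if self.queued[socket] >= QUEUE_DEPTH {
            self.waiting_to_send[socket] = true;
            return Err(SendError::QueueFull);
        }
        self.queued[socket] += 1;
        Ok(())
    }

    // Transmit one packet per socket, then return the sockets whose
    // owners should be notified that queue space is available.
    fn poll(&mut self) -> Vec<usize> {
        let mut notify = Vec::new();
        for socket in 0..SOCKET_COUNT {
            if self.queued[socket] > 0 {
                self.queued[socket] -= 1;
            }
            if self.waiting_to_send[socket] && self.queued[socket] < QUEUE_DEPTH {
                self.waiting_to_send[socket] = false;
                notify.push(socket);
            }
        }
        notify
    }
}

// Runs steps 1 through 5 and returns the notifications sent after restart.
fn notifications_after_crash() -> Vec<usize> {
    let mut net = NetStack::new();
    assert_eq!(net.send(0), Ok(()));
    assert_eq!(net.send(0), Err(SendError::QueueFull)); // task now waits
    net = NetStack::new(); // step 4: crash and restart lose the table
    net.poll() // step 5: nobody appears to be waiting
}
```

In this model `notifications_after_crash()` returns an empty list: the task blocked on socket 0 is never told that space is available.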

After discussion with @mkeeter I think the simplest way of addressing this is to change the protocol slightly. Specifically:

On net restart, all tasks should be treated as waiting_to_send.

This will cause net to distribute a notification to all sockets on restart. Since we treat notifications as potentially spurious, the worst this will do is cause the clients to wake up briefly and do some unnecessary checking before going back to sleep. But if a task was waiting for available TX queue space, the notification will cause it to discover that its queue is empty and make forward progress again.
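A minimal sketch of the proposed behavior, under the same caveat: `NetStack`, `Client`, `restarted`, `drain_notifications`, and `restart_and_notify` are hypothetical names chosen for illustration and are not taken from the Hubris source. The only point being shown is that the waiting table starts out all-true after a restart, and that clients tolerate the resulting spurious wakeups.

```rust
// Hypothetical model of the proposed change; not the actual net task code.
const SOCKET_COUNT: usize = 3;

struct NetStack {
    // Which socket owners are owed a queue-available notification.
    waiting_to_send: [bool; SOCKET_COUNT],
}

impl NetStack {
    // State after a (re)start: assume every socket owner is waiting.
    fn restarted() -> Self {
        NetStack {
            waiting_to_send: [true; SOCKET_COUNT],
        }
    }

    // Returns the sockets whose owners should be notified and clears
    // their flags. All queues are empty right after a restart, so in
    // this model every flagged socket qualifies.
    fn drain_notifications(&mut self) -> Vec<usize> {
        let mut notify = Vec::new();
        for (socket, waiting) in self.waiting_to_send.iter_mut().enumerate() {
            if *waiting {
                *waiting = false;
                notify.push(socket);
            }
        }
        notify
    }
}

struct Client {
    // True if the client was told "queue full" and is waiting to retry.
    blocked_on_send: bool,
    // Counts wakeups that turned out to need no action.
    spurious_wakeups: u32,
}

impl Client {
    // Notifications may be spurious, so the client re-checks its state.
    fn on_notification(&mut self) {
        if self.blocked_on_send {
            // A real client would retry the send here; the queue is
            // empty after the restart, so the retry is assumed to succeed.
            self.blocked_on_send = false;
        } else {
            self.spurious_wakeups += 1;
        }
    }
}

// Restarts the stack, delivers the resulting notifications, and returns
// the list of sockets that were notified.
fn restart_and_notify(clients: &mut [Client; SOCKET_COUNT]) -> Vec<usize> {
    let mut net = NetStack::restarted();
    let notified = net.drain_notifications();
    for &socket in &notified {
        clients[socket].on_notification();
    }
    notified
}
```

With one blocked client and two idle ones, every socket is notified once: the blocked client resumes sending, and each idle client records a single spurious wakeup before going back to sleep.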