This resolves a four-year-old TODO in El Jefe asking for a way to process faulted tasks without making so many kipcs. The original supervisor kipc interface was, by definition, designed before we knew what we were doing. Now that we have some miles on the system, some things are clearer:
- The supervisor doesn't use the TaskState data to make its decisions.
- The TaskState data is pretty expensive to serialize/deserialize, and produces code containing panic sites.
- Panic sites in the supervisor are bad, since it is not allowed to panic.
The new find_faulted_task operation can detect all N faulted tasks using N+1 kipcs, instead of one per potentially faulted task, and the request and response messages are trivial to serialize (one four-byte integer each way). This has allowed me to write (out-of-tree) "minisuper," a supervisor in 256 bytes that cannot panic.
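To illustrate where the N+1 figure comes from, here is a minimal sketch of the scan loop. It runs against a mock, not the real kernel: `MockKernel`, `collect_faulted`, and the exact semantics shown (take a starting task index, return the index of the first faulted task at or after it, with 0 meaning "none") are assumptions made for this example, not the in-tree API.

```rust
/// Hypothetical stand-in for the kernel, used only for illustration.
/// It tracks which tasks are faulted and counts simulated kipcs.
struct MockKernel {
    faulted: Vec<bool>,
    kipc_count: u32,
}

impl MockKernel {
    /// Assumed semantics: return the index of the first faulted task at or
    /// after `start`, or 0 if there is none. Task 0 is taken to be the
    /// supervisor, so 0 is free to serve as the "none" value.
    fn find_faulted_task(&mut self, start: u32) -> u32 {
        self.kipc_count += 1;
        self.faulted
            .iter()
            .enumerate()
            .skip(start as usize)
            .find(|&(_, &is_faulted)| is_faulted)
            .map(|(index, _)| index as u32)
            .unwrap_or(0)
    }
}

/// Collects every faulted task index, resuming the scan just past each hit.
/// With N faulted tasks this makes N calls that find something, plus one
/// final call that reports "none".
fn collect_faulted(kernel: &mut MockKernel) -> Vec<u32> {
    let mut found = Vec::new();
    let mut next = 1; // skip task 0, assumed to be the supervisor itself
    loop {
        let index = kernel.find_faulted_task(next);
        if index == 0 {
            break;
        }
        found.push(index);
        next = index + 1;
    }
    found
}
```

In this sketch, a six-task system with tasks 2 and 4 faulted takes three calls: two that each return a faulted index and one that returns 0. The old approach would query each of the five non-supervisor tasks individually.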
In-tree, this has the advantage of knocking 33% off Jefe's flash size and reducing statically-analyzable max stack depth by 20%.