oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.
Mozilla Public License 2.0

Want to separate H753 image download from slot activate #1755

Closed rmustacc closed 2 months ago

rmustacc commented 5 months ago

Today the H753 update server requires that writing a slot and activating it happen together. Fundamentally we want to split this into three operations that can be run separately:

  1. Write an image to a slot.
  2. Activate a slot.
  3. Reset the SP.

The most important thing is making sure that steps 1 and 2 are no longer joined. That way, if you want to roll back because you booted an old image that had a broken, say, Sidecar front I/O transceiver, we could roll that back without doing a manual update. This likely needs changes to the H753 update server and then the control plane agent.
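
As a rough illustration of the split described above, here is a minimal Rust sketch of the three operations as independent calls. Everything in it (`Sp`, `Slot`, `write_image`, `activate`, `reset`, `UpdateError`) is a hypothetical in-memory model invented for this example; it is not the Hubris update server API.

```rust
/// Hypothetical slot names, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    A,
    B,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    /// Tried to activate a slot that holds no image.
    EmptySlot,
}

/// Toy in-memory model of an SP with two image slots (an assumption).
pub struct Sp {
    images: [Option<Vec<u8>>; 2],
    active: Slot,  // slot selected for the next boot
    running: Slot, // slot currently executing
}

impl Sp {
    /// Starts with an image in slot A, which is both active and running.
    pub fn new() -> Self {
        Sp {
            images: [Some(vec![0]), None],
            active: Slot::A,
            running: Slot::A,
        }
    }

    fn index(slot: Slot) -> usize {
        match slot {
            Slot::A => 0,
            Slot::B => 1,
        }
    }

    /// Step 1: write an image to a slot. Does not change the active slot.
    pub fn write_image(&mut self, slot: Slot, image: &[u8]) {
        self.images[Self::index(slot)] = Some(image.to_vec());
    }

    /// Step 2: activate a slot. Does not write anything and does not reset.
    pub fn activate(&mut self, slot: Slot) -> Result<(), UpdateError> {
        if self.images[Self::index(slot)].is_none() {
            return Err(UpdateError::EmptySlot);
        }
        self.active = slot;
        Ok(())
    }

    /// Step 3: reset the SP, which boots whichever slot is active.
    pub fn reset(&mut self) {
        self.running = self.active;
    }

    pub fn active(&self) -> Slot {
        self.active
    }

    pub fn running(&self) -> Slot {
        self.running
    }
}
```

With the steps decoupled like this, a rollback in the model is just `activate(Slot::A)` followed by `reset()`, with no image write involved.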

mkeeter commented 5 months ago

I'd even suggest making steps 2+3 an atomic API; the ability to leave a slot activated without resetting seems like a footgun. (This would also play nicely with #1707)

rmustacc commented 5 months ago

They were left separate on purpose. In particular, @cbiffle brought up the desire not to have them joined together as well.

cbiffle commented 4 months ago

Yeah, I feel like leaving them separate is safer in general. We've got a few cases where we've treated inter-system interfaces like this "cleverly" and bonded verbs and states together, and it winds up biting us. If there isn't a correctness issue in leaving them separate, I'd make each operation separate and well-defined on its own.

Switching the active slot does raise the spectre of a "surprise reboot" into the new image if the SP were to, like, power cycle before you sent the reboot -- I think this should be clearly documented on the operation, but we probably shouldn't attempt to prevent it.
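
To make the "surprise reboot" case concrete, here is a small Rust sketch. It assumes the active-slot choice is persisted and consulted on every boot, whatever the cause; `BootState`, `Bank`, and all method names are hypothetical and exist only for this example. The doc comment on `set_active` shows the kind of warning the operation could carry.

```rust
/// Hypothetical image banks, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Bank {
    Zero,
    One,
}

/// Toy model: a persisted boot choice plus the bank that is running now.
pub struct BootState {
    persisted_choice: Bank,
    running: Bank,
}

impl BootState {
    pub fn new(running: Bank) -> Self {
        BootState {
            persisted_choice: running,
            running,
        }
    }

    /// Records `bank` as the one to boot next.
    ///
    /// Warning: the choice takes effect on *any* later boot, including an
    /// unplanned power cycle that happens before an explicit reset is sent.
    pub fn set_active(&mut self, bank: Bank) {
        self.persisted_choice = bank;
    }

    /// An explicitly requested reset.
    pub fn requested_reset(&mut self) {
        self.boot();
    }

    /// An unplanned power cycle; in this model it boots the same way.
    pub fn power_cycle(&mut self) {
        self.boot();
    }

    fn boot(&mut self) {
        self.running = self.persisted_choice;
    }

    pub fn running(&self) -> Bank {
        self.running
    }

    /// True while a switch has been recorded but not yet booted into.
    pub fn switch_pending(&self) -> bool {
        self.persisted_choice != self.running
    }
}
```

In this model, calling `set_active(Bank::One)` and then `power_cycle()` lands in bank one even though no reset was requested, which is the behavior that would be documented rather than prevented.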

labbott commented 3 months ago

https://github.com/oxidecomputer/hubris/pull/1808 should take care of this. This should now work for setting it via the gateway API but actually using it in, say, wicket also requires gateway changes.

labbott commented 2 months ago

Fixed by #1808 on the hubris side