rmustacc closed this 2 months ago
I'd even suggest making steps 2+3 an atomic API; the ability to leave a slot activated without resetting seems like a footgun. (This would also play nicely with #1707)
They were kept separate on purpose. In particular, @cbiffle also said he wanted them to stay separate rather than be joined together.
Yeah, I feel like leaving them separate is safer in general. We've got a few cases where we've treated inter-system interfaces like this "cleverly" and bonded verbs and states together, and it winds up biting us. If there isn't a correctness issue in leaving them separate, I'd make each operation separate and well-defined on its own.
Switching the active slot does raise the spectre of a "surprise reboot" into the new image if the SP were to, like, power cycle before you sent the reboot. I think this should be clearly documented on the operation, but we probably shouldn't attempt to prevent it.
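To make the "surprise reboot" concrete, here is a minimal sketch that models the SP as a persistent "active" slot plus the slot it is currently running. The names (`Sp`, `Slot`, `set_active`, `reset`, `power_cycle`) are hypothetical and not taken from Hubris or the gateway API. The point is only that any boot, planned or not, comes up in whatever slot is active at that moment.

```rust
/// Hypothetical two-slot SP model; names are illustrative, not the Hubris API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    A,
    B,
}

#[derive(Debug)]
pub struct Sp {
    /// Persistent choice of which slot boots next.
    pub active: Slot,
    /// Slot whose image is currently executing.
    pub running: Slot,
}

impl Sp {
    pub fn new(slot: Slot) -> Self {
        Sp { active: slot, running: slot }
    }

    /// Switch the active slot only; the running image is untouched.
    pub fn set_active(&mut self, slot: Slot) {
        self.active = slot;
    }

    /// Explicit reset requested by the control plane: boot the active slot.
    pub fn reset(&mut self) {
        self.running = self.active;
    }

    /// Unplanned power cycle: boots from the active slot exactly like a reset,
    /// whether or not anyone meant to switch images yet.
    pub fn power_cycle(&mut self) {
        self.running = self.active;
    }
}
```

With this model, calling `set_active(Slot::B)` leaves the SP running slot A, but a subsequent `power_cycle()` lands it in slot B without any explicit reset. That window is the behavior worth documenting on the operation rather than trying to close.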
https://github.com/oxidecomputer/hubris/pull/1808 should take care of this. Setting the active slot should now work via the gateway API, but actually using it in, say, wicket also requires gateway changes.
Fixed by #1808 on the hubris side
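Today the H753 update server effectively requires that writing a slot and activating it happen together. Fundamentally, we want to split this into three operations that can each be run separately:

1. Write an image to a slot.
2. Activate a slot, i.e. mark it as the slot to boot from.
3. Reset the SP so that it boots into the active slot.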
The most important thing is making sure that steps 1 and 2 are no longer joined. That way, if we need to roll back because we booted an old image that had, say, a broken Sidecar front I/O transceiver, we can do so without a manual update. This likely needs changes to the H753 update server and then to the control plane agent.
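A minimal sketch of what the split buys us, assuming a two-slot update server where write, activate, and reset are independent calls. `UpdateServer`, `UpdateError`, and the method names are hypothetical and do not correspond to the actual H753 update server or control plane agent interfaces; version numbers stand in for image contents. Because activation no longer requires a write, rolling back is just "activate the other slot, then reset", and the previous image never has to be re-sent.

```rust
/// Hypothetical two-slot update server; not the real Hubris interface.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Slot {
    A,
    B,
}

#[derive(Debug, PartialEq, Eq)]
pub enum UpdateError {
    /// The slot holds no image, so it cannot be activated.
    EmptySlot(Slot),
    /// The slot is currently executing, so it cannot be overwritten.
    SlotInUse(Slot),
}

fn idx(slot: Slot) -> usize {
    match slot {
        Slot::A => 0,
        Slot::B => 1,
    }
}

pub struct UpdateServer {
    /// Image "version" stored in each slot, if any.
    images: [Option<u32>; 2],
    active: Slot,
    running: Slot,
}

impl UpdateServer {
    /// Start out running `version` from slot A, with slot B empty.
    pub fn new(version: u32) -> Self {
        UpdateServer { images: [Some(version), None], active: Slot::A, running: Slot::A }
    }

    /// Step 1: write an image. Does not change which slot boots next.
    pub fn write(&mut self, slot: Slot, version: u32) -> Result<(), UpdateError> {
        if slot == self.running {
            return Err(UpdateError::SlotInUse(slot));
        }
        self.images[idx(slot)] = Some(version);
        Ok(())
    }

    /// Step 2: choose the slot to boot next. Does not reset.
    pub fn set_active(&mut self, slot: Slot) -> Result<(), UpdateError> {
        if self.images[idx(slot)].is_none() {
            return Err(UpdateError::EmptySlot(slot));
        }
        self.active = slot;
        Ok(())
    }

    /// Step 3: reset into whichever slot is active.
    pub fn reset(&mut self) {
        self.running = self.active;
    }

    pub fn running_version(&self) -> Option<u32> {
        self.images[idx(self.running)]
    }
}
```

In use: write version 2 to slot B, activate B, and reset to run it. If version 2 turns out to be broken, `set_active(Slot::A)` followed by `reset()` boots version 1 again with no new write, which is exactly the rollback path that joining steps 1 and 2 prevents.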