silitics / rugpi

An open-source platform empowering you to build innovative devices around customized Linux distributions.
https://rugpi.io
Apache License 2.0

Boot into cold spare on next system boot #24

Closed raryanpur closed 4 months ago

raryanpur commented 5 months ago

Is it possible to use rugpi-ctrl to indicate that the system should try booting into the cold set on next boot without calling rugpi-ctrl system reboot --spare? The use case is for scenarios in which a new image has been downloaded and installed to the cold spare, but the reboot should not occur immediately (e.g. because the user is in the middle of something). Ideally, the user could power the system off whenever they please, and the next time they power it on it'll boot into the cold spare.

koehlma commented 5 months ago

Whether this is possible depends on the details of the boot flow. The recommended tryboot boot flow for Raspberry Pi is realized by setting certain flags with Raspberry Pi's firmware^1 using an argument to the reboot system call. I do not think it will be possible to use this mechanism/boot flow across power cycles. There is little documentation of what actually happens under the hood, but from what I know, the flags are stored in EEPROM (non-volatile storage) only on some models, while on others they are presumably stored in volatile storage and erased on power cycling. As this is not officially documented or supported, I think it would be a bad idea to rely on any specific behavior. For the U-Boot boot flow, however, it would be possible (and also rather easy to implement).

What boot flow are you using/targeting?
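For readers unfamiliar with the mechanism mentioned above: the Raspberry Pi documentation describes requesting a one-shot tryboot by passing the partition number and the keyword `tryboot` as a single argument to `reboot` (for example `sudo reboot '0 tryboot'`). The following is only a minimal sketch of that invocation. The helper names are hypothetical, and whether Rugpi issues the request this way internally is an assumption.

```python
import subprocess


def tryboot_reboot_command(partition: int = 0) -> list[str]:
    """Build the argv for a reboot that requests the one-shot tryboot flag.

    Hypothetical helper for illustration; not part of rugpi-ctrl.
    """
    if partition < 0:
        raise ValueError("partition number must be non-negative")
    # Partition number and keyword are passed together as ONE argument.
    return ["reboot", f"{partition} tryboot"]


def reboot_with_tryboot(partition: int = 0, dry_run: bool = True) -> list[str]:
    """Return the command; only execute it when dry_run is False."""
    cmd = tryboot_reboot_command(partition)
    if not dry_run:
        # Requires root privileges and reboots the machine immediately.
        subprocess.run(cmd, check=True)
    return cmd
```

Because the flag is handed to the firmware as part of the reboot request itself, there is no documented guarantee that it survives a power cycle, which is the limitation described in the comment above.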

raryanpur commented 5 months ago

Thank you for the detailed and prompt reply!

I'm currently using the tryboot boot flow on an RPi4, but for no special reason other than it being default/recommended. If there are no/few drawbacks to using u-boot instead of tryboot, I'd actually prefer that because of u-boot's maturity and what sounds like possibly a more consistent experience across different boards?

koehlma commented 5 months ago

Well, tryboot has certain advantages. For instance, it will likely be supported on future Raspberry Pi models immediately, whereas it may take some time for U-Boot to catch up. This is due to it being built directly into Raspberry Pi's firmware. Furthermore, it supports updates of the config.txt file and firmware files.

Actually, after thinking about it a bit more, I have a different idea: we could set a flag for Rugpi itself. The system would then first boot into the default partition, but Rugpi would initiate a reboot very early, before even starting systemd. Technically, there is still a reboot, but from a user's perspective it will be very quick and happen without the system coming fully up. What do you think about this idea?

raryanpur commented 5 months ago

That sounds like a great solution! I think a key requirement, to your point, is that the user doesn't perceive a difference in boot time. If the switch to the cold partition occurs early in the boot process (prior to systemd coming up as you mention), then that should cover it.

If the reboot into the cold partition fails, then rugpi would behave as usual, correct? In other words, if the cold partition doesn't get committed as "hot," then a reboot should fall back to the "working" partition rather than continuing to try to boot into the new one? Maybe in this case there's a slight difference in that the reboot into the cold partition should only occur a certain number of times if it fails to be committed? I think u-boot handles this with a "try times" feature to prevent an infinite boot loop into a bad new update.

koehlma commented 5 months ago

> I think a key requirement, to your point, is that the user doesn't perceive a difference in boot time.

Indeed. The difference should be minor. Technically, this is just part of a deferred update and updates may take some additional time anyway, for instance, in case migrations need to run. So, I guess, it would be fine.

> If the reboot into the cold partition fails, then rugpi would behave as usual, correct?

Correct, that would be a one-shot flag stored on the data partition that makes Rugpi reboot into the spare partition set once. The flag would be cleared prior to rebooting such that the system falls back to the default partition set in case the system does not commit. We could also have a counter, if this is desired.
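The one-shot behavior described here can be sketched roughly as follows. This is an illustration under stated assumptions, not Rugpi's actual implementation: the flag is modeled as a plain file, and `reboot_to_spare` is a hypothetical callback standing in for whatever triggers the reboot into the spare partition set.

```python
from pathlib import Path
from typing import Callable


def handle_deferred_reboot(flag: Path, reboot_to_spare: Callable[[], None]) -> bool:
    """Early-boot check for a one-shot "reboot to spare" flag.

    Returns True if a reboot to the spare set was triggered and False
    if the normal boot should simply continue.
    """
    if not flag.exists():
        return False  # no deferred update pending
    # Clear the flag BEFORE rebooting: if the spare set is never
    # committed, the next boot lands on the default set and stays there.
    flag.unlink()
    reboot_to_spare()
    return True
```

Clearing the flag first is what makes it one-shot. A real implementation would presumably also make sure the removal has reached the disk before the reboot is issued.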

raryanpur commented 5 months ago

> Correct, that would be a one-shot flag stored on the data partition that makes Rugpi reboot into the spare partition set once. The flag would be cleared prior to rebooting such that the system falls back to the default partition set in case the system does not commit. We could also have a counter, if this is desired.

I think the option of retrying more than once would be helpful when booting from network, and any other scenarios in which determining the set is "good" relies on non-deterministic tasks. That being said, once is fine for my use case.

koehlma commented 5 months ago

> I think the option of retrying more than once would be helpful when booting from network, and any other scenarios in which determining the set is "good" relies on non-deterministic tasks. That being said, once is fine for my use case.

For those cases, one probably wants something more elaborate than a simple counter (e.g., something like exponential backoff). I am currently a bit busy and will take care of the flag in the coming weeks.
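As a rough illustration of the exponential backoff idea mentioned above, the sketch below computes a capped, doubling delay schedule for retry attempts. The function name, the default base and cap, and the idea of waiting this long between attempts to boot the spare set are all assumptions made for illustration. Nothing in this thread says Rugpi implements it.

```python
def retry_schedule(max_attempts: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Return the delay (in arbitrary time units) before each retry attempt.

    The delay doubles with every attempt and is clamped to `cap`.
    Hypothetical helper; all parameters are illustrative.
    """
    if max_attempts < 0:
        raise ValueError("max_attempts must be non-negative")
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]
```

For example, `retry_schedule(5)` yields delays of 1, 2, 4, 8, and 16 units, and a longer schedule flattens out at the cap instead of growing without bound.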

koehlma commented 4 months ago

Installing an update with --reboot deferred will now defer rebooting to the spare partition to the next boot. As discussed, this is done by setting a flag which is checked early during boot and, if the flag is present, triggering an immediate reboot to the spare partition.