silitics / rugpi

An open-source platform empowering you to build innovative devices around customized Linux distributions.
https://rugpi.io
Apache License 2.0

OTA update fails when moving between Rugpi 0.6 and 0.7 images #29

Closed reubenmiller closed 2 months ago

reubenmiller commented 2 months ago

Description

OTA updates fail (i.e. the device switches back to the original partition) when upgrading across Rugpi versions, from 0.6 to 0.7.

The image is successfully downloaded, and the device tries to boot into the new partition, however the device automatically rolls back after ~20 seconds.

However, when performing an OTA update between images that were built using the same Rugpi version, the update works (i.e. the device boots into the new partition successfully), both for Rugpi 0.6 to 0.6 and for Rugpi 0.7 to 0.7 based images.

Procedure

The following procedure can be used to reproduce the problem:

  1. Flash the following image to the SD card

    https://github.com/thin-edge/tedge-rugpi-image/releases/download/20240603.0849/tedge_rugpi_tryboot_20240603.0849.img.xz

  2. Boot Raspberry Pi 5 device (with the new SD card)

    # rugpi-ctrl system info
    Boot Flow: tryboot
    Hot: a
    Cold: b
    Default: a
    Spare: b

  3. Download the second image (built using 0.7)

    https://github.com/thin-edge/tedge-rugpi-image/releases/download/20240617.2150/tedge_rugpi_rpi-tryboot_20240617.2150.img.xz

  4. Open Rugpi admin webpage, e.g. http://rpi5-abcdef.local:8088, and update the device using the second image

    https://github.com/thin-edge/tedge-rugpi-image/releases/download/20240617.2150/tedge_rugpi_rpi-tryboot_20240617.2150.img.xz

  5. Wait for the update

    Observations

    • The device tries to switch partitions and the fan stays on for an extended period of time, until the device eventually reboots on its own and switches back to the original partition
    • The device can't be reached via SSH (indicating that it is probably a boot loader problem and that the OS isn't starting)

    Afterwards, the rugpi-ctrl system info output confirms that the partition rollback occurred:

    # rugpi-ctrl system info
    Boot Flow: tryboot
    Hot: a
    Cold: b
    Default: a
    Spare: b
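The before/after comparison of the `rugpi-ctrl system info` output can also be scripted. The following is a minimal sketch, assuming the output keeps the `Key: value` layout shown above; the helper names `parse_system_info` and `rolled_back` are hypothetical and not part of Rugpi.

```python
def parse_system_info(text: str) -> dict[str, str]:
    """Parse `Key: value` lines (as in the output shown above) into a dict.

    Keys are lower-cased; lines without a colon are ignored.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip().lower()] = value.strip()
    return info


def rolled_back(before: str, after: str) -> bool:
    """Return True if the hot partition is unchanged after the update attempt."""
    return parse_system_info(before)["hot"] == parse_system_info(after)["hot"]


# Output captured before and after the failed update in this report.
SAMPLE = """\
Boot Flow: tryboot
Hot: a
Cold: b
Default: a
Spare: b
"""

if __name__ == "__main__":
    print("rolled back:", rolled_back(SAMPLE, SAMPLE))
```

Comparing only the `Hot` field keeps the check independent of how the other fields change after a successful switch, which this report does not show.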

Notes

System information

| Property | Value |
|---|---|
| Device | Raspberry Pi 5B |
| Boot flow | tryboot |
| OS | Raspberry Pi OS |
| Rugpi version | Update from 0.6 to 0.7 |

koehlma commented 2 months ago

Thanks for reporting this issue. 👍 We will be investigating it.

reubenmiller commented 2 months ago

Below are some more test results for different combinations from the initial flashing of an SD card to upgrading to a given image using OTA.

| Test Case No. | Rugpi Version Transition across OTA | Result |
|---|---|---|
| 1 | 0.6 :arrow_right: 0.7 | :x: Fail |
| 2 | 0.6 :arrow_right: 0.6 | :white_check_mark: Pass |
| 3 | 0.7 :arrow_right: 0.7 | :white_check_mark: Pass |
| 4 | 0.7 :arrow_right: 0.6 | :white_check_mark: Pass |

koehlma commented 2 months ago

We were able to identify and resolve the issue. We will soon publish a new release.

What happened?

The algorithm that repartitions the disk yielded results that were incompatible with previous layouts produced by fdisk. In particular, the last usable sector for MBR partitions was set to account for GPT padding. As a result, the disk appeared smaller than it was, and the algorithm computed a size for the data partition that was smaller than the size of the already existing data partition. This was then caught by runtime checks which ensure that the partition table cannot be messed up (as this would be disastrous and likely cause data to be lost). Instead of messing up the partition table, the update failed. So, while there was a bug, the actual update system and its checks worked as intended.
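
To illustrate the failure mode described above, here is a minimal sketch. It is not Rugpi's actual code: the function names, the example disk size, and the assumption that the existing data partition ends on the very last sector of the disk are all hypothetical. The 33-sector figure is the standard size of the backup GPT (one header sector plus 32 sectors of partition entries) on disks with 512-byte sectors.

```python
# Backup GPT at the end of a disk with 512-byte sectors:
# 1 header sector + 32 sectors of partition entries.
GPT_BACKUP_SECTORS = 33


def last_usable_sector(total_sectors: int, reserve_gpt_padding: bool) -> int:
    """Last sector a partition may occupy (0-indexed)."""
    last = total_sectors - 1
    if reserve_gpt_padding:
        # Correct for GPT, but wrong for an MBR disk: it makes the disk look smaller.
        last -= GPT_BACKUP_SECTORS
    return last


def plan_data_partition_end(
    total_sectors: int, existing_data_end: int, reserve_gpt_padding: bool
) -> int:
    """Compute the end of the data partition, refusing to shrink an existing one."""
    new_end = last_usable_sector(total_sectors, reserve_gpt_padding)
    if new_end < existing_data_end:
        # Safety check: abort instead of writing a table that truncates the partition.
        raise RuntimeError(
            f"refusing to shrink data partition: {new_end} < {existing_data_end}"
        )
    return new_end


if __name__ == "__main__":
    total = 62_333_952          # hypothetical 32 GB SD card, in 512-byte sectors
    existing_end = total - 1    # assumed: existing MBR layout uses the whole disk

    print(plan_data_partition_end(total, existing_end, reserve_gpt_padding=False))
    try:
        plan_data_partition_end(total, existing_end, reserve_gpt_padding=True)
    except RuntimeError as err:
        print("update aborted:", err)
```

With the padding wrongly reserved, the computed end falls 33 sectors short of the existing partition, so the check raises and the partition table is left untouched, which matches the behavior described in this comment.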