nihalpasham / rustBoot

rustBoot is a standalone bootloader written entirely in `Rust`, designed to run on anything from a microcontroller to a system on chip. It can be used to boot into bare-metal firmware or Linux.
MIT License
213 stars 21 forks source link

Damaging an update firmware makes rustboot hang at boot #80

Open lionelains opened 1 month ago

lionelains commented 1 month ago

When manually changing the firmware payload of a signed firmware, rustboot seems to upgrade flash to it but subsequent boots never process correctly.

Found in https://github.com/nihalpasham/rustBoot/commit/8437fd2a6ebf79d68a885da895e009fafccfccee

Scenario

Build the sample boot demo binaries from rustboot project:

Before programming these 3 firmwares, we modify at least one byte in the 3rd binary (the upgrade firmware), and program the 3 firmwares on the board.

Expected

green blinking firmware boots and refuses to program the red firmware at all subsequent boots, only the green firmware boots

Observed

green blinking firmware boots the board reboots rustboot does not launch neither the red nor the green firmware and remains stuck in the boot phase resetting the board does not revert to the green firmware either, possibly resulting in a bricked device

Steps to reproduce on NUCLEO-H723ZG

Run the normal demo:

cargo stm32h723 build-sign-flash rustBoot 1234 1235

Watch the green firmware blink, then red firmware

Alter the red firmware (overwrite 4 bytes at offset 1024) :

cp ./boards/sign_images/signed_images/stm32h723_updtfw_v1235_signed.bin /tmp/stm32h723_updtfw_v1235_signed.altered.bin
printf '\xff\x00\xff\x00' | dd of=/tmp/stm32h723_updtfw_v1235_signed.altered.bin bs=1 seek=1024 count=4 conv=notrunc

Program again the two application firmwares:

probe-rs-cli erase --chip STM32H723ZGTx
probe-rs-cli download --format Bin --base-address 0x8020000 --chip STM32H723ZGTx ./boards/sign_images/signed_images/stm32h723_bootfw_v1234_signed.bin
probe-rs-cli download --format Bin --base-address 0x8060000 --chip STM32H723ZGTx /tmp/stm32h723_updtfw_v1235_signed.altered.bin

Program rustboot and run the demo.

probe-rs-cli run --chip STM32H723ZGTx ./boards/target/thumbv7em-none-eabihf/release/stm32h723

Observe the board first running the green firmware, then reboot and hang while in rustboot.

lionelains commented 1 month ago

When rust boot hangs, execution seems to hit the following panic(), which actually makes sense: https://github.com/nihalpasham/rustBoot/blob/8437fd2a6ebf79d68a885da895e009fafccfccee/rustBoot/src/image/image.rs#L572

nihalpasham commented 1 month ago

@imrank03 - would you be able to look into this?

imrank03 commented 1 month ago

@imrank03 - would you be able to look at this?

Hi @nihalpasham, since the issue seems to be functioning as expected, I didn't look into it further.

nihalpasham commented 1 month ago

@lionelains - I believe we are unable to recreate the above scenario. Can you help in recreating this bug?

nihalpasham commented 1 month ago

Closing. feel free re-open if needed

lionelains commented 1 month ago

I am going to give it a try again on my board. I'll keep you posted.

lionelains commented 1 month ago

I did reproduce on https://github.com/nihalpasham/rustBoot/commit/8437fd2a6ebf79d68a885da895e009fafccfccee and also on https://github.com/nihalpasham/rustBoot/commit/d4394d383ba3758574159c6630e4c3261a6b47f1

Same thing after a

rustup update

which leads to:

   stable-x86_64-unknown-linux-gnu updated - rustc 1.81.0 (eeb90cda1 2024-09-04) (from rustc 1.78.0 (9b00956e5 2024-04-29))
  nightly-x86_64-unknown-linux-gnu updated - rustc 1.83.0-nightly (9e394f551 2024-09-25) (from rustc 1.82.0-nightly (636d7ff91 2024-08-19))

I use the nightly toolchain to compile rustBoot (cargo +nightly stm32h723 build-sign-flash rustBoot 1234 1235).

In all these configurations, my board boots the green firmware, but never upgrades to the red (manually corrupted) firmware (as expected). However, the board never boots any firmware anymore, even after pressing the black reset button on the NUCLEO-H723ZG board.

nihalpasham commented 1 month ago

@imrank03

lionelains commented 1 month ago

@imrank03, on which board did you try to run the test? on a NUCLEO-H723ZG?

imrank03 commented 1 month ago

On which board did you try to run the test? on a NUCLEO-H723ZG?

imrank03 commented 2 weeks ago

@lionelains, I was able to reproduce the issue, and it seems the problem is related to the boot-loader panicking. Here's the output from my command:

❯ probe-rs run --chip STM32H723ZGTx ./boards/target/thumbv7em-none-eabihf/release/stm32h723 
    Erasing ✔ [00:00:01] [##############################################] 128.00 KiB/128.00 KiB @ 67.63 KiB/s (eta 0s)
    Programming ✔ [00:00:01] [###############################################] 52.00 KiB/52.00 KiB @ 51.06 KiB/s (eta 0s)
    Finished in 2.922s
Frame 0: __bkpt @ 0x0800c3b2 inline
        ./asm/lib.rs:48:1
Frame 1: __bkpt @ 0x000000000800c3b2
        ./asm/lib.rs:51:17
Frame 2: bkpt @ 0x0800c38e inline
        /Users/IKL2KOR/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cortex-m-0.7.7/src/call_asm.rs:19:21
Frame 3: panic @ 0x000000000800c38a
        /Users/IKL2KOR/Imran_Bosch/rust_dev/rustBoot/boards/bootloaders/stm32h723/src/main.rs:21:9
Frame 4: panic_bounds_check @ 0x08000a82
        /rustc/798fb83f7d24e31b16acca113496f39ff168c143/library/core/src/panicking.rs:271:1

The CPU halts due to a panic in the bootloader code at main.rs:21:9. I’ll work on resolving the issue. In the meantime, if you manage to solve it, feel free to submit a PR.

lionelains commented 2 weeks ago

Hi @imrank03 , yes, the panic implementation is indeed in main.rs. As far as I could diagnose, the root case actually comes from here: https://github.com/nihalpasham/rustBoot/blob/8437fd2a6ebf79d68a885da895e009fafccfccee/rustBoot/src/image/image.rs#L572 Changing it from a panic() into a returned Result<> would allow the caller to handle the case (in this specific scenario, that would probably mean ignore and continue booting from the other slot).

I will see if I can propose a refactor of the code following the strategy described above, but I am not going to be able to work on this during at least the coming two weeks.

imrank03 commented 2 weeks ago

Hi @lionelains , thanks for the update and for pointing out the root cause in image.rs. Your suggestion to refactor the panic!() into a returned Result<> makes sense. Handling the integrity check failure in a more graceful way, like continuing booting from the other slot, seems like a good approach.

I understand you won’t be available for the next two weeks, so I’ll look into it and see if I can start implementing the changes in the meantime. Once you’re back, feel free to jump in as well.

Let me know if there's anything else I should consider while working on this!