zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.52k stars 6.45k forks source link

MCUboot not confirm image when using 'west flash' #34683

Closed nandojve closed 3 years ago

nandojve commented 3 years ago

Describe the bug When an App uses MCUboot as bootloader it needs confirm any "clean/new" image at first boot. This always worked on Zephyr until pre-2.5.0 version. Now, it was detected that an image flashed using OpenOCD is not confirming images anymore. This was detected with frdm-k64f, nrf52840dk_nrf52840 and nucleo-f767zi. One possible consequence is an infinite loop that tries boot the board.

To Reproduce Steps to reproduce the behavior:

  1. git checkout master

  2. git fetch upstream

  3. git rebase upstream/master

  4. west update

  5. cd ..

  6. ls bootloader modules net-tools tools zephyr

  7. west build -b frdm_k64f -d build/mcuboot-frdm_k64f bootloader/mcuboot/boot/zephyr

  8. west flash -d build/mcuboot-frdm_k64f

  9. west build -b frdm_k64f -d build/app zephyr/samples/subsys/mgmt/updatehub

  10. west sign -t imgtool -d build/app --no-hex -- --version 1.0.0 --pad --key bootloader/mcuboot/root-rsa-2048.pem

  11. west flash -d build/app --bin-file build/app/zephyr/zephyr.signed.bin

  12. see error

Expected behavior The recent flashed Image confirm at first boot.

Impact At least stop development with newer Zephyr versions.

Logs and console output

*** Booting Zephyr OS build zephyr-v2.5.0-2840-g363af20fc4a6  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x20000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-2840-g363af20fc4a6  ***

[00:00:03.001,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.
[00:00:03.002,000] <inf> main: UpdateHub sample app started
[00:00:03.002,000] <inf> main: Confirming the boot image
[00:00:03.002,000] <err> main: Error to confirm the image
[00:00:03.002,000] <inf> main: Network disconnected
uart:~$ 
[00:00:11.006,000] <inf> main: Network connected
[00:00:11.006,000] <inf> main: Starting UpdateHub polling mode
[00:00:11.006,000] <inf> updatehub: SHA-256 verification on download and from flash
[00:00:11.006,000] <err> updatehub: The current image is not confirmed
[00:00:11.006,000] <err> updatehub: Image is unconfirmed. Rebooting to revert back to previousconfirmed image.*** Booting Zephyr OS build zephyr-v2.5.0-2840-g363af20fc4a6  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x20000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-2840-g363af20fc4a6  ***

[00:00:03.001,000] <inf> eth_mcux: ETH_0 enabled 100M full-duplex mode.
[00:00:03.002,000] <inf> main: UpdateHub sample app started
[00:00:03.002,000] <inf> main: Confirming the boot image
[00:00:03.002,000] <err> main: Error to confirm the image
[00:00:03.002,000] <inf> main: Network disconnected
uart:~$ 

Environment (please complete the following information):

Additional context All variants that UpdateHub uses are working at b0bd14b5fa. This includes proper boot, network and upgrades. I hope we can find a solution or a possible missing config to re-enable the feature.

nandojve commented 3 years ago

To make it easy to reproduce:

git checkout master
git fetch upstream
git rebase upstream/master
west update

Apply this diff

diff --git a/samples/hello_world/prj.conf b/samples/hello_world/prj.conf
index b2a4ba5910..2db4bd6f2c 100644
--- a/samples/hello_world/prj.conf
+++ b/samples/hello_world/prj.conf
@@ -1 +1,7 @@
-# nothing here
+CONFIG_LOG=y
+CONFIG_BOOTLOADER_MCUBOOT=y
+CONFIG_MCUBOOT_IMG_MANAGER=y
+CONFIG_FLASH=y
+CONFIG_REBOOT=y
+CONFIG_IMG_MANAGER=y
+CONFIG_MPU_ALLOW_FLASH_WRITE=y
diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index 6c5c8a27dc..a00464c3ba 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,9 +5,21 @@
  */

 #include <zephyr.h>
-#include <sys/printk.h>
+#include <dfu/mcuboot.h>
+
+#define LOG_LEVEL CONFIG_LOG_LEVEL_INFO
+#include <logging/log.h>
+LOG_MODULE_REGISTER(hello_world);

 void main(void)
 {
-       printk("Hello World! %s\n", CONFIG_BOARD);
+       LOG_INF("Hello MCUboot World!");
+
+       if (boot_write_img_confirmed() < 0) {
+               LOG_ERR("Error to confirm the image");
+       }
+
+       if (boot_is_img_confirmed()) {
+               LOG_INF("Everything is working.");
+       }
 }

Move out Zephyr dir:

cd ..

For frdm-k64f

west build -b frdm_k64f -d build/mcuboot-frdm_k64f bootloader/mcuboot/boot/zephyr
west flash -d build/mcuboot-frdm_k64f

west build -b frdm_k64f -d build/app zephyr/samples/hello_world
west sign -t imgtool -d build/app --no-hex -- --version 1.0.0 --pad --key bootloader/mcuboot/root-rsa-2048.pem
west flash -d build/app --bin-file build/app/zephyr/zephyr.signed.bin
*** Booting Zephyr OS build zephyr-v2.5.0-3527-g89ec3d82adf1  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x20000
I: Jumping to the first image slot0*** Booting Zephyr OS build zephyr-v2.5.0-3527-g89ec3d82adf1  ***
[00:00:00.005,000] <inf> hello_world: Hello MCUboot World!
[00:00:00.005,000] <err> hello_world: Error to confirm the image
nandojve commented 3 years ago

For nrf52840dk_nrf52840

west build -b nrf52840dk_nrf52840 -d build/nrf52840dk_nrf52840 bootloader/mcuboot/boot/zephyr
west flash -d build/nrf52840dk_nrf52840

west build -b nrf52840dk_nrf52840 -d build/app zephyr/samples/hello_world
west sign -t imgtool -d build/app -- --version 1.0.0 --pad --key bootloader/mcuboot/root-rsa-2048.pem
west flash -d build/app --hex-file build/app/zephyr/zephyr.signed.hex
*** Booting Zephyr OS build zephyr-v2.5.0-3321-g126da28620d7  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0xc000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-3321-g126da28620d7  ***
[00:00:00.005,737] <inf> hello_world: Hello MCUboot World!
[00:00:00.005,798] <err> hello_world: Error to confirm the image
nandojve commented 3 years ago

For nucleo_f767zi

west build -b nucleo_f767zi -d build/nucleo_f767zi bootloader/mcuboot/boot/zephyr
west flash -d build/nucleo_f767zi

west build -b nucleo_f767zi -d build/app zephyr/samples/hello_world
west sign -t imgtool -d build/app -- --version 1.0.0 --pad --key bootloader/mcuboot/root-rsa-2048.pem
west flash -d build/app --hex-file build/app/zephyr/zephyr.signed.hex
*** Booting Zephyr OS build zephyr-v2.5.0-3321-g126da28620d7  ***
I: Starting bootloader
I: Primary image: magic=good, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3
I: Boot source: primary slot
I: Swap type: none
I: Bootloader chainload address offset: 0x40000
I: Jumping to the first image slot
*** Booting Zephyr OS build zephyr-v2.5.0-3321-g126da28620d7  ***
[00:00:00.005,000] <inf> hello_world: Hello MCUboot World!
[00:00:00.005,000] <err> hello_world: Error to confirm the image
nandojve commented 3 years ago

CC: @nvlsianpu , @de-nordic , @utzig

MaureenHelm commented 3 years ago

The title of this issue suggests the problem is related to OpenOCD, but I don't think that's the case since you've reproduced it on frdm_k64 (pyocd), nrf52840dk_nrf52840 (jlink), and nucleo_f767zi (openocd). Reassigning to the DFU subsystem maintainer.

utzig commented 3 years ago

I just run some tests with MCUboot and smp_svr on latest master and everything is working correctly. hello_world does not write the image flags, so I assume you have modified it to use updatehub, which calls boot_set_confirmed from MCUboot, set a breakpoint there and check what condition fails. Also, there's no need to confirm an image unless you're doing a test upgrade, and there's no need to pass --pad to imgtool if you're flashing the image in the primary slot.

nandojve commented 3 years ago

Hi @utzig ,

The smp_svr is not part of this use case. The procedure described here should work, at least it worked till Dec/20 for all these boards.

This issue happen when an external agent install a clean image at some slot (in this case at first slot, it could be second). And yes, I'm interested to use boot_set_confirmed because UpdateHub uses test upgrade.

Here, I created a test application and a step by step procedure. If this procedure fail to confirm this MCUboot Hello World test image, it is impossible to use any firmware upgrade strategy with Zephyr that instructs MCUboot to perform a test upgrade.

If there were changes and I need update UpdateHub, it is OK. I just need know what are the proper steps.

utzig commented 3 years ago

I guess I missed much of the previous description, but I confirmed the issue and run it in the debugger, and the issue happens here:

https://github.com/mcu-tools/mcuboot/blob/master/boot/bootutil/src/bootutil_public.c#L586

This line checks that the image copy_done flag is set, which it is not because no upgrade was performed, and thus it fails to write the confirmed flag. Someone can track down where this changed recently /cc @nvlsianpu

nvlsianpu commented 3 years ago

@nandojve Thanks for report. I must have overlooked the issue.

nvlsianpu commented 3 years ago

@utzig

https://github.com/mcu-tools/mcuboot/blob/master/boot/bootutil/src/bootutil_public.c#L586

This line checks that the image copy_done flag is set, which it is not because no upgrade was performed, and thus it fails to write the confirmed flag. Someone can track down where this changed recently /cc @nvlsianpu

So yes, current implementation of in MCUboot bootutill public is more strict than the one before. I would just remove the check of copy-done flag. Actually if image is running after DFU, it must have been verified. Only exclusion is image flashed directly by programing interface while mcuboot has disabled primary slot check.