mcu-tools / mcuboot

Secure boot for 32-bit Microcontrollers!
Apache License 2.0
1.31k stars 667 forks source link

nRF52 DK Power Loss Reset Issue #1643

Closed ShahinHaque97-CT closed 1 year ago

ShahinHaque97-CT commented 1 year ago

Hello,

I have implemented Serial DFU following this guide - https://devzone.nordicsemi.com/guides/nrf-connect-sdk-guides/b/software/posts/ncs-dfu#serial_dfu

Edited the overlays/config to use external flash and everything is working fine.

I have been doing some tests for reliability, if there is power loss during the download of the update, everything is fine as the application in slot 1 is still running and I can always redownload the image into slot 2.

However, if there is power loss when I am swapping the images, I have an issue.

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

The above log is repeated everything I press the RESET Button on the development board and the application does not run.

If I remove the power via USB, and plug it back in the application runs absolutely fine, and thus I know the image is not corrupted.

What's the difference between power on reset and nRESET on the chip? I am assuming a flag is set for nRESET which is preventing the application from being sorted.

nordicjm commented 1 year ago

Has the board booted into serial recovery mode? That would prevent the application from booting

ShahinHaque97-CT commented 1 year ago

@nordicjm How would I be able to know? The above log is what is displayed upon MCU reset button.

nordicjm commented 1 year ago

If the above is what you see, then your application has been booted, if you attached a debugger to it you should see your program counter would be in your zephyr application's address space. To check if it's in serial recovery mode, use the image list command e.g. mcumgr --conntype serial --connstring dev=/dev/ttyACM0 image list

ShahinHaque97-CT commented 1 year ago

@nordicjm

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image erase Done C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image upload -e build/zephyr/app_update.bin 53.61 KiB / 53.61 KiB [=================================================] 100.00% 2.50 KiB/s 21s

Done C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image list Images: image=0 slot=0     version: 0.0.0     bootable: true     flags: active confirmed     hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d image=0 slot=1     version: 0.0.0     bootable: true     flags:     hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4 Split status: N/A (0) 

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image test 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4 Images: image=0 slot=0     version: 0.0.0     bootable: true     flags: active confirmed     hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d image=0 slot=1     version: 0.0.0     bootable: true     flags: pending     hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4 Split status: N/A (0) 

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" reset Done #############Power has been removed here while the swap is happening.###########

C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>mcumgr --conntype serial --connstring "COM23,baud=115200" image list Images: image=0 slot=0     version: 0.0.0     bootable: true     flags: active     hash: 5437f873820ac11adcb8ca5a85ce86f4707f056d83a446b7f0e1d61869f68ba4 image=0 slot=1     version: 0.0.0     bootable: true     flags: confirmed     hash: f2ec040399383f8501b35bd555ff551e366b78335322772fdc0c4c4478a7bb4d Split status: N/A (0) C:\ncs\v2.2.0\nrf\samples\bluetooth\nrf52832-external-flash-dfu_scratch>

From the logs it shows from the image test that the images have swapped over. judging from the hash.

ShahinHaque97-CT commented 1 year ago

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: primary slot 00> I: Swap type: test 00> I: Starting swap using scratch algorithm. 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

Reset Button Pressed

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: revert 00> I: Starting swap using scratch algorithm.

Reset Button Pressed

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x3, image_ok=0x3 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: primary slot 00> I: Starting swap using scratch algorithm. 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot 00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader

Reset Button Pressed

00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

After USB power is removed

00> Booting Zephyr OS build v3.2.99-ncs1 00> TEST 10/03/2023 nrf52dk_nrf52832 00> build time: Mar 10 2023 14:23:55

(The current program just displays that string.)

nordicjm commented 1 year ago

00> I: Starting swap using scratch algorithm.

Rebuild mcuboot using swap using move, not swap using scratch, then retry

ShahinHaque97-CT commented 1 year ago

@nordicjm

The mucmgr logs are the same,

00> Booting Zephyr OS build v3.2.99-ncs1 *** 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3 00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: revert 00> I: Starting swap using move algorithm. 00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

Reset Button Pressed

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

Reset Button Pressed

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

Reset Button Pressed

00> Booting Zephyr OS build v3.2.99-ncs1 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Secondary image: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

After USB power is removed

00> Booting Zephyr OS build v3.2.99-ncs1 00> After flash! TEST 10/03.2023 nrf52dk_nrf52832 00> build time: Mar 10 2023 14:46:56

//-------------------------

To change the config this was done CONFIG_BOOT_SWAP_USING_SCRATCH=n CONFIG_BOOT_SWAP_USING_MOVE=y

ShahinHaque97-CT commented 1 year ago

Adding CONFIG_GPIO_AS_PINRESET=n along with Move instead of Scratch, the power loss seems to be working (even without pressing the button), Have only tried with a small sample size so far as end of work day.

May I ask your reasoning to suggesting Move instead of Scratch? I have my scratch partitions allocated and they work (apart from the power loss).

I will do some more tests to confirm it is working

utzig commented 1 year ago

May I ask your reasoning to suggesting Move instead of Scratch? I have my scratch partitions allocated and they work (apart from the power loss).

Amusingly enough swap with move was broken until I fixed it recently with #1597. So if you're using NCS 2.2.0 you still have the one that is broken. It would be more helpful if you could dump the status area from the primary slot when your issue happens. Also how big are your images?

ShahinHaque97-CT commented 1 year ago

@utzig The image is currently very small as I am using the hello world example for testing. As this is early development, I will try my external flash dfu configs on the latest SDK and see if that has corrected my issue. Encase that doesn't work, how will I display the status dump area? is there a debug config for this?

ShahinHaque97-CT commented 1 year ago

@utzig Using the latest SDK, with a small test sample of 10 updates so far, I have been successful with internal scratch, external flash. However, with external scratch, the halt is still present with reset button. I am still prompted with the original logs

ShahinHaque97-CT commented 1 year ago

Update: Using External Flash & External Scratch, I have been able to get it working without bricking by changing the scratch partition to a smaller size of 4KB. The error is probably still there but too quick for me to catch it. the chance of the error is more frequent as the scratch size is increased

nordicjm commented 1 year ago

@utzig perhaps this is the same issue I used to see with swap using scratch. I'm pretty sure I was able to get the problem to occur when only using internal flash too but maybe I'm misremembering

utzig commented 1 year ago

@utzig perhaps this is the same issue I used to see with swap using scratch. I'm pretty sure I was able to get the problem to occur when only using internal flash too but maybe I'm misremembering

Yes, that's what I thought.

@ShahinHaque97-Corintech Nice investigation! This might be the same issue as described in #1550, and many times before by @nordicjm. Have you also tried changing the size of the scratch in the internal flash to see if it ever bricks? So, if you end up never seeing this bug with a scratch of 4KB in the QSPI flash then it could point to an issue in the QSPI flash driver for writes larger than a 4KB, which would not be a sector anymore (AFAIR).

I've never tried using scratch on the external flash myself, but I'll try to do it later today. It could be that just the amount of writes on the scratch partition is enough to trigger some bug in the QSPI interface or flash driver. But that's just a guess!

how will I display the status dump area? is there a debug config for this?

To dump the status area, you just need to read the last 4KB sector of the primary partition using whatever tool you are using to write the flash, nrfjprog I would assume. Then either zip and upload it here, or gist an hexdump of the dump. That should give us more information of where the interrupt happened. Since this might also be related to scratch writes failing in some way, it might be useful to also dump the scratch partition, although I am not entirely sure it would actually be helpful...

ShahinHaque97-CT commented 1 year ago

I am using an nRF52832 with external flash as a cost saving measure instead of the nRF52840. the nRF52832 is using SPI rather than QSPI. The flash currently used for testing is the Winbond W25Q16JV on the nRF52 DK development board, IC used for production is yet to be determined.

ShahinHaque97-CT commented 1 year ago

My pm_static file is listed below:

app: address: 0xc200 end_address: 0x62000 region: flash_primary size: 0x55e00

external_flash: address: 0x56000 end_address: 0x200000 region: external_flash size: 0x1aa000

mcuboot: address: 0x0 end_address: 0xc000 placement: before:

mcuboot_pad: address: 0xc000 end_address: 0xc200 placement: align: start: 0x1000 before:

mcuboot_primary: address: 0xc000 end_address: 0x62000 orig_span: &id001

mcuboot_primary_app: address: 0xc200 end_address: 0x62000 orig_span: &id002

mcuboot_scratch: address: 0x56000 end_address: 0x74000 placement: after:

mcuboot_secondary: address: 0x0 device: DT_CHOSEN(nordic_pm_ext_flash) end_address: 0x56000 placement: align: start: 0x4 region: external_flash share_size:

sram_primary: address: 0x20000000 end_address: 0x20010000 region: sram_pri

//-----------------------------------------------------

Am I correct in saying, I would need to do a nrfjprog --memrd X --n 4096 where X is the primary partition offset by 4Kb, which location would this be? Am I using mcuboot_primary_app? 0x62000 - 0x1000 = 0x61000 and so: nrfjprog --memrd 0x61000 --n 4096

ShahinHaque97-CT commented 1 year ago

My original issue was with power loss during the swap it is stuck in the state as mentioned in the first post. using nRESET reloads the log but removing USB POWER allows the code to finish the swap and continue running.

To provide the logs, I have used nrfjprog --readcode to show the entire hex dump. I have attached each stage of the progress. However one thing to note, after it is stuck in the loop, I have done a hex dump which disconnects RTT. Reconnecting RTT, while USB POWER is still connected, allows the application swap fine. I am I am assuming it doesn't set the same flags as nRESET.

The files contained are as follows:

1 - Full chip reset using nRF Connect Programmer 2 - Application built using west build -b nrf52dk_nrf52832 -p && west flash --erase 3 - Using image test, successful swap. 4 - Reverted back to the first image, Just my doing a reset 5 - Power loss mcumgr --conntype serial --connstring "COM23,baud=115200" image test HASH mcumgr --conntype serial --connstring "COM23,baud=115200" reset USB removed. USB reconnected. RTT loaded.

00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: revert 00> I: Starting swap using scratch algorithm. 00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x3, image_ok=0x3 00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2 00> I: Boot source: primary slot 00> I: Starting swap using scratch algorithm. 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first image slot

Entire process is repeated with 5v2

nRF52832_reset_logs.zip

ShahinHaque97-CT commented 1 year ago

Update instead of removing USB power, RTT disconnect and reconnect works fine. Is there a way of setting up a timeout to reset the RESET flags and so it'll boot up as if it is power on reset. This isn't really solving the issue but masking it allowing to run though.

ShahinHaque97-CT commented 1 year ago

A more pressing issue -

I decided to try my code with Bluetooth as that is the intended purpose. Using this - https://devzone.nordicsemi.com/guides/nrf-connect-sdk-guides/b/software/posts/ncs-dfu#ble_testing

Latest SDK.

External Flash / External Scratch. With power loss during swap, the device is completely bricked, Even with removing USB power, the device is stuck in this state as shown below:

00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3 00> I: Scratch: magic=good, swap_type=0x2, copy_done=0x3, image_ok=0x3 00> I: Boot source: scratch 00> I: Starting swap using scratch algorithm. 00> E: Image in the primary slot is not valid! 00> E: Unable to find bootable image

nReset button pressed

00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3 00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2 00> I: Boot source: none 00> I: Swap type: test 00> E: Image in the secondary slot is not valid! 00> E: Image in the primary slot is not valid! 00> E: Unable to find bootable image

USB power killed

00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x2, copy_done=0x1, image_ok=0x3 00> I: Scratch: magic=bad, swap_type=0x1, copy_done=0x2, image_ok=0x2 00> I: Boot source: none 00> I: Swap type: revert 00> E: Image in the primary slot is not valid! 00> E: Unable to find bootable image

As this is not recoverable this is a major issue, the device will have to be reflashed.

ShahinHaque97-CT commented 1 year ago

With internal scratch, external flash it works

I thought it worked with restarting USB due thinking it was crashing from the logs below:

//------------------------------------------------------------------------------------------------------------- 00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I: Jumping to the first i

Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bootloader chainload address offset: 0xc000 00> I:

00> Booting Zephyr OS build v3.2.99-ncs2 00> I: Starting bootloader 00> I: Primary image: magic=good, swap_type=0x4, copy_done=0x1, image_ok=0x1 00> I: Scratch: magic=unset, swap_type=0x1, copy_done=0x3, image_ok=0x3 00> I: Boot source: none 00> I: Swap type: none 00> I: Bo

//-------------------------------------------------------------------------------------------------------------

However, with the Bluetooth application, I noticed that it is still running! (I couldn't tell with the serial method as it was a helloworld example with just a string message) I then switched RTT to UART and it works. This means when the Application swaps between MCUBOOT/SLOT1/SLOT2 RTT is disconnected not crashed.

So we have Serial/BT working with external flash & internal scratch!

github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.