mcu-tools / mcuboot

Secure boot for 32-bit Microcontrollers!
Apache License 2.0

Multiple SoCs have flash configurations unsupported by MCUboot #713

Open d3zd3z opened 4 years ago

d3zd3z commented 4 years ago

Some newer devices have flash configurations that are not supported currently by MCUboot. This issue attempts to collect these in one place to help with the design of any solutions to handle this situation.

Known devices:

jameswalmsley commented 4 years ago

STM32H7 (43) 32 bytes write, 128K erase

ghost commented 3 years ago

A further complication with STM32H7 is that the internal flash has an integrated ECC. Writing to the same flash word twice has a high probability of causing an ECC error, and if it's a double ECC error then a read results in a bus error. Does MCUboot rely on being able to write to the same flash word more than once?

utzig commented 3 years ago

Does MCUboot rely on being able to write to the same flash word more than once?

No, it only writes once to any "word", which is the alignment size for the flash supplied by the OS.
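To illustrate the "one write per word" point, here is a minimal sketch of how per-sector status records could be laid out so that each record lands in its own write block. It assumes three status records per sector and one write block per record; the helper names are hypothetical and are not taken from bootutil.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: three progress records are logged per swapped sector. */
#define STATUS_STATES_PER_SECTOR 3u

/* Offset of one status record inside the status area. Every record starts
 * on its own write block, so no flash word is ever programmed twice. */
static size_t status_record_offset(size_t sector_idx, size_t state_idx,
                                   size_t write_sz)
{
    assert(write_sz != 0 && state_idx < STATUS_STATES_PER_SECTOR);
    return (sector_idx * STATUS_STATES_PER_SECTOR + state_idx) * write_sz;
}

/* Total flash consumed by the status area for a given write size. */
static size_t status_area_size(size_t max_sectors, size_t write_sz)
{
    assert(write_sz != 0);
    return max_sectors * STATUS_STATES_PER_SECTOR * write_sz;
}
```

Under these assumptions, 128 sectors cost 3072 bytes of status area with 8-byte writes and 12288 bytes with 32-byte writes, which is why large write sizes make this logging scheme expensive.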

ghost commented 3 years ago

I have MCUboot working on a PoC level on STM32H7. This required changing both BOOT_MAX_ALIGN and BOOT_MAGIC_SZ to 32. The boot_img_magic array had to be changed as well (one copy in bootutil_misc.c, one in image.py, one in Zephyr's mcuboot.c, maybe more) because it's only 16 bytes which is less than the alignment. All in all it's simple changes, but just changing BOOT_MAX_ALIGN for everyone isn't backwards compatible.

Is there any good reason to not make BOOT_MAX_ALIGN configurable? A user could set it to e.g. 32, which would then cause the magic to be padded to 32 bytes. (Or 16, or 512?). Everyone else would keep it at 8 and keep compatibility.
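A minimal sketch of what a configurable alignment with a padded magic could look like. This is not the MCUboot implementation: the helper names, the 0xff padding value, the choice to put the padding in front of the magic, and the placeholder magic bytes are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical build-time knob: defaults to 8 so existing users keep
 * today's layout; an STM32H7 build would pass -DBOOT_MAX_ALIGN=32. */
#ifndef BOOT_MAX_ALIGN
#define BOOT_MAX_ALIGN 8
#endif

#define BASE_MAGIC_SZ 16u        /* unpadded magic size, per the thread */
#define MAX_SUPPORTED_ALIGN 512u /* assumed upper bound for this sketch */

/* Placeholder bytes, NOT the real MCUboot magic. */
static const uint8_t base_magic[BASE_MAGIC_SZ] = {
    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
};

/* Magic field size rounded up to a whole number of write blocks. */
static size_t padded_magic_size(size_t align)
{
    assert(align != 0 && align <= MAX_SUPPORTED_ALIGN);
    return ((BASE_MAGIC_SZ + align - 1) / align) * align;
}

/* Writes padding followed by the magic into out.
 * Returns the number of bytes produced, or 0 if out is too small. */
static size_t build_padded_magic(uint8_t *out, size_t out_len, size_t align)
{
    size_t sz = padded_magic_size(align);

    if (out == NULL || out_len < sz) {
        return 0;
    }
    memset(out, 0xff, sz); /* assumption: pad with the erased-flash value */
    memcpy(out + (sz - BASE_MAGIC_SZ), base_magic, BASE_MAGIC_SZ);
    return sz;
}
```

With an alignment of 8 or 16 the field stays at 16 bytes, so existing images are unaffected; with 32 it grows to 32 bytes. Every place that holds a copy of the magic (the thread lists bootutil_misc.c, image.py and Zephyr's mcuboot.c) would have to apply the same rule for the bootloader and the signing tool to agree.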

jameswalmsley commented 3 years ago

I'd like to confirm also that I have been running MCUBoot on STM32H7 for 12 months now.

We made the same changes as you described above.

We did find it's possible to get an ECC error if we lost power during an image swap, and so MCUBoot would cause a hard-fault during swap-resume.

We solved this by:

  1. Using a watchdog if an ECC fault is triggered.
  2. If boot-reason is due to a watchdog in the bootloader then we erase the image, and scratch areas.
  3. We provide a recovery mode, where we wait for a repair image over DFU.
  4. Just before we boot the application we set a flag to say the firmware was booted. If the firmware triggers WD then we don't cause recovery.
  5. If firmware crashes multiple times without a power-cycle, recovery is triggered. (We also count the number of watchdog resets).

Due to the possibility of getting an ECC error during resume, we have disabled resuming of partial image-swaps.

Interruption of image swaps will cause a recovery.

Probably sounds a bit complex, but this has worked really well for our application. We went for catching ECC faults with a watchdog and recovery mode to ensure that all eventualities are covered, and no matter what happens we can recover the device.

Best

J
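The recovery policy in the numbered steps above can be sketched as a small decision function. This is an illustration only, not the code described in the comment: the type names, the threshold of three watchdog resets, and the idea that the state lives in memory that survives a reset (for example backup registers) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reset_cause { RESET_POWER_ON, RESET_WATCHDOG };
enum boot_action { ACTION_NORMAL_BOOT, ACTION_RECOVERY };

/* Assumed to be kept somewhere that survives a watchdog reset. */
struct boot_state {
    bool app_started;   /* set just before jumping to the application */
    unsigned wd_resets; /* watchdog resets since the last power cycle */
};

#define MAX_APP_WD_RESETS 3u /* hypothetical threshold */

/* Called early in the bootloader to choose between booting and recovery. */
static enum boot_action decide_boot_action(enum reset_cause cause,
                                           struct boot_state *st)
{
    bool from_app;

    assert(st != NULL);
    if (cause != RESET_WATCHDOG) {
        /* A power cycle clears the crash history. */
        st->wd_resets = 0;
        st->app_started = false;
        return ACTION_NORMAL_BOOT;
    }

    st->wd_resets++;
    from_app = st->app_started;
    st->app_started = false; /* must be set again before the next jump */

    if (!from_app) {
        /* Watchdog fired inside the bootloader (e.g. an ECC fault during
         * a swap): erase image and scratch areas, then wait for DFU. */
        return ACTION_RECOVERY;
    }
    /* Watchdog fired in the firmware: tolerate a few, then recover. */
    return (st->wd_resets >= MAX_APP_WD_RESETS) ? ACTION_RECOVERY
                                                : ACTION_NORMAL_BOOT;
}
```

The bootloader would set `app_started` to true immediately before jumping to the application, so a watchdog reset with the flag still false can only have come from the bootloader itself.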

nvlsianpu commented 3 years ago

I have MCUboot working on a PoC level on STM32H7. This required changing both BOOT_MAX_ALIGN and BOOT_MAGIC_SZ to 32

@d3zd3z @utzig What is your opinion about making this configurable?

utzig commented 3 years ago

@d3zd3z @utzig What is your opinion about making this configurable?

If someone is gonna tackle it, the person has to fix bootutil, the simulator, imgtool, mcumgr and newt, maybe the integrations in the supported OSes, and maybe other stuff which I fail to remember. Probably a bit more work than it might seem at first, but I don't think there are any big technical impediments.

d3zd3z commented 3 years ago

The other thing that is going to come up is that adding simulator support for this type of configuration is going to point out the "rare" or "occasional" failures, and we'll need to actually figure out a way to fix them. Having some percentage of upgraded devices need recovery really isn't something I'd consider acceptable for a regular option.

We do have a completely different swap strategy under development that is intended for devices where the writes are larger (and typically use ECC); however, it requires the erase size to be fairly small. Having 8k erases would waste quite a bit of flash for these sectors.

However, the existing swap code should be assuming that each write block can only be written once, so this is probably a corner case bug, perhaps because of the larger write size.

ghost commented 3 years ago

@jameswalmsley Your description of how you handled ECC errors on STM32H7 by involving the watchdog did not fill me with joy. So I came up with a different way to solve the problem, by trapping ECC errors and returning them as -EIO in the flash API. Normally a bus fault can't be trapped, but there is a way around that.

I've opened zephyrproject-rtos/zephyr#33140 with a description of the problem, my proposed solution, and I link to some code that shows that it can work. The code fiddles with some architectural registers, so it would be good to get some feedback from someone who knows more about how those registers interact with the rest of the system.

jameswalmsley commented 3 years ago

@weinholtendian Nice, I have to check out your solution. Yes the watchdog was really a "catch-all" solution from having to implement something quickly, and the system we work on can easily be recovered by an external device, should it all go wrong.

There are some other systems that we have that won't tolerate that though, PR on this is great timing :) I was trying to find some way of stopping the hard-fault like you have done in your current implementation, but unfortunately didn't have time to attempt it.

I will pull in your PR and check it on our systems and try to review it soon.

We've also created a new swap method for the h7 that makes use of the stm32h7 bank-swapping. It works really well, and is much faster due to less need to erase and write sectors.

SwissKnife64 commented 3 years ago

Hello J,

We are implementing an application with Zephyr on an STM32H743 and are struggling to integrate MCUBoot. The difficulty comes from the 32-byte minimum flash write size of the H7. It looks like you have solved the alignment problem. Could you please share your code or solution with us?

Happy coding,

Chris

github-actions[bot] commented 3 years ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

d3zd3z commented 3 years ago

Re-opening to track.

elcritch commented 3 years ago

We do have a completely different swap strategy under development that is intended for devices where the writes are larger (and typically use ECC); however, it requires the erase size to be fairly small. Having 8k erases would waste quite a bit of flash for these sectors.

Is there any code that I could help test or implement for ECC flash? I have both LPC55S69 and CC3220 and would be interested to see if it'd be possible to get them both working with OTA, even if it's on a fork (for now).

However, the existing swap code should be assuming that each write block can only be written once, so this is probably a corner case bug, perhaps because of the larger write size.

Alternatively, it sounds like there is a bug, and fixing it would make it easier to use ECC flash with the current scheme by setting a larger block size? Any pointers on where to dive in would be great.

elcritch commented 3 years ago

This seems related to #841 (Boot: Introduce new swap method using status partition), especially for chips with ECC-based flash?

dleach02 commented 2 years ago

@d3zd3z can we consider reopening this ticket? We have a HyperFlash platform that MCUBoot cannot support because of the size of the write (it has ECC).

irose-PeLtd commented 2 years ago

I'd like to see a fix for this, and I'd be willing to contribute to that. We use the LPC55xx processors, and being able to use MCUboot with them would be nice, especially since we are already using it with the Nordic nRF52 and i.MX RT series of micros. It seems as though the issue is isolated to memories that implement ECC, is that correct?

d3zd3z commented 1 year ago

I'll go ahead and re-open this, as I am actually working on what I hope is a solution to this. I want to basically add support for flash devices with large write sizes. This will likely also require relatively small erase sizes.

vipulkute-eaton commented 1 year ago

Hi @d3zd3z

I was facing a problem with firmware upgrade on the STM32H743 controller because of the 32-byte alignment issue. Has this problem been fixed in any of the newer releases? I am looking for a standard solution that is also compatible with other STM32 controllers. Could you please provide an update on this issue? Thanks.

RomainPelletant commented 1 year ago

Any news regarding LPC55xx series support? It would be awesome. If it is still not supported, is there a PR or draft that tries to implement it?

maximevince commented 1 year ago

Same question over here. Is there an ongoing effort? Anyone from NXP that can assist? Maybe @DerekSnell ? (Referring to https://github.com/zephyrproject-rtos/zephyr/discussions/49246)

GeorgeCGV commented 1 year ago

Is there any good reason to not make BOOT_MAX_ALIGN configurable?

#1609, but that only allows setting a custom value.

Didn't do anything regarding:

fix bootutil, the simulator, imgtool, mcumgr and newt, maybe the integrations in the supported OSes, and maybe other stuff which I fail to remember.

dleach02 commented 1 year ago

Need to unstale this

maximevince commented 5 months ago

It seems MCUboot is now supported on the LPC5500 devices, although only in UPGRADE_ONLY mode.

Linking the reply from @DerekSnell of NXP here, for those interested in LPC55xx support: https://github.com/zephyrproject-rtos/zephyr/discussions/49246#discussioncomment-7821560

utzig commented 5 months ago

One fix for this would be to change the way the upgrade process is "logged". Instead of writing to flash each step of the process, as it's done now, which is limited by the "write size", one could build a table of the pre-calculated CRC-32 of every sector which will be swapped, and save it all in a single write. If the swap is interrupted the CRC-32 data can be used to find where it stopped. At least I think it makes sense in theory! Not a walk in the park, but probably not too hard and time consuming to create a PoC.
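A rough sketch of the CRC-table idea, to make it concrete. It is not an MCUboot implementation: the tiny sector geometry, the function names, and the simplification that only the primary slot is inspected on resume are assumptions. A real swap would also have to account for the secondary slot and any scratch area, and a read of a half-written ECC word could still fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 16u /* tiny sectors, for illustration only */
#define NUM_SECTORS 4u

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320). */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = (uint32_t)0 - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}

/* Before the swap: record the CRC each primary sector should have once it
 * holds the incoming image. The whole table is then saved in one write. */
static void build_swap_table(const uint8_t *incoming,
                             uint32_t table[NUM_SECTORS])
{
    assert(incoming != NULL && table != NULL);
    for (size_t i = 0; i < NUM_SECTORS; i++) {
        table[i] = crc32_calc(incoming + i * SECTOR_SIZE, SECTOR_SIZE);
    }
}

/* After an interruption: the first sector whose content does not match the
 * table is where the swap stopped. Returns NUM_SECTORS if it completed. */
static size_t find_resume_sector(const uint8_t *primary,
                                 const uint32_t table[NUM_SECTORS])
{
    assert(primary != NULL && table != NULL);
    for (size_t i = 0; i < NUM_SECTORS; i++) {
        if (crc32_calc(primary + i * SECTOR_SIZE, SECTOR_SIZE) != table[i]) {
            return i;
        }
    }
    return NUM_SECTORS;
}
```

The appeal of this approach is that progress is inferred from the flash contents themselves, so nothing needs to be logged per step and the write size no longer limits the status area.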

macharlachanakya commented 1 month ago

is triggered. (We also count the number of watchdog resets). Due to the possibility of getting an ECC error during resume, we have disabled resuming of partial image-swaps..

We are also using the STM32H7 and are facing the same issue. We were not able to follow your comments regarding BOOT_MAX_ALIGN and BOOT_MAGIC_SZ; how can we apply these changes?