rauc / rauc

Safe and secure software updates for embedded Linux
https://rauc.io
GNU Lesser General Public License v2.1

Allow more fine-grained control about clearing eMMC boot slots #780

Open Ablu opened 3 years ago

Ablu commented 3 years ago

It looks like clear_slot always clears the entire "device" before writing the new slot. This can be problematic for devices which "abuse" the boot0 partition by storing additional data there.

For example, Toradex stores their hardware configuration block at the end of boot0 and also, by default, places the u-boot environment there. One can argue that this is not really a nice location for such data, but we need to keep that data there for backwards-compatibility reasons.

For now I simply patched out the clearing of the slot (even if some old data is left over, it does not really matter in our case since hopefully nothing jumps or reads from there :P), but we would be interested in seeing this in mainline.

Would you be interested in accepting a patch which makes the clearing configurable? Or do we maybe need some more fine-grained control over the area which should be cleared?

I would try to find time to submit a patch if we can agree on a solution that would be mainline-able :)

jluebbe commented 3 years ago

> It looks like clear_slot always clears the entire "device" before writing the new slot. This can be problematic for devices which "abuse" the boot0 partition by storing additional data there.
>
> For example, Toradex stores their hardware configuration block at the end of boot0 and also, by default, places the u-boot environment there. One can argue that this is not really a nice location for such data, but we need to keep that data there for backwards-compatibility reasons.

I'd argue the same way. eMMCs support creating additional "general purpose" partitions if you want to keep this kind of data separate from the main OS.

Using boot partitions for dynamic data raises further issues: Does this mean that you have two u-boot environments? How is the "active" copy selected? How would you handle migration and compatibility?

Our usual suggestion is to limit the amount of dynamic data accessed by the boot-loader to the strict minimum required (a defined variable set instead of a full env).

> For now I simply patched out the clearing of the slot (even if some old data is left over, it does not really matter in our case since hopefully nothing jumps or reads from there :P), but we would be interested in seeing this in mainline.

Are you sure your ROM code cannot be confused by whatever is left in the boot partitions from previous versions? The exact contents of the un-erased space depend on the full update history of each device, making the low-level boot process of devices in the field impossible to test (and ultimately possibly non-deterministic).

> Would you be interested in accepting a patch which makes the clearing configurable? Or do we maybe need some more fine-grained control over the area which should be cleared?

RAUC has some tradition of trying to make it harder to make design errors, so I'm skeptical. One approach might be to reuse the region-start/-size options from https://rauc.readthedocs.io/en/latest/advanced.html#update-boot-partition-in-gpt.

This would mean that RAUC always erases this region completely and also writes only to the region. It would need to check that the image is not larger than the region-size. Also, it should be documented that this method is only intended for backwards compatibility and not a recommended design.

> I would try to find time to submit a patch if we can agree on a solution that would be mainline-able :)

The change would need to place emphasis on keeping the additional complexity for the normal case low.

Ablu commented 3 years ago

> eMMCs support creating additional "general purpose" partitions if you want to keep this kind of data separate from the main OS.

Yep. Though Toradex of course ships their modules pre-partitioned, so the eMMC is pre-programmed :/. And even if they did not, one would have devices in the field which are programmed already. So if one did not think about this before the devices went into production, this is unrealistic to change in many cases, I fear.

> Using boot partitions for dynamic data raises further issues: Does this mean that you have two u-boot environments? How is the "active" copy selected? How would you handle migration and compatibility?

Well, the Toradex BSP does not really prepare for A/B updates of the bootloader, so both boot partitions would read the same environment from boot0 unless one changes that (which is not a big problem of course). The configuration data that Toradex places there during production is the most annoying part. It is written when devices are manufactured, so one would either need to move that data later or hard-code the otherwise dynamically read data directly within u-boot. Both would be an annoying deviation from the default Toradex BSP.

> Are you sure your ROM code cannot be confused by whatever is left in the boot partitions from previous versions? The exact contents of the un-erased space depend on the full update history of each device, making the low-level boot process of devices in the field impossible to test (and ultimately possibly non-deterministic).

This has not proven problematic for us in the past... That's how we have been doing the updates before. A way to clear a certain section would of course be great though! But for non-eMMC boot partitions the clearing does not currently seem to happen either, right?

> RAUC has some tradition of trying to make it harder to make design errors, so I'm skeptical. One approach might be to reuse the region-start/-size options from https://rauc.readthedocs.io/en/latest/advanced.html#update-boot-partition-in-gpt.
>
> This would mean that RAUC always erases this region completely and also writes only to the region. It would need to check that the image is not larger than the region-size. Also, it should be documented that this method is only intended for backwards compatibility and not a recommended design.

Sounds good to me! Let me know if this approach would be fine to implement in a patch.

> The change would need to place emphasis on keeping the additional complexity for the normal case low.

If the region-start/-size thing is too complex code wise, I personally would be fine with a simple flag (which is documented with a big warning describing that no clearing happens) to disable the clearing too... If in doubt I can still over-provision the .img file that I am flashing in order to make sure enough stuff is overwritten.
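The over-provisioning workaround mentioned above amounts to padding the bootloader image so that a plain write covers everything that should be overwritten, while stopping short of the area that must be kept. A minimal sketch, assuming the image fits in memory; `pad_image` and the sizes used here are hypothetical and not part of RAUC:

```python
def pad_image(image: bytes, target_size: int, fill: bytes = b"\x00") -> bytes:
    """Return the image extended with fill bytes to exactly target_size bytes.

    Hypothetical helper: target_size should end before any data that has to
    survive the update (for example a config block at the end of boot0).
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    if len(image) > target_size:
        raise ValueError("image is larger than the target size")
    return image + fill * (target_size - len(image))


# Example: pad a 4-byte stand-in image to 8 bytes.
padded = pad_image(b"BOOT", 8)
```

Because the padding is part of the image itself, the region that ends up overwritten is fixed at build time and does not depend on any clearing done by the update tool.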

UVV-gh commented 2 years ago

@Ablu I solved this issue by moving the Toradex factory config block out of the eMMC boot partition during my module bring-up procedure and placing it before the first GPT partition, after the reserved GPT header space.
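To give a rough idea of how much room such a gap offers, here is a small sketch. It assumes 512-byte sectors and the standard minimum GPT layout (protective MBR in LBA 0, GPT header in LBA 1, then 128 partition entries of 128 bytes each), and a first partition aligned to 1 MiB. The alignment and the function names are assumptions; a real disk states its actual first usable LBA in the GPT header, which may reserve more space.

```python
SECTOR_SIZE = 512          # assumed logical sector size
GPT_ENTRY_COUNT = 128      # standard minimum number of partition entries
GPT_ENTRY_SIZE = 128       # bytes per partition entry


def first_usable_byte(sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset right after the primary GPT header and entry array."""
    entry_bytes = GPT_ENTRY_COUNT * GPT_ENTRY_SIZE
    entry_sectors = -(-entry_bytes // sector_size)  # ceiling division
    # LBA 0: protective MBR, LBA 1: GPT header, then the entry array.
    return (2 + entry_sectors) * sector_size


def gap_before_first_partition(first_partition_start: int) -> int:
    """Bytes available between the GPT metadata and the first partition."""
    start = first_usable_byte()
    if first_partition_start < start:
        raise ValueError("partition starts inside the GPT metadata")
    return first_partition_start - start


# Assumption: the first partition starts at 1 MiB.
gap = gap_before_first_partition(1024 * 1024)
```

Data placed in this gap is not described by any partition table entry, so partitioning tools will not know about it.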

jluebbe commented 2 years ago

> > RAUC has some tradition of trying to make it harder to make design errors, so I'm skeptical. One approach might be to reuse the region-start/-size options from https://rauc.readthedocs.io/en/latest/advanced.html#update-boot-partition-in-gpt. This would mean that RAUC always erases this region completely and also writes only to the region. It would need to check that the image is not larger than the region-size. Also, it should be documented that this method is only intended for backwards compatibility and not a recommended design.
>
> Sounds good to me! Let me know if this approach would be fine to implement in a patch.
>
> > The change would need to place emphasis on keeping the additional complexity for the normal case low.
>
> If the region-start/-size thing is too complex code wise, I personally would be fine with a simple flag (which is documented with a big warning describing that no clearing happens) to disable the clearing too... If in doubt I can still over-provision the .img file that I am flashing in order to make sure enough stuff is overwritten.

We thought about this again. region-size has a different meaning, so we wouldn't want to reuse it. Perhaps size-limit?

To be predictable and consistent with the existing behavior (and what would happen without this option), RAUC should still clear the partition up to the limit. The result is that everything below the limit is completely predictable. RAUC should also check that the limit does not exceed the actual device size.
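The semantics described here can be sketched as follows. This is only an illustration on in-memory buffers, not RAUC code: `write_with_size_limit` and its parameters are hypothetical names, the zero fill is an assumption, and the check that the image fits below the limit is carried over from the earlier region-size proposal.

```python
import io


def write_with_size_limit(device, device_size, image, size_limit):
    """Clear bytes [0, size_limit) on the device, then write the image at 0."""
    if size_limit > device_size:
        raise ValueError("size limit exceeds device size")
    if len(image) > size_limit:
        raise ValueError("image is larger than the size limit")
    device.seek(0)
    device.write(b"\x00" * size_limit)  # zero fill is an assumption
    device.seek(0)
    device.write(image)


# Two simulated 20-byte boot partitions with different update histories.
# The last 4 bytes stand in for data that must survive the update.
dev_a = io.BytesIO(b"OLDBOOTLOADER_V1" + b"CFG!")
dev_b = io.BytesIO(b"other-leftovers!" + b"CFG!")

for dev in (dev_a, dev_b):
    write_with_size_limit(dev, device_size=20, image=b"NEWBOOT", size_limit=16)

below_a, tail_a = dev_a.getvalue()[:16], dev_a.getvalue()[16:]
below_b, tail_b = dev_b.getvalue()[:16], dev_b.getvalue()[16:]
```

After the update, `below_a` and `below_b` are identical regardless of what each device held before, while the bytes above the limit are left untouched.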

loblik commented 9 months ago

> Are you sure your ROM code cannot be confused by whatever is left in the boot partitions from previous versions? The exact contents of the un-erased space depend on the full update history of each device, making the low-level boot process of devices in the field impossible to test (and ultimately possibly non-deterministic).

We've run into this also. To be honest I don't understand how ROM code could be "confused by some leftovers". Code cannot be confused. If some ROM code expects zeros (cleared blocks) at some offset, then these should be part of the update image. But I don't think RAUC should make any assumptions here.

Ablu commented 9 months ago

Did not realize that I was assigned here. I switched jobs and am no longer involved in projects that suffer from this problem. On modern systems one would probably use UEFI capsule updates.