pengutronix / genimage

tool to generate multiple filesystem and flash images from a tree
GNU General Public License v2.0
318 stars 112 forks source link

GPT table in Genimage-created images may be mangled by Windows OS #262

Closed sairon closed 2 hours ago

sairon commented 4 months ago

Windows (at least starting with version 10) contain logic that alters the GPT table of a drive connected to it in some cases. More specifically it's been found it relocates the backup GPT table to the end of the drive and the partition array to LBA address equal to first_usable_lba - 32. There is no (easy) way to prevent this behavior in Genimage-created images because first_usable_lba is always set to the start of the first partition. Although this is rather a Windows bug/misfeature, images created using other utilities (like sgdisk) are not affected by this, as they generally set first_usable_lba to LBA 34, so it should be possible to achieve the same with Genimage.

For details see:

These issues show a real world regression triggered by the fact that Genimage works slightly different compared to other utilities. In theory I can imagine this relocation might also overwrite data located between the GPT header and the first actual partition if this area of the disk contained for example some bootloader data - although this is purely hypothetical.

I'd like to discuss the potential ways to mitigate this issue. Should we add a flag for setting first_usable_lba to the lowest possible value, or an arbitrary one, or even set it always to 34 like sgdisk does? Currently it's correlated to the align value but setting it to a one-sector size and juggling with offset values afterwards feels rather clumsy to me.

michaelolbrich commented 2 months ago

So my understanding was that first_usable_lba is the first LBA that can be used for partitions, and anything before that is off limits. That's why it is implemented as it is right now: There are real use-cases where bootloader data is placed between the partition table and the first partition.

But my understanding is clearly mistaken, or at least windows has a different understanding of what should happen here. So we should probably always do:

header.first_usable_lba = hd->gpt_location + (GPT_SECTORS - 1)*512;

That's 2 + 32 = 34 for the default partition table location. If the partition table is moved to make room for a bootloader, then that should still be acceptable based on you're description above.

sairon commented 1 month ago

Hi Michael, first of all, sorry for the delayed reply.

So my understanding was that first_usable_lba is the first LBA that can be used for partitions, and anything before that is off limits. That's why it is implemented as it is right now: There are real use-cases where bootloader data is placed between the partition table and the first partition.

Well, I'd assume the same but obviously Microsoft engineers thought it's good idea to do it like this. I haven't found any reasoning why it's happening, and I think it's too esoteric to cause any other issues. Just find the cause of the problem was quite a detective work.

But my understanding is clearly mistaken, or at least windows has a different understanding of what should happen here. So we should probably always do: ...

It's essentially the same what I ended up with: https://github.com/home-assistant/operating-system/pull/3497 (just taking into the fact that header.first_usable_lba is actually the LBA number, not byte offset). So if you agree, I'll put it into a PR here.

FWIW, the root cause for the RPi bootloader issue has been found and fix is on the way, but I think it'd be still good to stay on the safe side and adjust the Genimage behavior, as I'm not aware of any side effects.

Villemoes commented 3 weeks ago

FWIW, I've also been bitten by the current default behaviour. Not because of any Windows interference, but simply because I expected to be able to define a new partition on an existing target that had originally been created using genimage, between the (disk-order-)first and the GPT array, where I knew there was a large gap. But sgdisk wouldn't let me because the first-usable-lba was set to that disk-order-first partition.

So yes, I think that we should by default set first-usable-lba so it follows the primart GPT array, not being the start of the first (in disk-order) partition. We could make it an image option to set first-usable-lba explicitly, with some sanity check that it is after the end-of-array and before the first in-partition-table=true partition.

I can't really think of a situation where one would have some area between the partition array and the first partition which is known to be off-limits, where one wouldn't just define that area as a partition. But there are certainly situations where the first defined partition ends up being quite far from the end of the partition array (e.g. because it is defined with align=1M or something like that), and where one later wants to use some of that left-over area for a new small partition.

Villemoes commented 3 weeks ago

Hm, so digging a little, I found that I touched this area years ago.

commit 9190a2baadbf22374bd30369bc968c5e98a5ec05
Author: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Date:   Sat Mar 27 22:26:55 2021 +0100

    image-hd.c: correct computation of first_usable_lba

[...]    

    I considered instead just unconditionally using the LBA immediately
    following the primary GPT array. However, if there is a bootloader
    spanning the MBR+GPT header+GPT array, the area on the other side of
    the GPT array should not be available for adding a new partition. And
    doing it this way is in any case more aligned with what was done
    previously.

But yes, I've changed my mind, and now do believe that we should default to that GPT_array+32 placement.

Essentially, there are two extreme possible values for first_usable_lba: (1) minimum, gpt-array + 32, proposed new default (2) maximum, start of first in-partition-table partition, current default

and then there's the case where some in-partition-table=false defined image straddles the gpt-array (e.g. test/hole.config), in which case we would probably want to bump the default to (3) the lba following the end of that "bootloader" partition.

So how about the following: We add a new hdimage option, first-usable-offset (I want it to be in bytes and not LBA, similar to gpt-location), defaulting to the sentinel value 0. If it is given, we check during hdimage_setup that it is compatible with the above min and max values. If it is not given, we set it at the end of hdimage_setup to either (1) or (3), depending on whether there was such a bootloader image.

sairon commented 3 weeks ago

I have opened a PR with the patch that we use for a while for Home Assistant OS. I'll leave it up to Michael if he wants to merge it as-is or extend it with some toggles to accommodate for the scenarios discussed above.

michaelolbrich commented 2 weeks ago

Take a look at #273. This sets first_usable_lba to the lba after either a bootloader or the partition table. So without the bootloader, the result should be the same as #270. If you need a bootloader and combine that with Windows, then you should probably put the bootloader in an explicit partition, otherwise Windows will consider that space as unused...

What do you think?