pop-os / distinst

Installer Backend
GNU Lesser General Public License v3.0
221 stars 43 forks source link

Fix detection of missing encrypted partitions #324

Closed mmstick closed 1 year ago

mmstick commented 1 year ago

This is to debug an issue currently seen with our latest ISOs. This will add some extra logs to display the partition that failed to be applied for the refresh config, a dump of lsblk before applying the config, and a delay of 3 seconds before getting a list of volumes after a decrypted PV is online.

mmstick commented 1 year ago

So it's not an encrypted partition that it can't find, but the EFI partition.

mmstick commented 1 year ago

Did you check if #323 caused the regression?

mmstick commented 1 year ago
identifiers: PartitionIdentifiers { id: Some("nvme-Samsung_SSD_970_EVO_Plus_500GB_S58SNJ0N126927J-part1"), label: None, part_label: None, part_uuid: None, path: Some("pci-0000:04:00.0-nvme-1-part1"), uuid: Some("84C1-0A8A") } }

The PartUUID is None for some reason for this partition.

leviport commented 1 year ago

Did you check if #323 caused the regression?

I will check. It seems somewhat likely at this point. The refresh I did when testing that PR was from the recovery partition, which doesn't seem to have the same problem (or at least, it's uncommon enough for me to not have encountered it yet).

mmstick commented 1 year ago

I don't see anything there would cause this though. Our partition-identity crate looks up the PartUUIDs of every partition when it's scanning for available installation options. There's no involvement with any C libraries since we're checking /dev/disk/by-partuuid paths directly. Last change to this code was early in 2022.

I'll add a check for a missing PartUUID property that will re-check if the PartUUID exists in the system.

leviport commented 1 year ago

I've had a couple successful refreshes with the latest changes to this branch. I want to do some more to make sure, but I'm feeling good about this.

leviport commented 1 year ago

I've done about 10 more refreshes with this fix, and I've seen the same failure 3 times. I'm going to do a full clean install again, then more refreshes, just in case something was wonky with my EFI partition.

leviport commented 1 year ago

I saw one refresh failure out of ~15 refreshes with the latest commit. Here is the log, in case the additional output is helpful: installer-iso-646-96e0b0a-missing.txt

However I think this could have been a fluke, since I rebooted directly back into the live environment and refreshed a second time, without rebooting back into the main OS first. I'm going to do more refreshes tomorrow, but I think this has at least reduced the likelihood that the refresh failure bug occurs. If that still seems to be the case tomorrow, I will approve this and we can build new ISOs.

mmstick commented 1 year ago

This log shows it is failing to find the PartUUID of the recovery partition. I notice that the root partition, which is encrypted, is also missing a PartUUID. I'll check for a better way of resolving missing PartUUIDs, as it seems the by-path properties are missing on the partitions lacking their PartUUIDs.

mmstick commented 1 year ago

This change should get PartUUID lookups working for everything now. Including the recovery partition, since it was missing its by-path property. Though it's strange that the partition identifiers are suddenly missing in the refresh environment. It seems like the devices are getting probed before their /dev/disk/ paths are created.