pop-os / iso

Pop!_OS ISO production
Other
496 stars 65 forks source link

Feature request: add new `memtest86+` to recovery options via `systemd-boot` #291

Open wagnerck opened 2 years ago

wagnerck commented 2 years ago

The open-source memtest86+ (http://memtest.org/, distinct from the commercial memtest86 owned by PassMark at https://www.memtest86.com/) is under active development again. V6 is going to go into beta testing later this April, and it includes new features to make it useful once more, such as UEFI support, DDR5 testing, and more. But the most interesting feature that I'm seeing is that it compiles EFI executables by default, which work as expected when added to the /boot/efi partition.

This would allow us to add the new memtest86+ to our default Pop!OS installs, as another boot option that users can choose along with the recovery partition. I feel like this would be very useful for troubleshooting purposes, and could be a proof-of-concept for adding other hardware troubleshooting tools to the EFI partition. The memtest.efi file is 135k and should not cause any space issues on /boot/efi.

The following commands will build the new memtest86+ (v6 alpha) to include both an ISO you can burn to a USB stick and a memtest.efi file (although additional packages may be required, xorriso was the only additional one I had to install myself):

$ git clone https://github.com/memtest86plus/memtest86plus
$ cd memtest86plus/build64
$ sudo apt install xorriso
$ make iso
$ ls
app      floppy.img  lib          memtest.efi  memtest_shared      tests
boot     iso         Makefile     memtest.iso  memtest_shared.bin
esp.img  ldscripts   memtest.bin  memtest.mbr  system

Then copy memtest.efi to /boot/efi/EFI/memtest/memtest.efi, and create /boot/efi/loader/entries/memtest.conf that contains the following:

title memtest86+
efi /EFI/memtest/memtest.efi
options smp

This works in a UEFI VM and on my oryp6. See the following photos: IMG_0154 IMG_0155

You can also add it to the firmware's boot device listing via efibootmgr, but I feel like the more important thing is having it in the systemd-boot menu, since that's where our users go to choose the recovery partition, and the command to open up that menu is universal across hardware (i.e. hold down the spacebar).

And if there's also a way to add this to the installer ISO, as an option in the initial boot menu, that would make this useful for folks at the factory who are doing installation and testing, as well as for people trying to install Pop!OS and can't because they have a hardware problem. I don't know if that would be done with an EFI file or with something else, like the floppy image.

I've been told that it was determined that we can't do this kind of thing with the commercial memtest86 from PassMark, but this project appears to be fully GPL2 (https://github.com/memtest86plus/memtest86plus/blob/main/LICENSE) so there shouldn't be any insurmountable licensing issues.

Caveats: memtest86+ v6 is still in an alpha state, with a beta in late April 2022, so while testing might be worth doing right now, we probably shouldn't actually add this to the shipping product until later. Maybe the beta will be stable enough to start shipping it, maybe not.

Also, we would clearly need to test this on a wide variety of hardware, especially our desktop PC line (Thelio and Meerkat) because memtest86+ has a new keyboard detection system (as outlined on the GitHub page) which may not reliably detect external USB keyboards on all systems just yet, or may require booting with CSM and/or legacy modes enabled.

I'm not sure how difficult this would be to add to our installer, but I feel like the best option here would be to treat it like the recovery partition, since it also will only be present on UEFI systems. Pop!OS could also possibly update the EFI executable independent of the OS, using the same mechanism used to update the recovery partition.

You can supposedly change the order of the options in the systemd-boot menu with a sort-key option in the configuration file (see https://systemd.io/BOOT_LOADER_SPECIFICATION/), but I couldn't get it to work. I think it needs to have a sort-key in all of the configuration files under /boot/efi/loader/entries/ for this to work correctly, but I didn't mess with the ones automatically generated by Pop!OS to test this.

Finally, even if the EFI executable doesn't turn out to be feasible to include in Pop!OS, we should be happy that there's once again going to be a fully open-source memtest that we can refer users to, that won't nag them to install a "Pro" version.

mmstick commented 2 years ago

I think it'd just require making a debian package for it which installs to that path and adds an entry.conf for it. Shouldn't require much or any change to the installer.

wagnerck commented 2 years ago

Is there a way to make it so a Debian package won't install on non-UEFI systems? I know there's a lot of of people (myself included) who are still running Pop!OS on BIOS hardware. That was part of why I thought that doing it as part of the installer might be a better idea.

Also, that wouldn't allow us to add it as a boot option to the installer ISO, right?

mmstick commented 2 years ago

The installer would only make it available for new installs. A package could make it available to all systems in an update. The debian package could use a postinst script to copy the EFI binary to the EFI directory and set up the entry for it (if it even requires a boot conf), and a postrm script could remove it. The script could do nothing on a system without an EFI partition mounted.

wagnerck commented 2 years ago

If you think that's the better solution, I'll look into how I can create a Debian package for this instead of adding it to the Pop!OS installer setup.

thomas-zimmerman commented 2 years ago

I think it would be better to have the deb install the files in the ESP and create the laoder entry--and on remove do nothing unless purged. This would allow the package to be included on install (creating the entry) and not be installed in the resulting system.

Support would be able to request install of the deb package again if the memory test is not found in the systemd-boot menu.

With this design, would the Pop!OS ISO have the entry as well?

mmstick commented 2 years ago

I don't think that it would, but something like that could be added separately.

wagnerck commented 2 years ago

I feel like having the package remain as part of the resulting install would be important because then it could be updated with a new EFI binary as part of your normal apt update+upgrade process, or through the Pop!Shop. Support would still be able to do apt reinstall --purge pop-memtest or whatever if necessary, but it would be kept up-to-date continually by default. The same package could also theoretically be expanded later to include other tools, so maybe naming it something like pop-diagnostics to start would be better? These are decisions that can made later since they won't affect the basic plan.

The Pop!OS ISO would need to have it added to the opening menu, I think? On UEFI systems it shows the "Try or Install Pop_OS" screen, which only has the one menu option at the moment. Adding a second one ("Test System Memory" perhaps) would be a matter of adding the menu option to /boot/grub/grub.cfg and have it point to the executable, I think. And yeah, this would have to be separate from a Debian package.

wagnerck commented 2 years ago

I'm working on a proof-of-concept for this (with a package name named pop-diagnostics) and while it turned out to be fairly trivial to have the package tools copy the appropriate files to /boot/efi, I've run into a snag. Attempting to upgrade the package fails, as dpkg uses symlinks as part of the upgrade process, so it can rollback to the original files easily if the upgrade fails, and that doesn't work on an EFI partition since it's FAT32.

ckw@flatline:~/Documents/projects/memtest$ sudo dpkg -i pop-diagnostics_0.1.1_amd64.deb 
[sudo] password for ckw: 
(Reading database ... 258533 files and directories currently installed.)
Preparing to unpack pop-diagnostics_0.1.1_amd64.deb ...
systemd-boot present, continuing install
Unpacking pop-diagnostics (0.1.1) over (0.1.0) ...
dpkg: error processing archive pop-diagnostics_0.1.1_amd64.deb (--install):
 unable to make backup link of './boot/efi/EFI/memtest/memtest.efi' before installing new version: Operation not permitted
Errors were encountered while processing:
 pop-diagnostics_0.1.1_amd64.deb

This is not a unique problem. as there's a Raspbian hack to get around the same problem and a Ubuntu issue about this is almost ten years old, etc. If anyone here has any ideas about the proper way to work around this problem? I'm looking into options but they're all a bit hacky. Or would this be something better handled by the same tools that maintain the recovery partition?

mmstick commented 2 years ago

Perhaps the package shouldn't be placing files in the EFI partition directly, but through a postinst / postrm script.

wagnerck commented 2 years ago

That was in fact my thought. Right now they're being placed using a .install file:

memtest86plus/build64/memtest.efi /boot/efi/EFI/memtest/
src/memtest.conf /boot/efi/loader/entries/

If they're instead placed via a postinst script, and removed on uninstall/purge with a postrm script, dpkg won't try to do the symlink thing, right? My concern is that this does feel a bit hacky, but if this is the proper method, I'll look into doing it that way instead.

mmstick commented 2 years ago

Yes there wouldn't be a symlink issue to worry about. All the packages that add files to the EFI partition (kernelstub, grub) make the EFI installation a separate part of the package installation. kernelstub gets run with initramfs updates which copies kernels over and generates configs. There could be an initramfs hook that installs the files if they're missing, if not done directly by the postinst.

wagnerck commented 2 years ago

The package would install the files to somewhere in /usr/lib or the like as their official package locations, and then the scripts would copy them to the ESP? I think having them be part of the package scripts might be better than using initramfs hooks in this case, since errors in the hook could cause the process of creating the boot image to fail entirely. It would also allow users to install the package from the recovery partition; even though the recovery partition resets itself on restart, the files copied to the ESP would still be present.

This seems like a good way to move forward, and I'll try to find the least hacky way to get the postinst and postrm scripts to do this.

wagnerck commented 2 years ago

I've got a prototype in place that installs, uninstalls, upgrades, etc: https://github.com/cwsystem76/pop-diagnostics

Rather than moving this repo to the pop-os github org, I think using my personal repo to experiment and then doing a new, clean repo with those experiments as a foundation would be better.

wagnerck commented 2 years ago

Summarizing some Slack discussion here:

thomas-zimmerman commented 2 years ago

Name of the update script would be memtest-update.sh

wagnerck commented 2 years ago

I was thinking memtest-utils.sh because it can also be used to remove it, or possibly even add launch parameters if we add that functionality.

wagnerck commented 2 years ago

The latest commit now has memtest-utils.sh, which is called by the package postinst and prerm scripts. An example command that a user can run in a terminal:

sudo /usr/lib/pop-diagnostics/memtest/memtest-util.sh reinstall

A future version of this script will add an "options" verb to let systemd-boot pass launch parameters to the EFI executable, like keyboard=legacy or nosm when necessary for troublesome hardware.

wagnerck commented 2 years ago

The "options" feature has been added to memtest-utils.sh and the readme and comments generally improved. For example:

sudo /usr/lib/pop-diagnostics/memtest/memtest-util.sh options keyboard=legacy

will write a custom configuration file that will not use the new USB keyboard drivers, and will instead rely on the UEFI firmware to initialize the keyboard in legacy mode.

wagnerck commented 2 years ago

The current Beta 2 build doesn't work with the latest Coreboot firmware on my oryp6, but the dev build from nightly source does. One of a few reasons to hold off on any public release until it's out of beta and we've done a lot more testing.

[edit] This problem has been resolved.