void-linux / void-mklive

The Void Linux live image maker
https://voidlinux.org

Latest x86_64 image is broken too #294

Closed splate07 closed 1 year ago

splate07 commented 1 year ago

I tried running the latest x86_64 live image (void-live-x86_64-20221001-xfce.iso) on real hardware via grub2 boot manager and it is broken. The root device cannot be found for some reason, and so the user is dropped into a debug shell. You can't reproduce the same issue with the previous version of the image (void-live-x86_64-20210930-xfce.iso).

Vaelatern commented 1 year ago

Is this image burned to the usb drive? I'm trying to make sense of the report.

splate07 commented 1 year ago

No, it is not. The ISO file is simply placed in the root directory of the 5th partition of my hard drive.

I have this entry in my grub.cfg file

menuentry "void amd64 2022 xfce" {
    set isofile="/void-live-x86_64-20221001-xfce.iso"
    loopback loop (hd1,msdos5)$isofile
    linux (loop)/boot/vmlinuz iso-scan/filename=$isofile root=live:CDLABEL=VOID_LIVE ro init=/sbin/init rd.luks=0 rd.md=0 rd.dm=0 rd.live.overlay.overlayfs=1
    initrd (loop)/boot/initrd
}

That's it. It works for the 2021 image; it doesn't work for the 2022 image.

classabbyamp commented 1 year ago

I boot the isos fine via grub (with https://github.com/classabbyamp/glim). does the latest iso boot fine directly? this may be an issue with your computer

splate07 commented 1 year ago

> I boot the isos fine via grub (with https://github.com/classabbyamp/glim). does the latest iso boot fine directly? this may be an issue with your computer

Good for you. Now try to boot void-live-x86_64-20221001-xfce.iso using https://github.com/classabbyamp/glim and see for yourself. By the way, my menuentry is based on https://github.com/classabbyamp/glim/blob/master/grub2/inc-void.cfg

classabbyamp commented 1 year ago

ok I am able to reproduce now, but it's very odd:

repo-san commented 1 year ago

I'm also having trouble with this. Booting the latest ISO through a grub2 menu entry doesn't work on my machine. I also tried thias' GLIM with no success, both on bare metal and with qemu. I even tried qemu with an EFI file, no dice. dd-ing the image to the same USB drive boots fine on bare metal, qemu, and qemu with an EFI file (it goes to grub instead of syslinux).

My uneducated guess is that it could be this dracut commit:
https://github.com/void-linux/void-packages/commit/eef5529636d2672b514cba53e604fb6f5db9f99e
https://github.com/dracutdevs/dracut/commit/87c4c17850e8bb982f6c07a6d3f58124bb2875de
and a relevant issue in void-packages: https://github.com/void-linux/void-packages/issues/38367

20210930 iso (boots using a grub2 menu entry): dracut 53_2 kmod 27_3

20221001 iso (doesn't boot using a grub2 menu entry): dracut 53_4 kmod 30_1

0x5c commented 1 year ago

It's not dracut.

With rd.debug on the kernel command line, I took the rdsosreport.txt boot log generated by dracut for both the last working image and the first broken image (for the working image I also added rd.break to get a shell).

In a diff between the two logs, 9 lines stand out in informational output before dracut starts searching for/mounting the root: (in cat /proc/self/mountinfo output)

-29 26 7:0 / /run/initramfs/live ro,relatime - iso9660 /dev/loop0 ro,nojoliet,check=s,map=n,blocksize=2048,iocharset=utf8
-31 1 254:0 / /sysroot rw,relatime - ext3 /dev/mapper/live-rw rw

(in cat /proc/mounts output)

-/dev/loop0 /run/initramfs/live iso9660 ro,relatime,nojoliet,check=s,map=n,blocksize=2048,iocharset=utf8 0 0
-/dev/mapper/live-rw /sysroot ext3 rw,relatime 0 0

(in blkid output)

-/dev/loop0: BLOCK_SIZE="2048" UUID="2021-10-07-00-22-44-00" LABEL="VOID_LIVE" TYPE="iso9660" PTUUID="4e4d61a4" PTTYPE="dos"
-/dev/loop1: TYPE="squashfs"
-/dev/mapper/live-base: UUID="65732de4-1bfe-479b-8269-be87b1fb8c8e" SEC_TYPE="ext2" BLOCK_SIZE="4096" TYPE="ext3"
-/dev/loop2: UUID="65732de4-1bfe-479b-8269-be87b1fb8c8e" SEC_TYPE="ext2" BLOCK_SIZE="4096" TYPE="ext3"
-/dev/mapper/live-rw: UUID="65732de4-1bfe-479b-8269-be87b1fb8c8e" BLOCK_SIZE="4096" TYPE="ext3"

The loop devices are not even present when booting a newer image, which points to the kernel, which is also the only relevant package that had updates between the last working and first broken images (5.13.19_1 and 5.19.10_1 respectively).

This is confirmed when booting a freshly built image made with mklive's -v linux5.13 argument.

Now that I know it's the kernel, I'll try to narrow down what version between 5.13 and 5.19 broke this.
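The log comparison described above can be sketched roughly as follows. This is only an illustration, not the exact commands used in this thread: the function name and the file names are hypothetical, and it assumes both rdsosreport.txt files have already been copied off the machine.

```shell
# Sketch: print the lines that appear in the working image's log but are
# missing from the broken image's log (the "-" lines of a unified diff).
# Function name and file names are hypothetical.
lines_only_in_working() {
    # $1 = rdsosreport.txt from the working image
    # $2 = rdsosreport.txt from the broken image
    # sed drops the two-line "---"/"+++" header, grep keeps removed lines.
    diff -u "$1" "$2" | sed -n '3,$p' | grep '^-'
}
```

For example, `lines_only_in_working rdsosreport-20210930.txt rdsosreport-20221001.txt` would list entries such as the missing loop devices. Timestamped dmesg lines will also show up as differences, so the output still needs reading by hand.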

0x5c commented 1 year ago

It's linux 5.19. The last version that boots properly from loopback is 5.18.

There doesn't seem to be relevant changes in the dotconfigs of those two versions, nor in the patches.

LaszloGombos commented 1 year ago

Some wild guesses:

0x5c commented 1 year ago

> Could it be a missing kernel module for the storage - e.g. mmc - https://www.reddit.com/r/voidlinux/comments/y03b8b/baytrail_stopped_booting_after_updating_to_519/

In my case at least, tests were done on a standard desktop computer, and the storage holding both the GLIM setup and the ISO image is a plain FAT32 partition (partition type ID 0c, filesystem created with mkfs.vfat) on a normal USB flash drive (/dev/sdX) with an MBR partition table.

It seems to me like all modules possibly involved in that are already loaded, and furthermore, at the time dracut drops me to a shell, the partition on the USB drive is indeed already mounted at /run/initramfs/isoscan and its contents are present in that directory as expected.

I'll also be setting up a testbench in QEMU to do further tests.

> Is the loop module loaded/available? Perhaps a missing "modprobe loop" somewhere?

At the dracut debug shell, loop is indeed present in /proc/modules. However, none of cdrom, isofs, and squashfs are present in that list at that point.
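A check like this one can be scripted. The helper below is a minimal sketch with a hypothetical name. It relies on each /proc/modules line starting with the module name followed by a space, and it cannot see drivers compiled into the kernel, since those never appear in /proc/modules.

```shell
# Sketch: print every named module that is NOT listed in a
# /proc/modules-style file. The function name is hypothetical.
missing_modules() {
    modfile=$1
    shift
    for m in "$@"; do
        # Each line starts with the module name followed by a space.
        grep -q "^$m " "$modfile" || printf '%s\n' "$m"
    done
}
```

From the debug shell, `missing_modules /proc/modules loop cdrom isofs squashfs` would print the modules that have not been loaded yet.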

These lines are present in dmesg logs of both working (5.18 and before) and failing (5.19+) images:

loop: module loaded
dracut: root was live:CDLABEL=VOID_LIVE, is now live:/dev/disk/by-label/VOID_LIVE

Manually mounting the ISO image correctly loads both cdrom and isofs, and the contents of the image are present at the mountpoint as expected. Further mounting the squashfs image also properly loads squashfs (again, contents present as expected).

However, after both mount operations, there isn't any info on the mounted filesystems in lsblk -f (fstype, fsver, Label!, UUID) ...until udevadm trigger is manually run. This kernel commit seems potentially relevant, as it would introduce delays before being able to see the label after mounting the ISO: https://github.com/torvalds/linux/commit/498ef5c777d9c89693b70cc453b40c392120ea1b. I will be testing whether adding a delay after the mount in dracut fixes the issue.

If you have further insights, I'll test those too

Note: if using a fedora image for testing, the kernel and initrd location in the ISO seem to have changed since the last time dracut.cmdline was modified. They are now present in /images/pxeboot/.

0x5c commented 1 year ago

@LaszloGombos

One thing I forgot to mention in the previous message is that the ISO image does (sometimes?*) stay mounted when dropping to the shell, but in the mounted-with-no-label state.

Since the last message, I've also found a kinda-fix: from a boot attempt where the mount was already present while in the shell, simply running udevadm trigger and leaving the shell led to dracut successfully booting into the Void live image.

*kinda confused as to how it shows as mounted in some attempts while I recall other attempts not even having the loop module loaded in the shell (the issue is either a separate one in dracut/the images, or in my recollection of all of these attempts at booting)
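The manual workaround above could be wrapped in a small helper for repeated testing. This is a sketch under assumptions, not what dracut or the eventual fix does: wait_for_path is a hypothetical name, and the 10-second limit is arbitrary.

```shell
# Sketch: poll until a path exists, giving up after a number of
# one-second tries. The function name is hypothetical.
wait_for_path() {
    path=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        [ -e "$path" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use from the dracut emergency shell (assumes the ISO is mounted):
#   udevadm trigger
#   udevadm settle
#   wait_for_path /dev/disk/by-label/VOID_LIVE 10 && exit
```

Polling for the by-label symlink, rather than sleeping for a fixed time, makes it easier to tell whether the label ever appears at all.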

LaszloGombos commented 1 year ago

@0x5c

Please try to autoload the modules that this use case needs from the bootloader command line arguments, e.g. "rd.driver.pre=loop,cdrom,isofs,squashfs"

You could also try this patch that debian carries: https://salsa.debian.org/debian/dracut/-/blob/master/debian/patches/udevsettle
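When testing different module sets, the argument can be generated instead of typed by hand. A minimal sketch with a hypothetical helper name. The value has to be a single comma-separated word, because a space would end the argument on the kernel command line.

```shell
# Sketch: join module names with commas into one rd.driver.pre= argument.
# The function name is hypothetical.
driver_pre_arg() {
    # The subshell keeps the IFS change local; "$*" joins the arguments
    # using the first character of IFS.
    (IFS=,; printf 'rd.driver.pre=%s\n' "$*")
}
```

The output of `driver_pre_arg loop cdrom isofs squashfs` can then be appended to the linux line of the GRUB menu entry.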

LaszloGombos commented 1 year ago

Fedora bug report - https://bugzilla.redhat.com/show_bug.cgi?id=2131852

0x5c commented 1 year ago

A fix for this has been merged upstream (https://github.com/dracutdevs/dracut/pull/2196), and there's a backport of it to the package (https://github.com/void-linux/void-packages/pull/42265). Once that's merged, new images shouldn't have problems with iso-scan anymore.