siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Stuck at efi stub: loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path #8657

Closed muvaf closed 1 week ago

muvaf commented 4 months ago

Bug Report

Description

I have tried two methods to install Talos OS to a bare metal instance in Hetzner (not Hetzner Cloud):

In both cases, when the machine boots, it is stuck at a black screen with the following message: EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

All talosctl commands time out.

Logs

EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path

One thing I noticed: when I boot into rescue mode and inspect the disk I dd'ed the Talos raw image onto, I see the following:

GPT PMBR size mismatch (181931 != 1000215215) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
The device contains 'iso9660' signature and it will be removed by a write command. See fdisk(8) man page and --wipe option for more details.
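A quick way to read the two numbers in the PMBR warning is to convert them to bytes. The sketch below assumes that both values are sector counts minus one and that the disk uses 512-byte logical sectors; neither is stated in the report. Under those assumptions, the first number matches an image of roughly 89 MiB and the second a 512 GB disk, which is what you would expect after writing a small image onto a much larger device.

```python
# Assumption: 512-byte logical sectors (not stated in the report).
SECTOR_SIZE = 512


def total_bytes(reported: int, sector_size: int = SECTOR_SIZE) -> int:
    """Convert a 'sector count minus one' value into a size in bytes."""
    return (reported + 1) * sector_size


# The two values from the fdisk warning above.
image_bytes = total_bytes(181931)
disk_bytes = total_bytes(1000215215)

print(f"size recorded in PMBR: {image_bytes} bytes ({image_bytes / 2**20:.1f} MiB)")
print(f"size of the device:    {disk_bytes} bytes ({disk_bytes / 10**9:.1f} GB)")
```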

Environment

muvaf commented 4 months ago

I've been digging into this log line. Different distributions have their own reports of it, and some of them were resolved by setting the kernel cmdline parameter console. I'm not sure the exact value will be the same for everyone, but I resolved my issue by booting my machine into rescue mode, querying the ttys and their baud rates, and then generating an image with Image Factory that sets console=tty0,38400. Dumped more detailed steps here.

Other OSes I installed on this machine didn't struggle to figure that out. Could it be that the bootloader Talos uses is not configured to detect and set this parameter? netboot.xyz also seems to instruct users to set it manually in some cases.

smira commented 4 months ago

You can try removing all console= args at the GRUB boot prompt by editing the command line.
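For anyone scripting this rather than editing at the prompt, here is a minimal sketch of the same idea in Python: drop every console= argument from a kernel command line and optionally append a single replacement. The function name and the sample command line are made up for illustration and are not Talos defaults.

```python
from typing import Optional


def replace_console_args(cmdline: str, new_console: Optional[str] = None) -> str:
    """Remove all console= arguments; optionally append one replacement."""
    # Keep every argument that is not a console= setting.
    kept = [arg for arg in cmdline.split() if not arg.startswith("console=")]
    if new_console is not None:
        kept.append(f"console={new_console}")
    return " ".join(kept)


# Hypothetical command line, used only to show the behavior.
sample = "talos.platform=metal console=ttyS0 console=tty0 init_on_alloc=1"

print(replace_console_args(sample))
print(replace_console_args(sample, "tty0,38400"))
```

The first call shows the command line with no console= arguments at all; the second shows it with a single console=tty0,38400, the value mentioned earlier in this thread.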

muvaf commented 4 months ago

@smira I don't have access to the GRUB boot prompt unless I get Hetzner to attach a KVM device to the server. On my second try, having just console=tty0 worked as well. I'd like to update this doc with this information, but the doc is tailored for Hetzner Cloud, which is completely separate from the Hetzner Robot dedicated server service, where there isn't even any snapshotting machinery. You just write the metal-amd64.iso to the whole disk.

aarnaud commented 4 months ago

Same issue with agent-amd64 booting on a QEMU/KVM VM over PXE with UEFI, using Sidero 0.6.4. [Screenshot from 2024-05-06 11-02-08]

Rolling back sidero-controller-manager to 0.6.3 fixed my issue for agent-amd64.

aarnaud commented 4 months ago

I got it on the serial port!


aarnaud commented 4 months ago

The fix for me was switching from pc-q35 to pc-i440fx on KVM/QEMU. It seems the kernel may be missing some device support or drivers.