siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Talos 1.8.0+ initial boot fails in phase meta (6/12) #9776

Open smauermann opened 17 hours ago

smauermann commented 17 hours ago

Bug Report

Hi team, I am unable to boot any Talos v1.8.0+ ISO from USB on an HP EliteDesk 800 G5 Mini (i5-9500T with vPro). The boot fails in the meta phase at step 6/12.

I would really love to switch to Talos, but right now I am at a loss on how to proceed. Any hints would be appreciated!

Description

Up to and including v1.7.7, I can boot just fine and apply the machine configs to get a functional Kubernetes cluster. However, if I try to boot any Talos version from v1.8.0 onward, the boot fails. Please see the screenshot of the failed boot process below.

I have tried various other images to verify that nothing is wrong with the node: Debian, Proxmox, CoreOS, and of course different Talos versions.

Besides trying different versions, I have experimented with different extensions, namely intel-ucode, i915-ucode, mei and util-linux-tools, in all combinations.

Interestingly, I was able to create a cluster with 1.7.7 and upgrade it to 1.8.0. A further upgrade to 1.8.3 (no extensions) failed, though: the screen went black after the reboot and never came back up. I am now trying this again with different extensions.

EDIT: I was able to upgrade from 1.7.7 (without any extensions) to 1.8.3 with the following extensions:

customization:
    systemExtensions:
        officialExtensions:
            - siderolabs/i915-ucode
            - siderolabs/intel-ucode
            - siderolabs/mei
            - siderolabs/util-linux-tools

Logs

Screenshot of the boot failure: [image: talos-boot-fail]

Environment

smira commented 17 hours ago

So there might be a mix of several issues here. With Talos 1.8 there's an unfortunate side effect for those with i915 hardware: the i915-ucode extension should be included, otherwise the Linux kernel fails to boot (this will be fixed in 1.9+).

As for the error in the screenshot above, it is certainly a bug, but I don't understand how it ends up that way.

Does the disk contain any previous Talos install when booting from an ISO (USB)?

smira commented 17 hours ago

Oh yeah, I misread the picture. I guess you might have a META partition somewhere on the disk.

Moreover, it might be related to an incomplete wipe of the system disk. Please try wiping the disks before installation.
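One way a wipe can end up incomplete (offered as an illustration, not as a diagnosis of this case) is clearing only the start of a disk: GPT keeps a backup header and partition-entry array at the very end of the disk as well. The minimal Python sketch below zeroes the protective MBR, the primary GPT and the backup GPT, similar in spirit to `sgdisk --zap-all`. It assumes 512-byte logical sectors and the common 128-entry partition array; `zap_gpt` is a hypothetical helper, not part of Talos or `talosctl`, and it leaves filesystem signatures inside partitions untouched.

```python
import os

# Assumptions (not from the thread): 512-byte logical sectors and the common
# GPT layout with 128 entries x 128 bytes = 32 sectors of partition entries.
PRIMARY_SECTORS = 34  # LBA 0 protective MBR + LBA 1 header + 32 entry sectors
BACKUP_SECTORS = 33   # 32 backup entry sectors + backup header at the last LBA


def zap_gpt(path, sector_size=512):
    """Zero the protective MBR plus both the primary and the backup GPT.

    Intended for a raw disk image file; pointing it at a real block device
    (root required) destroys the partition table on that device.
    """
    with open(path, "r+b") as f:
        # Seeking to the end returns the total size in bytes.
        size = f.seek(0, os.SEEK_END)
        needed = (PRIMARY_SECTORS + BACKUP_SECTORS) * sector_size
        if size < needed:
            raise ValueError(f"{path} is too small ({size} bytes) to hold a GPT")
        # Primary structures at the start of the disk.
        f.seek(0)
        f.write(bytes(PRIMARY_SECTORS * sector_size))
        # Backup structures at the end of the disk, which a partial wipe misses.
        f.seek(size - BACKUP_SECTORS * sector_size)
        f.write(bytes(BACKUP_SECTORS * sector_size))
        f.flush()
        os.fsync(f.fileno())
```

Zeroing both ends matters because partitioning tools can rebuild a damaged primary table from the backup copy.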

smauermann commented 17 hours ago

Hi @smira, thanks for your swift reply! I shredded both internal disks before installing Talos, and I also performed a wipe via the disks machine config during the install of 1.7.7. I was pretty sure I had nuked everything before the installation. Is there any way to check for extraneous META partitions?
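To address the question of checking for leftover META partitions, here is a minimal Python sketch that reads the primary GPT of a disk (or of its first sectors dumped to a file) and lists the partition names, so a stray `META` label stands out. It assumes 512-byte logical sectors, reads only the primary table at LBA 1, and does not verify CRCs. The function names are hypothetical helpers, not a Talos API; on a Linux live system, standard tools such as `lsblk -o NAME,PARTLABEL` report the same labels.

```python
import io
import struct

GPT_SIGNATURE = b"EFI PART"


def gpt_partition_names(f, sector_size=512):
    """Return the names of all used entries in the primary GPT of `f`.

    `f` is a binary file object (a disk image or a raw device opened "rb").
    Assumes 512-byte sectors by default; the backup GPT is ignored.
    """
    f.seek(sector_size)  # the primary GPT header lives at LBA 1
    header = f.read(92)
    if len(header) < 92 or header[:8] != GPT_SIGNATURE:
        raise ValueError("no primary GPT header found at LBA 1")
    # Header offsets 72/80/84: entry-array start LBA, entry count, entry size.
    entries_lba, num_entries, entry_size = struct.unpack_from("<QII", header, 72)
    f.seek(entries_lba * sector_size)
    names = []
    for _ in range(num_entries):
        entry = f.read(entry_size)
        if len(entry) < 128:
            break  # truncated dump: stop instead of parsing garbage
        if entry[:16] == bytes(16):
            continue  # an all-zero partition type GUID marks an unused slot
        # Bytes 56..127 hold the name as zero-padded UTF-16LE.
        name = entry[56:128].decode("utf-16-le", errors="replace")
        names.append(name.rstrip("\x00"))
    return names


def gpt_partition_names_from_bytes(data, sector_size=512):
    """Same as gpt_partition_names, for the first sectors held in memory."""
    return gpt_partition_names(io.BytesIO(data), sector_size)


def has_meta_partition(names):
    """True if any partition carries the META label Talos uses."""
    return "META" in names
```

For example, `with open("/dev/sdX", "rb") as disk: print(gpt_partition_names(disk))` (hypothetical device path, root required) would print labels such as `META` or `STATE` if an earlier Talos install left them behind.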

smauermann commented 17 hours ago

Also, I'm happy to hear that the i915 issue will be fixed with the next minor version. Keep up the great work.

smira commented 17 hours ago

I don't see the logs, but I wonder if there's a message further up from the VolumeManager controller about a META partition being found (it shouldn't be).

smauermann commented 16 hours ago

I did not observe such a message, but then again the logs fly past pretty quickly and all I could capture is in the screenshot above 😄

smira commented 15 hours ago

One option is to record a video; that sometimes makes it possible to see individual messages.

smauermann commented 12 hours ago

Would a talosctl reset get rid of any META partitions that could mess with any subsequent installs?