siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Secure Boot Install fails with Specific Node #9368

Open TheoBassaw opened 3 weeks ago

TheoBassaw commented 3 weeks ago

Bug Report

Description

I have been trying to deploy Talos v1.8.0 (and v1.7.6 previously) on a couple of mini PCs for a test cluster. The hardware is two Lenovo M910x and an HP ProDesk 600 G2 Mini, all running the latest BIOS updates. Installation on the Lenovo machines was easy enough: I enrolled the keys and enabled TPM encryption following the Secure Boot docs. The HP Mini is the one with the problem.

  1. Wrote the Secure Boot ISO to a USB drive with dd and tried to boot it on the HP Mini, but the firmware doesn't see the drive.
  2. As an alternative, loaded the ISO via Ventoy. It booted and enrolled the keys.
  3. Went through the installation and enabled TPM encryption of both the State and Ephemeral partitions. The machine then rebooted.
  4. The machine rebooted into a Secure Boot Violation screen. As a quick test, disabling Secure Boot allows the machine to boot. Re-enabling Secure Boot brings back the Violation screen.
  5. Re-enrolling the keys allows it to boot, but the State and Ephemeral partitions can't be unlocked due to a Seal Policy mismatch. (I'm confused why I even need to re-enroll the keys. It's as if they disappeared.)

This is where I am.
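
For step 1, one thing worth ruling out is a bad write to the USB drive. The following is a minimal sketch, not taken from the Talos docs: `write_and_verify` is a hypothetical helper, the image and device paths are placeholders you must supply, and it assumes a Linux workstation with GNU `dd` and `cmp`. It writes the image and then compares the first image-sized bytes of the target against the source.

```shell
#!/bin/sh
# Hypothetical helper: write an image to a target and verify it byte for byte.
# Usage: write_and_verify <image.iso> <target>
# <target> would be the USB block device (for example /dev/sdX, an assumption);
# double-check it first, since dd overwrites whatever it points at.
write_and_verify() {
    image=$1
    target=$2
    # Write the image; conv=fsync flushes the data to the device before dd exits.
    dd if="$image" of="$target" bs=4M conv=fsync 2>/dev/null || return 1
    # The device is usually larger than the image, so compare only the
    # first <image size> bytes.
    size=$(($(wc -c < "$image")))
    cmp -n "$size" "$image" "$target"
}
```

A zero exit status means the bytes read back match the ISO. The read-back may be served from the page cache, so unplugging and re-inserting the drive before running only the `cmp` step gives a stronger check. A matching write would point at how the firmware handles the boot media rather than at the copy itself.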

Logs

I can't obtain logs because the node can't fully boot until the partitions are unlocked. I can take camera pics of the screen if you are fine with that.

talosctl dmesg --talosconfig=./talosconfig --nodes 10.20.30.5
1 error occurred:
 rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.20.30.5:50000: connect: connection refused"
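
"Connection refused" means the host answered but nothing was listening on port 50000, which fits a node that has not finished booting. A minimal sketch for polling that port from a workstation follows. `probe_port` is a hypothetical helper, and it assumes `bash` and coreutils `timeout` are installed locally.

```shell
#!/bin/sh
# Hypothetical helper: report whether a TCP port accepts connections.
# Assumes bash and coreutils timeout are available on the workstation.
probe_port() {
    host=$1
    port=$2
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the wait.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or unreachable"
    fi
}
```

Running `probe_port 10.20.30.5 50000` in a loop while the node boots shows when the API endpoint starts accepting connections, at which point `talosctl dmesg` can be retried.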

Environment

smira commented 3 weeks ago

I don't have a specific answer, and I guess only the vendor or someone else with similar hardware could help.

But there seem to be at least two separate issues:

TheoBassaw commented 3 weeks ago

I understand why TPM encryption doesn't work. I don't understand why the SecureBoot violation happens. I can only assume it is because of the use of Ventoy, but the machine wouldn't have been able to boot initially without it. Do you know why the USB stick created with dd failed to boot? Kind of sucks that I can't use the hardware I got.

smira commented 3 weeks ago

Ventoy shouldn't affect that, and I posted all the information above. The SecureBoot violation is an error from your UEFI firmware, not from Talos itself.

TheoBassaw commented 3 weeks ago

Alright, I may have to forgo this. I'm unsure what else to do since I'm on the latest BIOS.

vitaly-zverev commented 2 weeks ago

> Alright, I may have to forgo this. I'm unsure what else to do since I'm on the latest BIOS.

You can try PXE as a last resort:

https://gist.github.com/spipm/aef2db9b28d085b0c162d0b21afbe0f1

> I can only assume because of the use of Ventoy, but it wouldn't have been able to boot initially without it.

You have probably already looked into this, but you may not be aware of all the details: https://www.reddit.com/r/sysadmin/comments/pl2jqg/creating_bootable_usb_drives_with_rufus_requires/