siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Can't cold boot Talos on a T2 Intel Mac #7066

Open dhess opened 1 year ago

dhess commented 1 year ago

Bug Report

Using a USB boot drive, and having disabled the system's Apple-signed boot restrictions, I'm able to install Talos v1.3.7 on a T2 Intel Mac (iMac Pro (2017), in this specific case) just fine, and everything at runtime works as expected. kexec warm reboots also work fine. But as soon as I cold boot the machine, I get the firmware's "missing boot folder" start-up screen. Holding the Option key during boot doesn't show the EFI partition as a choice, either — the only option I get is to boot into Internet recovery mode.

rEFInd is a common solution to Linux booting issues on T2 Intel Macs, but even booting rEFInd from a USB drive or SD card doesn't give an option to boot Talos. rEFInd doesn't seem to recognize the EFI partition.

Booting into the Mac's recovery mode and running refind-install also doesn't work as expected. In this case, rEFInd's installer recognizes the EFI partition and tries to mount it (via mount -t msdos), but this fails with a cryptic "Invalid argument" message.

Is there something unusual about Talos's EFI partition?

One workaround is to boot from the USB drive that I used to install Talos, which seems to recognize the installed image and uses its config, but that isn't a good long-term solution since it would require updating the drive every time I also want to update the installed version of Talos.

Note that there was a similar issue reported in the Sidero Labs Community Slack last month, this time for a 2018 Mac mini: https://taloscommunity.slack.com/archives/CMARMBC4E/p1678790269057759

smira commented 1 year ago

We don't have a specific answer; you might need to dig into it. My only guess is that it's something to do with the way GRUB installs and sets things up.

dhess commented 1 year ago

I've returned to this and while playing around a bit, I've discovered that Disk Utility in recovery mode claims that the EFI partition is too small, and that the disk needs to be repartitioned. Is there an easy way to change the EFI partition size used by the Talos installer without patching the source and rebuilding from scratch?

smira commented 1 year ago

No, the EFI partition size is hardcoded in Talos.

dhess commented 1 year ago

Confirmed that bumping the EFI partition size to 200MiB fixes the boot issue with Intel Macs. I'll make a PR. (Edit: #7132)

tommy-skaug commented 11 months ago

Bumping this, as the workaround from the earlier PR doesn't work on v1.5.4 for me (unless I'm noobing something up with the build). Was there any progress on the mentioned 1.5.0 work?

$ make imager TAG=v1.5.4-mac PLATFORM=linux/amd64 PUSH=true IMAGE_REGISTRY=127.0.0.1:5005 && make image-metal TAG=v1.5.4-mac PLATFORM=linux/amd64 IMAGE_REGISTRY=127.0.0.1:5005
[...]
◳ creating disk image...
Error: failed to install: failed to partition device: requested partition size 104857600, available is 45071360 (59786240 too many bytes)
make: *** [image-metal] Error 1

I recognize that it isn't desirable to provide platform-specific solutions, but having the partition sizes as a config option would be a fairly general feature that would probably be useful on other platforms as well.

smira commented 11 months ago

With the partition size increase, your image is bigger than the disk being created; you need to update the disk size in pkg/imager/profile/s/default.go.
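
The numbers in the error above fit that explanation: the partitioner asked for 104857600 bytes (100 MiB), but only 45071360 bytes were left in the image, a shortfall of 59786240 bytes. Below is a minimal Go sketch of that kind of budget check. It is not the Talos imager code; `shortfall`, `requiredDiskSize`, and the 1 MiB alignment and GPT reserve are hypothetical assumptions for illustration only.

```go
package main

import "fmt"

// MiB is one mebibyte in bytes.
const MiB uint64 = 1 << 20

// shortfall returns how many bytes a requested partition exceeds the space
// still available in the disk image, or 0 if it fits.
func shortfall(requested, available uint64) uint64 {
	if requested <= available {
		return 0
	}
	return requested - available
}

// alignUp rounds n up to the next multiple of align.
func alignUp(n, align uint64) uint64 {
	return (n + align - 1) / align * align
}

// requiredDiskSize returns a conservative image size for the given partition
// sizes. Assumption (not taken from Talos): every partition is rounded up to a
// 1 MiB boundary and 1 MiB is reserved at each end for the primary and backup GPT.
func requiredDiskSize(partitions []uint64) uint64 {
	total := 2 * MiB
	for _, p := range partitions {
		total += alignUp(p, MiB)
	}
	return total
}

func main() {
	// Values from the "failed to partition device" error above.
	fmt.Println(shortfall(104857600, 45071360)) // 59786240

	// Hypothetical layout: a 500 MiB EFI partition plus a 1 GiB partition.
	fmt.Println(requiredDiskSize([]uint64{500 * MiB, 1024 * MiB})/MiB, "MiB") // 1526 MiB
}
```

In other words, any increase to a partition size has to be matched by an equal or larger increase in the disk image size, or the last partitions no longer fit.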

tommy-skaug commented 11 months ago

That works, thank you. Increasing the EFI partition to 500MB and switching the installer medium from raw to ISO makes the 2013-2018 Mac minis (the latter has a T2) boot Talos nicely.

buroa commented 10 months ago

Ah, I have the exact same problem in my setup. I use my laptop (Thunderbolt <-> Mac mini) to reformat the EFI partition, and it boots that way. For rolling upgrades, I have to use --preserve=true; otherwise it wipes my (slightly custom) EFI partition.

If increasing the EFI partition is the way to make this work, that's great news.

FWIW, I maintain a custom Talos installer that includes applesmc, so you can cool the machine down (otherwise it will overheat). It's available at https://ghcr.io/buroa/installer and is generated from https://github.com/buroa/talos-boot-assets

tommy-skaug commented 10 months ago

@buroa your repo was actually my starting point (so kudos for putting that great learning resource out there!) before I started looking into why Talos didn't play nicely with the Mac Mini. I booted my custom installer (seems we all need one for now) off a stick with the mentioned changes and it installs nicely for me now at least.

Attaching the patch in case it can be of help to anyone else running custom installers.

gh-siderolabs-talos-7066.patch

buroa commented 9 months ago

I added that patch into my next installer container build (https://github.com/buroa/talos-boot-assets/commit/883555b727ba0c66bf9d7b25c3ed55caafaacc81), so we should be able to natively boot and install Talos now :-)

aarnaud commented 8 months ago

Hi. On my side, I discovered that the EFI partition that works on the Mac mini 2018 is not the same as the one generated by Talos:

  • Mac mini: FAT16
  • Generated by Talos: FAT32

Mac mini:

> minfo -i /dev/nvme0n1p1 
Hidden (2048) does not match sectors (256)
device information:
===================
filename="/dev/nvme0n1p1"
sectors per track: 32
heads: 16
cylinders: 50

media byte: f8

mformat command line:
  mformat -t 50 -h 16 -s 32 -S 5 -r 4 -c 1 -m 248 -i "/dev/nvme0n1p1" ::

bootsector information
======================
banner:"BSD  4.4"
sector size: 4096 bytes
cluster size: 1 sectors
reserved (boot) sectors: 1
fats: 2
max available root directory slots: 512
small size: 25600 sectors
media descriptor byte: 0xf8
sectors per fat: 13
sectors per track: 32
heads: 16
hidden sectors: 256
physical drive id: 0x80
reserved=0x0
dos4=0x29
serial number: 58451DFB
disk label="EFI        "
disk type="FAT16   "

Talos:

> minfo -i /dev/nvme0n1p1
device information:
===================
filename="/dev/nvme0n1p1"
sectors per track: 32
heads: 64
cylinders: 100

media byte: f8

mformat command line:
  mformat -t 100 -h 64 -s 32 -r 0 -c 1 -m 248 -i "/dev/nvme0n1p1" ::

bootsector information
======================
banner:"mkfs.fat"
sector size: 512 bytes
cluster size: 1 sectors
reserved (boot) sectors: 32
fats: 2
max available root directory slots: 0
small size: 0 sectors
media descriptor byte: 0xf8
sectors per fat: 0
sectors per track: 32
heads: 64
hidden sectors: 2048
big size: 204800 sectors
physical drive id: 0x80
reserved=0x1
dos4=0x29
serial number: 6F88DD3A
disk label="EFI        "
disk type="FAT32   "
Big fatlen=1576
Extended flags=0x0000
FS version=0x0000
rootCluster=2
infoSector location=1
backup boot sector=6

Infosector:
signature=0x41615252
free clusters=201341
last allocated cluster=276

Maybe changing to FAT16 would fix this issue.
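
The two minfo dumps can be checked against the cluster-count rule in Microsoft's FAT specification, which decides the FAT type purely from the number of data clusters: fewer than 4085 is FAT12, fewer than 65525 is FAT16, otherwise FAT32. The Go sketch below applies that rule to the boot-sector fields printed above. The `bpb` struct and its field names are illustrative only, not part of any Talos or mtools API.

```go
package main

import "fmt"

// bpb holds the BIOS Parameter Block fields needed to classify a FAT volume.
// Field names are illustrative; the values come from minfo output.
type bpb struct {
	BytesPerSector    uint32
	SectorsPerCluster uint32
	ReservedSectors   uint32
	NumFATs           uint32
	RootEntries       uint32 // "max available root directory slots" (0 on FAT32)
	FATSize16         uint32 // "sectors per fat" (0 on FAT32)
	FATSize32         uint32 // "Big fatlen" (FAT32 only)
	TotalSectors16    uint32 // "small size"
	TotalSectors32    uint32 // "big size"
}

// clusterCount computes the number of data clusters as the FAT spec does.
// It assumes a sane BPB (non-zero sector sizes, metadata smaller than the volume).
func clusterCount(b bpb) uint32 {
	rootDirSectors := (b.RootEntries*32 + b.BytesPerSector - 1) / b.BytesPerSector
	fatSize := b.FATSize16
	if fatSize == 0 {
		fatSize = b.FATSize32
	}
	total := b.TotalSectors16
	if total == 0 {
		total = b.TotalSectors32
	}
	dataSectors := total - (b.ReservedSectors + b.NumFATs*fatSize + rootDirSectors)
	return dataSectors / b.SectorsPerCluster
}

// fatType classifies a volume by its cluster count alone.
func fatType(b bpb) string {
	switch n := clusterCount(b); {
	case n < 4085:
		return "FAT12"
	case n < 65525:
		return "FAT16"
	default:
		return "FAT32"
	}
}

func main() {
	// Mac mini's own EFI partition (4096-byte sectors, 25600 sectors).
	mac := bpb{BytesPerSector: 4096, SectorsPerCluster: 1, ReservedSectors: 1,
		NumFATs: 2, RootEntries: 512, FATSize16: 13, TotalSectors16: 25600}
	// EFI partition generated by Talos (512-byte sectors, 204800 sectors).
	talos := bpb{BytesPerSector: 512, SectorsPerCluster: 1, ReservedSectors: 32,
		NumFATs: 2, FATSize32: 1576, TotalSectors32: 204800}

	fmt.Println(clusterCount(mac), fatType(mac))     // 25569 FAT16
	fmt.Println(clusterCount(talos), fatType(talos)) // 201616 FAT32
}
```

Both results agree with the "disk type" lines in the two dumps.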

buroa commented 8 months ago

@aarnaud It's the size of the partition that matters. The EFI partition Talos creates is too small to be recognized on a Mac mini. My installer image has the fixes if you wish to use it: ghcr.io/buroa/installer:v1.6.1

aarnaud commented 8 months ago

Good to know. Since I changed to FAT16 (using newfs_msdos from Mac Recovery), I don't seem to have the issue, but I will double-check.

aarnaud commented 8 months ago

I couldn't find whether you patched applesmc to work with the T2 chip; the Mac mini 2018 has issues with the applesmc driver present in the kernel.

buroa commented 8 months ago

@aarnaud Every single patch is in my kernel that is shipped with my installer images. They are good to go.

See here for the builds: https://github.com/buroa/talos-boot-assets
Kernel (always trailing latest version): https://github.com/buroa/talos-boot-assets/commit/523c5b09c8982247fdf6cd9c1da0ff609e86273c
Talos (always trailing latest version): https://github.com/buroa/talos-boot-assets/commit/21960d728f03d17c3a94630902876909a8db575c

Using applesmc in my k8s-gitops repo: https://github.com/buroa/k8s-gitops/tree/master/kubernetes/apps/kube-system/mbpfan/app

aarnaud commented 8 months ago

I can confirm that on the Mac mini 2018, cold boot works with FAT16 and a 100MB partition:

diff --git a/pkg/makefs/vfat.go b/pkg/makefs/vfat.go
index 18be20c12..b4d584c26 100644
--- a/pkg/makefs/vfat.go
+++ b/pkg/makefs/vfat.go
@@ -15,7 +15,7 @@ func VFAT(partname string, setters ...Option) error {
        args := []string{}

        if opts.Label != "" {
-               args = append(args, "-F", "32", "-n", opts.Label)
+               args = append(args, "-F", "16", "-n", opts.Label)
        }

        if opts.Reproducible {

aarnaud commented 8 months ago

https://superuser.com/questions/1702331/what-is-the-minimum-size-of-a-4k-native-partition-when-formatted-with-fat32

This explains why, so your fix is better.
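
The size constraint behind that link can be made concrete: a FAT32 volume needs at least 65525 data clusters, so with 4096-byte sectors (as in the Mac mini's own EFI partition above) and one sector per cluster, a 100 MiB partition cannot be FAT32 at all, which is consistent with the Mac's partition being FAT16. A rough Go sketch follows; `minFAT32Bytes` is a hypothetical helper that assumes FATs sized exactly for the minimum cluster count, 32 reserved sectors, and no extra alignment.

```go
package main

import "fmt"

// minFAT32Clusters is the smallest data-cluster count the FAT spec treats as FAT32.
const minFAT32Clusters = 65525

// minFAT32Bytes estimates the smallest FAT32 volume for a given geometry:
// reserved sectors + numFATs copies of a FAT just large enough for
// minFAT32Clusters entries (plus 2 reserved entries, 4 bytes each) + the data area.
// This is a sketch: real formatters may add alignment or extra reserved space.
func minFAT32Bytes(bytesPerSector, sectorsPerCluster, reserved, numFATs uint64) uint64 {
	fatBytes := uint64(minFAT32Clusters+2) * 4
	fatSectors := (fatBytes + bytesPerSector - 1) / bytesPerSector // round up
	dataSectors := minFAT32Clusters * sectorsPerCluster
	total := reserved + numFATs*fatSectors + dataSectors
	return total * bytesPerSector
}

func main() {
	const MiB = 1 << 20
	// 4096-byte sectors, 1 sector per cluster, 32 reserved sectors, 2 FATs.
	fmt.Printf("4K sectors:  %.1f MiB\n", float64(minFAT32Bytes(4096, 1, 32, 2))/MiB)
	// Same layout with 512-byte sectors.
	fmt.Printf("512 sectors: %.1f MiB\n", float64(minFAT32Bytes(512, 1, 32, 2))/MiB)
}
```

Under these assumptions it prints about 256.6 MiB for 4096-byte sectors and 32.5 MiB for 512-byte sectors, so the sector size alone moves the FAT32 minimum by a factor of eight.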

tommy-skaug commented 6 months ago

For the record, I'll add another possible cause of this (which I presume applies to other systems as well, depending on how strict the system loader is).

On my disks, the pmbr_boot flag gets set by the Talos partitioner (perhaps because of some of the BIOS compatibility flags in the config?). This makes the Mini refuse to boot from EFI.

The issue can be solved by turning the flag off with something like parted on the system disk: parted --script /dev/nvme0n1 disk_set pmbr_boot off. Combined with the following, there is no need for rEFInd in my experience (it can be executed from a privileged Alpine container with access to /sys, meaning NVRAM):

mount -o remount,rw /sys/firmware/efi/efivars
efibootmgr --create --disk /dev/nvme0n1 --part 1 --loader "EFI\BOOT\BOOTX64.EFI" --label "Talos Linux" --unicode
mount -o remount,ro /sys/firmware/efi/efivars

(In the above, nvme0n1 is the system disk.)
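
To check whether a disk is affected before changing anything, one can inspect the protective MBR directly: in sector 0 of a GPT disk, the partition entry of type 0xEE is the protective entry, and parted's pmbr_boot flag sets that entry's boot indicator byte to 0x80. The read-only Go sketch below assumes the classic 512-byte MBR layout (partition table at offset 446, signature 0x55AA at offset 510); `pmbrBootSet` and `checkDevice` are hypothetical names, and clearing the flag is still best left to parted as described above.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
)

const (
	partTableOffset = 446  // first of four 16-byte MBR partition entries
	entrySize       = 16   // size of one MBR partition entry
	gptProtective   = 0xEE // partition type of the GPT protective entry
	bootIndicator   = 0x80 // "active" flag; what parted calls pmbr_boot on GPT disks
)

// pmbrBootSet reports whether the protective MBR entry in sector 0 has the
// boot indicator set. It only inspects the buffer; it never modifies the disk.
func pmbrBootSet(sector []byte) (bool, error) {
	if len(sector) < 512 {
		return false, errors.New("need at least 512 bytes")
	}
	if sector[510] != 0x55 || sector[511] != 0xAA {
		return false, errors.New("no MBR signature")
	}
	for i := 0; i < 4; i++ {
		e := sector[partTableOffset+i*entrySize : partTableOffset+(i+1)*entrySize]
		if e[4] == gptProtective {
			return e[0] == bootIndicator, nil
		}
	}
	return false, errors.New("no protective (0xEE) entry; not a GPT disk?")
}

// checkDevice reads sector 0 of a device or image file and checks the flag.
func checkDevice(path string) (bool, error) {
	f, err := os.Open(path) // reading a block device usually needs root
	if err != nil {
		return false, err
	}
	defer f.Close()

	sector := make([]byte, 512)
	if _, err := io.ReadFull(f, sector); err != nil {
		return false, err
	}
	return pmbrBootSet(sector)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pmbrcheck /dev/nvme0n1")
		return
	}
	set, err := checkDevice(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("pmbr_boot set:", set)
}
```

Because it only reads, this can be run before and after the parted command to confirm that the flag actually changed.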

github-actions[bot] commented 2 days ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.