siderolabs / talos

Talos Linux is a modern Linux distribution built for Kubernetes.
https://www.talos.dev
Mozilla Public License 2.0

Talos Linux not booting on Turing PI node 3 #7358

Open jlec opened 1 year ago

jlec commented 1 year ago

Bug Report

Node 3 is connected to an ASM1061 chip. This is apparently the reason the node is not booting. The most likely cause is the boot order, which prefers the SATA disks over the eMMC.

Description

The adapter and CM4 have been validated in other slots, where they boot normally with the same Talos image. Only in the node 3 slot does the module fail to come up. Other Linux distros such as Ubuntu work fine with the same board.

Logs

serial.log:

RPI Compute Module 4 (0xd03141)
Core:  209 devices, 16 uclasses, devicetree: board
MMC:   mmcnr@7e300000: 1, mmc@7e340000: 0
Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1... 
In:    serial
Out:   vidconsole
Err:   vidconsole
Net:   eth0: ethernet@7d580000
PCIe BRCM: link up, 5.0 Gbps x1 (SSC)
"Error" handler, esr 0xbf000002
elr: 00000000000af544 lr : 00000000000af500 (reloc)
elr: 000000003df81544 lr : 000000003df81500
x0 : 000000000000dead x1 : 0000000000100000
x2 : 0000000000008000 x3 : 00000000fd508000
x4 : 0000000000000000 x5 : 0000000000000001
x6 : 000000003df82aac x7 : 000000003db40890
x8 : 0000000000008a6c x9 : 0000000000000008
x10: 000000003db4023c x11: 0000000000000002
x12: 0000000000000140 x13: 000000003db40228
x14: 000000003db40890 x15: 0000000000000000
x16: 000000003df82b84 x17: d4244e8100000000
x18: 000000003db4dd70 x19: 0000000000000001
x20: 000000003db40300 x21: 000000003db5b480
x22: 0000000000000000 x23: 0000000000010000
x24: 000000003dfc60a1 x25: 000000003db5b0b0
x26: 000000000000ffff x27: 0000000000000000
x28: 0000000000000000 x29: 000000003db40260

Code: 350001f3 f94017e0 39400000 92401c00 (d5033fbf) 
Resetting CPU ...

Environment

CFSworks commented 11 months ago

This is a bug with U-Boot, not Talos. The former is configuring an inappropriate power management mode for the CM4's PCI Express root controller, which causes the TP2's ASM1061 to drop off the bus, resulting in a CPU abort when U-Boot attempts to scan it.

I'll be sending a patch for this to U-Boot shortly. This issue can be closed once Talos begins using a fixed U-Boot.
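As a rough cross-check of the "CPU abort" reading, the `esr 0xbf000002` value from the serial log above can be split into its architectural fields. The sketch below assumes the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0); the function name `decode_esr` and the deliberately partial lookup table are illustrative only and are not taken from U-Boot or Talos.

```python
# Partial table of AArch64 exception classes (EC field of ESR_ELx).
# Only a handful of entries are listed; this is not the full architectural table.
EXCEPTION_CLASSES = {
    0x00: "Unknown reason",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F   # exception class, bits 31:26
    il = (esr >> 25) & 0x1    # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF     # instruction specific syndrome, bits 24:0
    return {
        "ec": ec,
        "il": il,
        "iss": iss,
        "name": EXCEPTION_CLASSES.get(ec, "not in this partial table"),
    }


if __name__ == "__main__":
    fields = decode_esr(0xBF000002)  # value from the serial log
    print(f"EC=0x{fields['ec']:02x} ({fields['name']}), "
          f"IL={fields['il']}, ISS=0x{fields['iss']:07x}")
```

For the logged value this gives EC 0x2f, which the architecture defines as an SError interrupt, an asynchronous external abort. That is consistent with a bus access to a device that is no longer responding.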

maxromanovsky commented 9 months ago

@CFSworks could you add a link to the U-Boot PR here, so we can track that progress here as well?

CFSworks commented 9 months ago

Oh! Yes, certainly, here you go: pci: pcie-brcmstb: do not rely on CLKREQ# signal

It looks like it will first be included in v2024.01-rc1.

Note that a fixed U-Boot allows the boot to progress a little further and encounter a kernel panic from Linux's version of the same problem. There's a patch out there that works (everyone I've seen encounter that panic who then tried that patch reported success with it), but it apparently hasn't yet been accepted into the kernel.
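To check whether a given upstream U-Boot version string is at least v2024.01-rc1, a small comparison helper can be sketched as follows. It assumes the upstream `vYYYY.MM[-rcN]` naming scheme; the names `parse_uboot_version` and `has_clkreq_fix` are hypothetical, and vendor or distribution builds with backported patches or custom suffixes are not handled.

```python
import re

# Matches upstream-style U-Boot versions such as "v2024.01" or "v2024.01-rc1".
_VERSION_RE = re.compile(r"v?(\d{4})\.(\d{2})(?:-rc(\d+))?")


def parse_uboot_version(version):
    """Return a sortable (year, month, rc_rank) tuple for a U-Boot version string."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognised U-Boot version: {version!r}")
    year, month, rc = match.groups()
    # A final release sorts after every release candidate of the same cycle.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(year), int(month), rc_rank)


# First version expected to contain the fix, per the comment above.
FIRST_FIXED = parse_uboot_version("v2024.01-rc1")


def has_clkreq_fix(version):
    """True if the upstream version is v2024.01-rc1 or newer."""
    return parse_uboot_version(version) >= FIRST_FIXED


if __name__ == "__main__":
    for candidate in ("v2023.10", "v2024.01-rc1", "v2024.01"):
        print(candidate, has_clkreq_fix(candidate))
```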

SheGe commented 6 months ago

The new version of U-Boot (v2024.01) was released two weeks ago. Is there any ETA for a Talos release that includes it?

pl4nty commented 6 months ago

@SheGe I'm not sure about ETAs, but the new U-Boot version requires lots of SBC compatibility changes, so it might be time-consuming to test. No idea whether the kernel patch would be available either.

torvitas commented 5 months ago

The bug also triggers if you add a mini PCI Express to SATA adapter to node 1 or 2.

tuxpeople commented 3 months ago

Should this be fixed now with https://github.com/siderolabs/sbc-raspberrypi/pull/7, and has that merge already been released?

smira commented 3 months ago

See https://www.talos.dev/v1.7/talos-guides/install/single-board-computers/rpi_generic/ and try the latest version (currently v1.7.0-beta.0).

torvitas commented 3 months ago

I just tried the latest version. I get past U-Boot and GRUB, but the Linux kernel still seems to crash.

pl4nty commented 3 months ago

@torvitas the kernel patch hasn't been accepted yet, but v8 was close and v9 was submitted a few weeks ago. Talos uses LTS kernel versions, usually the last stable release in a year, so this might not be fixed for a while. It's planned to be fixed in the Turing Pi v2.5 though.

torvitas commented 3 months ago

As far as I understood the Turing Pi v2.5 is a different piece of hardware than the Turing Pi v2.

Is there a way to have custom kernel patches for the sbc images?

pl4nty commented 3 months ago

Yeah, v2.5 is a new hardware revision. Overlays don't include kernel patches, but the docs have a guide for customising the kernel. Then you can use an overlay with the custom imager to generate boot assets.

torvitas commented 2 months ago

I just want to report that it works. All three cm4 modules on my tp2 are up and running.

tuxpeople commented 1 month ago

> I just want to report that it works. All three cm4 modules on my tp2 are up and running.

@torvitas Would you mind quickly sharing how you did it? Last time I tried, I wasn't successful.

torvitas commented 1 month ago

@tuxpeople - I basically did what @pl4nty suggested.

It went roughly like this: clone the pkgs repo, add the patches to kernel/prepare/patches, and reference them in kernel/pkg.yaml. Then follow https://www.talos.dev/v1.7/advanced/customizing-the-kernel/ .
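The patch-staging step can be sketched in Python. This is only an illustration of copying patch files into `kernel/prepare/patches` inside an existing pkgs checkout. The function name `stage_kernel_patches` and its arguments are hypothetical. Editing `kernel/pkg.yaml` is left as a manual step, because its exact format is not shown in this thread.

```python
import shutil
from pathlib import Path


def stage_kernel_patches(pkgs_checkout, patch_files):
    """Copy patch files into <pkgs_checkout>/kernel/prepare/patches.

    Returns the staged paths relative to the checkout, i.e. the paths that
    still have to be referenced by hand in kernel/pkg.yaml.
    """
    checkout = Path(pkgs_checkout)
    dest = checkout / "kernel" / "prepare" / "patches"
    dest.mkdir(parents=True, exist_ok=True)

    staged = []
    # Sort by file name so numbered patches (0001-..., 0002-...) keep their order.
    for patch in sorted((Path(p) for p in patch_files), key=lambda p: p.name):
        target = dest / patch.name
        shutil.copy2(patch, target)
        staged.append(target.relative_to(checkout).as_posix())
    return staged
```

Calling `stage_kernel_patches("pkgs", ["0001-example.patch"])`, where both the checkout path and the patch name are placeholders, would return `["kernel/prepare/patches/0001-example.patch"]`.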