intel / Intel-Linux-Processor-Microcode-Data-Files

System hangs using microcode 06-a7-01 version 20231114 #70

Closed bkuhls closed 1 month ago

bkuhls commented 1 year ago

With version 20230512 everything is fine, but with the latest version, 20230808, the system hangs during boot.

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 167
model name      : 11th Gen Intel(R) Core(TM) i9-11900T @ 1.50GHz
stepping        : 1
microcode       : 0x58
# sha256sum 06-a7-01
10d28bd43d05150c036f9b74dd8a471316cab4069479171732cfa0e5bc417a01  06-a7-01

# ./iucode_tool --write-earlyfw=intel-microcode.img 06-a7-01
./iucode_tool: Writing selected microcodes to: intel-microcode.img

# sha256sum intel-microcode.img
8cf50bf3dfe2434c9ec598c3dd5ed596dc40c23449451002826b1b33390e0e47  intel-microcode.img

The img file is used with syslinux

APPEND load_ramdisk=1 initrd=../../intel-microcode.img,../../rootfs.img

which works fine with the previous microcode

microcode: sig=0xa0671, pf=0x2, revision=0x58
microcode: Microcode Update Driver: v2.2.

With the current microcode, however, only three lines appear on screen, about loading intel-microcode.img, rootfs.img and the Linux kernel (5.14.21) itself. No messages from the kernel itself appear, and the system is frozen.
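
As a side note, the sig=0xa0671 value in the log above encodes the same family, model and stepping that give the 06-a7-01 file its name. A minimal sketch of that decoding, assuming the standard CPUID signature layout (sig_to_filename is a hypothetical helper name, not part of iucode_tool):

```shell
#!/bin/sh
# Decode a CPUID signature (as printed in "microcode: sig=...") into the
# family-model-stepping form used for the microcode file names.
# sig_to_filename is a hypothetical helper, not part of iucode_tool.
sig_to_filename() {
    sig=$(( $1 ))                          # accepts hex such as 0xa0671
    stepping=$(( sig & 0xf ))              # bits 0-3
    model=$(( (sig >> 4) & 0xf ))          # bits 4-7
    family=$(( (sig >> 8) & 0xf ))         # bits 8-11
    ext_model=$(( (sig >> 16) & 0xf ))     # bits 16-19
    ext_family=$(( (sig >> 20) & 0xff ))   # bits 20-27

    # The extended model is used for families 6 and 0xf,
    # the extended family only for family 0xf.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) | model ))
    fi
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi

    printf '%02x-%02x-%02x\n' "$family" "$model" "$stepping"
}

sig_to_filename 0xa0671    # prints 06-a7-01
```

This matches family 6, model 167 (0xa7) and stepping 1 from the /proc/cpuinfo output above.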

alexmurray commented 1 year ago

Whilst not identical, as a data point: I tested the same microcode on a very similar machine (running the older 5.8.0-43-generic kernel from Ubuntu 20.04) and it appeared to boot without issue, or at least well enough that I could log in remotely via SSH. Unfortunately I don't have physical access to this machine, nor can I easily test newer kernel versions. The /proc/cpuinfo I captured:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 167
model name  : 11th Gen Intel(R) Core(TM) i9-11900 @ 2.50GHz
stepping    : 1
microcode   : 0x59

ashok-raj commented 1 year ago

Can you first boot with cmdline dis_ucode_ldr. This will disable early loading. Then report what version comes from BIOS?

You can also recreate an initrd without the suspect ucode, then try to late-load it via

# echo 1 > /sys/devices/system/cpu/microcode/reload

And report back.
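
The late-load test suggested above can be sketched as a before/after comparison of the reported revision. This is only a sketch: ucode_revisions and late_load_report are hypothetical helper names, the reload needs root, and it assumes the candidate file is already in place under /lib/firmware/intel-ucode/, which is where the kernel looks for it on a late load.

```shell
#!/bin/sh
# Print the distinct "microcode" revisions found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; the optional argument is only for trying the
# helper against a saved copy.
ucode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Trigger a late load and report the revision before and after.
# Must be run as root; defined here but deliberately not called.
late_load_report() {
    before=$(ucode_revisions)
    echo 1 > /sys/devices/system/cpu/microcode/reload || return 1
    after=$(ucode_revisions)
    echo "microcode revision: $before -> $after"
}
```

Calling late_load_report is best done on a machine where a hang is acceptable.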

bkuhls commented 1 year ago

> Can you first boot with cmdline dis_ucode_ldr. This will disable early loading. Then report what version comes from BIOS?

# grep microcode /proc/cpuinfo | uniq
microcode       : 0x58

This result is from a cold boot. The board is an ASUS Z590 Extreme with BIOS v2.20, dated 2023/4/7. I updated the BIOS in May; before that, BIOS v1.90, dated 2021/7/29, was installed. Since May 2022, when I built the machine, I have updated the microcode starting with version 20220510 and have used all five revisions since then without any problems.

> You can also recreate an initrd without the suspect ucode, then try to late-load it via
>
> # echo 1 > /sys/devices/system/cpu/microcode/reload

I need to recompile my self-compiled kernel for this, since the subdirectory microcode/ is not present atm. I will report back.

Since I am chain-loading two initrd files using syslinux, one of them containing the microcode, there is no need to recreate the initrd; I just omit the microcode image from the kernel cmdline.

EDIT: A reboot without dis_ucode_ldr was enough ;) Here is the result of the microcode reload:

kern.info kernel: microcode: updated to revision 0x59, date = 2023-02-26
kern.info kernel: microcode: Reload completed, microcode revision: 0x59

Approx. 10s later the machine freezes without any messages on the terminal (headless server here) or in syslog.
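
The EDIT above suggests that the microcode/ sysfs directory was missing only because the kernel had been booted with dis_ucode_ldr, which disables the loader as a whole. A quick pre-flight check along those lines could look like this. late_load_available is a hypothetical helper, and its two optional file arguments exist only so the check can be tried against sample files:

```shell
#!/bin/sh
# Report whether a late microcode load can be attempted.
# Usage: late_load_available [CMDLINE_FILE] [RELOAD_FILE]
# late_load_available is a hypothetical helper name.
late_load_available() {
    cmdline=${1:-/proc/cmdline}
    reload=${2:-/sys/devices/system/cpu/microcode/reload}

    # Split the command line into one parameter per line and look for
    # an exact match of the flag.
    if tr ' ' '\n' < "$cmdline" | grep -qx 'dis_ucode_ldr'; then
        echo "dis_ucode_ldr is set: microcode loader disabled"
        return 1
    fi

    if [ ! -e "$reload" ]; then
        echo "no reload interface at $reload"
        return 1
    fi

    echo "late loading available"
}
```

If the second message appears without dis_ucode_ldr on the command line, the kernel was most likely built without microcode loader support.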

bkuhls commented 10 months ago

The issue still exists with version 20231114.

ashok-raj commented 10 months ago

Thanks.. I'll have someone look into that... I didn't fully understand the linked initrd.. Are you saying that you have two entries, one with the ucode in the initrd and another without the ucode?

So after the late-load, you noticed the system updated from 0x58->0x59.. but the system froze after that. Did I understand it correctly?

bkuhls commented 10 months ago

> I didn't fully understand the linked initrd.. Are you saying that you have two entries. one with the ucode in initrd and another without the ucode?

The bootloader contains one entry consisting of two initrd files which are being chain-loaded. The first initrd file is created by iucode_tool --write-earlyfw, the second initrd file contains the Linux system.
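
For boot loaders that accept only a single initrd, the same layout can usually be produced by concatenating the two images, with the uncompressed microcode archive first, which is the approach described in the kernel's x86 microcode documentation. A sketch, with combine_initrd and the output file name as illustrative assumptions:

```shell
#!/bin/sh
# Concatenate an early-microcode image and a regular initrd into one file.
# Usage: combine_initrd UCODE_IMG ROOTFS_IMG OUTPUT
# combine_initrd is a hypothetical helper name.
combine_initrd() {
    # The microcode cpio archive must come first and stay uncompressed so
    # that the kernel can find it during early boot.
    cat "$1" "$2" > "$3"
}

# Example with the file names from this thread (combined.img is made up):
# combine_initrd intel-microcode.img rootfs.img combined.img
```

The boot entry would then reference only the combined file in its initrd= parameter.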

> So after the late-load, you noticed system updated from 0x58->0x59.. but system froze after that. Did I understand it correctly

Yes.

bkuhls commented 1 month ago

The bug is fixed with version 20240813, thanks!

# dmesg | grep sig=
microcode: sig=0xa0671, pf=0x2, revision=0x62

# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 167
model name      : 11th Gen Intel(R) Core(TM) i9-11900T @ 1.50GHz
stepping        : 1
microcode       : 0x62

It seems my machine suffered from erratum TGL068, "Processor May Hang During a Microcode Update".

ashok-raj commented 1 month ago

Thanks for the update!

hmh commented 1 month ago

Uh, TGL068 == ADL075 == ICL088. Hopefully all of them are fixed already in 20240813...

whpenner commented 1 month ago

> Uh, TGL068 == ADL075 == ICL088. Hopefully all of them are fixed already in 20240813...

Yes, that is correct. The changes for this issue were included in the May and Aug updates.