intel / Intel-Linux-Processor-Microcode-Data-Files


intel-microcode update 3.20201110.0ubuntu0.20.10.1 causes hang on boot for TGL-UP3 #44

Open superm1 opened 3 years ago

superm1 commented 3 years ago

Downstream bug report: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1903883

The XPS 9310 hangs on a black screen with corrupted artifacts. After booting the fallback kernel and looking at the log, this error is mentioned:

[ 1.286211] kernel: [Hardware Error]: event severity: fatal
[ 1.286217] kernel: [Hardware Error]: Error 0, type: fatal
[ 1.286218] kernel: [Hardware Error]: section_type: Firmware Error Record Reference
[ 1.286218] kernel: [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 1.286219] kernel: [Hardware Error]: Revision: 2
[ 1.286220] kernel: [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d

[ 1.286365] kernel: [Hardware Error]: Error 1, type: fatal
[ 1.286365] kernel: [Hardware Error]: section_type: Firmware Error Record Reference
[ 1.286366] kernel: [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
[ 1.286366] kernel: [Hardware Error]: Revision: 2
[ 1.286367] kernel: [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
superm1 commented 3 years ago

/proc/cpuinfo shows:

processor   : 7
vendor_id   : GenuineIntel
cpu family  : 6
model       : 140
model name  : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
stepping    : 1
microcode   : 0x52
cpu MHz     : 4260.939
cache size  : 12288 KB
physical id : 0
siblings    : 8
core id     : 3
cpu cores   : 4
apicid      : 7
initial apicid  : 7
fpu     : yes
fpu_exception   : yes
cpuid level : 27
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
vmx flags   : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple pml ept_mode_based_exec tsc_scaling
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips    : 5606.40
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
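For reference, the family/model/stepping fields above are what select the microcode file. Assuming the usual intel-ucode naming convention (FF-MM-SS in hex) and the standard CPUID signature layout (upper nibble of the model in the extended-model field), a minimal Python sketch maps this i7-1165G7 to 06-8c-01, the file discussed later in this thread. The helper names are hypothetical.

```python
def parse_cpuinfo_block(text: str) -> dict:
    """Parse one processor block of /proc/cpuinfo into a key -> value dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info

def microcode_filename(family: int, model: int, stepping: int) -> str:
    """Assumed intel-ucode file name: two hex digits each, e.g. '06-8c-01'."""
    return f"{family:02x}-{model:02x}-{stepping:02x}"

def cpuid_signature(family: int, model: int, stepping: int) -> int:
    """Rebuild the CPUID leaf 1 EAX signature (processor type assumed 0)."""
    if family < 0xF:
        base_family, ext_family = family, 0
    else:
        base_family, ext_family = 0xF, family - 0xF
    # Display models above 15 keep their upper nibble in the extended-model field.
    base_model, ext_model = model & 0xF, model >> 4
    return (ext_family << 20) | (ext_model << 16) | (base_family << 8) | (base_model << 4) | stepping

sample = """cpu family  : 6
model       : 140
stepping    : 1
microcode   : 0x52"""
info = parse_cpuinfo_block(sample)
fam, mod, step = (int(info[k]) for k in ("cpu family", "model", "stepping"))
print(microcode_filename(fam, mod, step), hex(cpuid_signature(fam, mod, step)), info["microcode"])
```

The i5-1135G7 below reports the same family 6, model 140, stepping 1, so it maps to the same file.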
superm1 commented 3 years ago

On my side, I reproduce it with this CPU:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 140
model name      : 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
stepping        : 1
microcode       : 0x52
cpu MHz         : 1101.432
cache size      : 8192 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 27
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
vmx flags       : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple pml ept_mode_based_exec tsc_scaling
bugs            : spectre_v1 spectre_v2 spec_store_bypass swapgs
bogomips        : 4838.40
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

and see the following in kern.log from a previous failed boot:

Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747002] [Hardware Error]: event severity: fatal
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747003] [Hardware Error]:  Error 0, type: fatal
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747004] [Hardware Error]:   section type: unknown, 81212a96-09ed-4996-9471-8d729c8e69ed
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747004] [Hardware Error]:   section length: 0x220
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747006] [Hardware Error]:   00000000: 00000202 00000000 00000000 00000000  ................
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747007] [Hardware Error]:   00000010: 8f87f311 4d9ec998 6560c4a0 6d4f8c51  .......M..`eQ.Om
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747007] [Hardware Error]:   00000020: 1100a101 00000080 00000000 fe013df4  .............=..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747008] [Hardware Error]:   00000030: 00000000 0000200f 1d65a21d 0000c000  ..... ....e.....
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747008] [Hardware Error]:   00000040: 00000000 07c40800 7cfa7038 00000111  ........8p.|....
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747009] [Hardware Error]:   00000050: 0500020e 1d6a30d7 0000c000 00000000  .....0j.........
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747009] [Hardware Error]:   00000060: 07000000 00787038 00000211 0500020e  ....8px.........
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747010] [Hardware Error]:   00000070: 1d6a33e3 0000c000 00000000 06000000  .3j.............
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747011] [Hardware Error]:   00000080: 3c5a7030 00000311 0500020e 1d6a36ec  0pZ<.........6j.
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747011] [Hardware Error]:   00000090: 0000200c 1d6a5420 0000200d 1d748275  . .. Tj.. ..u.t.
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747012] [Hardware Error]:   000000a0: 00002012 1d74c20d 00002013 1d7554dc  . ....t.. ...Tu.
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747012] [Hardware Error]:   000000b0: 00002624 2cc9e546 00002606 2cc9e6c0  $&..F..,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747013] [Hardware Error]:   000000c0: 0000260e 2cc9e7cd 0000260f 2cc9e8b9  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747013] [Hardware Error]:   000000d0: 0000260e 2cc9e9b9 0000260f 2cc9eaa5  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747014] [Hardware Error]:   000000e0: 00002610 2cc9ebb5 0000260e 2cc9eda8  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747014] [Hardware Error]:   000000f0: 0000260f 2cc9ee94 0000260e 2cc9ef94  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747015] [Hardware Error]:   00000100: 0000260f 2cc9f080 00002610 2cc9f190  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747015] [Hardware Error]:   00000110: 0000260e 2cc9f383 0000260f 2cc9f46f  .&.....,.&..o..,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747016] [Hardware Error]:   00000120: 0000260e 2cc9f56f 0000260f 2cc9f67e  .&..o..,.&..~..,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747016] [Hardware Error]:   00000130: 00002610 2cc9f78e 00002614 2cc9f9e1  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747017] [Hardware Error]:   00000140: 00002615 2cc9fc20 06000500 00002435  .&.. ..,....5$..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747017] [Hardware Error]:   00000150: 02002616 2cc9fdb2 00002618 2cc9ff64  .&.....,.&..d..,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747018] [Hardware Error]:   00000160: 00002619 2ccaa7a1 0000261a 2ccaa8c4  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747018] [Hardware Error]:   00000170: 0000261b 2ccac292 06000500 00002435  .&.....,....5$..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747019] [Hardware Error]:   00000180: 0200261c 2ccac3f5 00000000 00000001  .&.....,........
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747019] [Hardware Error]:   00000190: 02002622 2ccac69d 00000001 00000000  "&.....,........
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747020] [Hardware Error]:   000001a0: 02002623 2ccadc58 06000500 00002435  #&..X..,....5$..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747020] [Hardware Error]:   000001b0: 0200261d 2ccaded0 06000500 00002435  .&.....,....5$..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747021] [Hardware Error]:   000001c0: 02002617 2ccae042 0000260e 2ccae1c2  .&..B..,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747021] [Hardware Error]:   000001d0: 0000260f 2ccae2ae 0000260e 2ccae3ae  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747022] [Hardware Error]:   000001e0: 0000260f 2ccae49a 00002610 2ccae5aa  .&.....,.&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747022] [Hardware Error]:   000001f0: 00002607 2ccae782 00002625 2ccae891  .&.....,%&.....,
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747023] [Hardware Error]:   00000200: 00000002 01002000 0aebb14d 00002014  ..... ..M.... ..
Nov 11 14:03:45 XPS-13-9310 kernel: [    0.747023] [Hardware Error]:   00000210: 0aebb299 1d6308d4 0000200e 1d65a0bd  ......c.. ....e.
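The raw dump above appears to be the undecoded form of the section shown in the first excerpt: as far as I know, 81212a96-09ed-4996-9471-8d729c8e69ed is the section-type GUID Linux uses for the Firmware Error Record Reference section, which this older kernel printed as "unknown". A minimal Python sketch that reassembles the dump rows into bytes and decodes the first 32 bytes, assuming the layout stated in the docstring (based on the UEFI CPER definition); the helper names are hypothetical.

```python
import re
import struct
import uuid

# One hex-dump row: "OFFSET: W0 W1 W2 W3  ASCII". The kernel prints each
# 4-byte group as a native-endian (little-endian on x86) 32-bit word.
ROW_RE = re.compile(r"\s([0-9a-f]{8}):((?: [0-9a-f]{8}){1,4})")

def dump_to_bytes(lines):
    """Reassemble the raw section bytes from kernel hex-dump rows."""
    data = bytearray()
    for line in lines:
        m = ROW_RE.search(line)
        if not m:
            continue  # skip header lines such as "section length"
        words = [int(w, 16) for w in m.group(2).split()]
        data += struct.pack(f"<{len(words)}I", *words)
    return bytes(data)

def decode_fw_err_rec_ref(data):
    """Decode the start of a Firmware Error Record Reference section.

    Assumed layout (mirrored by Linux's struct cper_sec_fw_err_rec_ref):
    u8 record type, u8 revision, 6 reserved bytes, u64 record identifier,
    then a 16-byte record identifier GUID when revision >= 2.
    """
    record_type, revision = data[0], data[1]
    (record_id,) = struct.unpack_from("<Q", data, 8)
    guid = uuid.UUID(bytes_le=data[16:32]) if revision >= 2 else None
    return record_type, revision, record_id, guid

rows = [
    "[Hardware Error]:   00000000: 00000202 00000000 00000000 00000000  ................",
    "[Hardware Error]:   00000010: 8f87f311 4d9ec998 6560c4a0 6d4f8c51  .......M..`eQ.Om",
]
print(decode_fw_err_rec_ref(dump_to_bytes(rows)))
```

On these two rows the sketch yields record type 2 and revision 2 (matching "SOC Firmware Error Record Type2" and "Revision: 2" in the first excerpt) and the GUID 8f87f311-c998-4d9e-a0c4-6065518c4f6d, the same Record Identifier that the newer kernel prints.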
Gochkin commented 3 years ago

I have a similar issue on a Dell XPS 9310 with Manjaro. When I boot the kernel with the ucode.img path parameter, the computer hangs with artifacts similar to those described above. I can boot the laptop by removing the path to ucode.img, but in that case I get very heavy screen tearing.

Let me know if you want me to provide some additional technical info.

esyr-rh commented 3 years ago

On Thu, Nov 12, 2020 at 03:13:17AM -0800, Yegor Yegorov wrote:

Let me know if you want me to provide some additional technical info.

Are there records similar to the kern.log excerpts above in your logs, by chance? Thank you.

jaypz commented 3 years ago

Original downstream reporter here; here's the full section of my kern.log with the reported errors - Thanks!

https://gist.github.com/jaypz/eb73d1166e1d715c04312601e8c4997f

schnz commented 3 years ago

I can confirm exactly the same behavior as reported by @superm1. Loading the ramdisk file intel-ucode.img results in a scrambled screen, and the notebook restarts after a couple of seconds. Everything worked well with intel-ucode-20200616; only intel-ucode-2020111 shows this behavior.

Probably an issue with the XPS 9310. I've got this model as well. Here is my crash log:

https://gist.github.com/Coksnuss/7501d5662ad756908d4923ff3361ef1a

// Edit: For what it's worth: I use Arch Linux, and this is the change that broke the package: https://github.com/archlinux/svntogit-packages/commit/f6f49a156d5657177998b3c444581d0376f66192

superm1 commented 3 years ago

Probably an issue with the XPS 9310. I've got this model as well.

Although the initial reports are from the XPS 9310, I don't think this proves the problem is specific to the 9310. There are not many other machines with TGL-UP3 on the market yet, especially ones sold with Linux.

bionade24 commented 3 years ago

Is anyone else also getting flickering with Linux 5.9.8? I had to go back to 5.9.6 to fix this.

schnz commented 3 years ago

Is anyone else also getting flickering with Linux 5.9.8? I had to go back to 5.9.6 to fix this.

Double-check whether you rolled back just the kernel or also the CPU microcode (aka intel-ucode).

This issue is definitely unrelated to the kernel. While trying to figure out the problem, I rolled back the kernel from 5.9.8 to 5.9.4 (which I knew was working a few days ago), and it had no effect. I narrowed it down to intel-ucode.img, which is loaded during the boot process before the kernel itself.
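For context on why the image matters so early: with early loading, the microcode image is an uncompressed cpio archive loaded ahead of the main initramfs, and the kernel looks in it for kernel/x86/microcode/GenuineIntel.bin very early in its boot. A minimal Python sketch that lists the members of such an archive, assuming the "newc" cpio format (magic 070701); make_newc is a hypothetical helper that only builds a test archive.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 ASCII hex digits

def _align4(n: int) -> int:
    return (n + 3) & ~3

def list_newc(data: bytes):
    """Return (name, size) for each member of a newc cpio archive."""
    members, off = [], 0
    while off + HEADER_LEN <= len(data):
        if data[off:off + 6] != NEWC_MAGIC:
            raise ValueError(f"bad cpio magic at offset {off}")
        fields = [int(data[off + 6 + 8 * i: off + 14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = off + HEADER_LEN
        # namesize counts the trailing NUL byte.
        name = data[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            break
        data_start = _align4(name_start + namesize)  # header+name padded to 4
        members.append((name, filesize))
        off = _align4(data_start + filesize)  # file data padded to 4
    return members

def make_newc(entries):
    """Hypothetical helper: build a tiny newc archive for testing."""
    out = bytearray()
    for name, payload in list(entries) + [("TRAILER!!!", b"")]:
        raw_name = name.encode() + b"\0"
        fields = [0, 0o100644, 0, 0, 1, 0, len(payload), 0, 0, 0, 0, len(raw_name), 0]
        out += NEWC_MAGIC + b"".join(b"%08X" % f for f in fields) + raw_name
        out += b"\0" * (-len(out) % 4)
        out += payload
        out += b"\0" * (-len(out) % 4)
    return bytes(out)

img = make_newc([("kernel/x86/microcode/GenuineIntel.bin", b"\x01" * 48)])
print(list_newc(img))
```

Pointing list_newc at a real image, for example open("/boot/intel-ucode.img", "rb").read() (path assumed, as on Arch), should show whether a GenuineIntel.bin member is present and how large it is; iucode_tool remains the proper tool for inspecting what that member contains.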

bionade24 commented 3 years ago

Double-check whether you rolled back just the kernel or also the CPU microcode (aka intel-ucode).

I first downgraded only the microcode, using the Arch Linux Archive. I could boot again, but the flickering remained. Or did the older ucode image somehow not get picked up?

hmh commented 3 years ago

Try running "iucode_tool -tr -Sl" on intel-ucode.img to see what is inside. That will tell you whether it was updated or not.

You can also try a cold boot (power cycle).
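As far as I can tell, the per-update details iucode_tool lists (signature, platform-flags mask, date, revision) come from the fixed 48-byte header at the start of every Intel microcode update. A minimal Python sketch that walks a GenuineIntel.bin-style blob of back-to-back updates, assuming the header layout from the Intel SDM (revision at offset 4, BCD date at 8, signature at 12, processor flags at 24, data size at 28, total size at 32, where a zero data size means a 2048-byte update). list_updates and fake_update are hypothetical names, and the test values are synthetic, not real releases.

```python
import struct
from typing import Iterator, NamedTuple

# Assumed 48-byte header: nine little-endian u32 fields, then 12 reserved bytes.
HEADER = struct.Struct("<9I12x")

class Update(NamedTuple):
    revision: int
    date: str        # YYYY-MM-DD, decoded from the BCD 0xMMDDYYYY field
    signature: int   # CPUID signature, e.g. 0x806c1 for 06-8c-01
    pf_mask: int     # processor flags (platform) mask
    total_size: int

def list_updates(blob: bytes) -> Iterator[Update]:
    """Walk back-to-back Intel microcode updates, as in GenuineIntel.bin.

    Extended signature tables, if present, are skipped, not parsed.
    """
    off = 0
    while off + HEADER.size <= len(blob):
        (hdr_ver, rev, date, sig, _checksum, _loader_rev, pf,
         data_size, total_size) = HEADER.unpack_from(blob, off)
        if hdr_ver != 1:
            raise ValueError(f"unexpected header version at offset {off}")
        if data_size == 0:
            total_size = 2048  # 2000 data bytes + 48-byte header
        if total_size < HEADER.size:
            raise ValueError(f"bad total size at offset {off}")
        bcd = f"{date:08x}"  # BCD digits read directly as MMDDYYYY
        yield Update(rev, f"{bcd[4:]}-{bcd[:2]}-{bcd[2:4]}", sig, pf, total_size)
        off += total_size

def fake_update(rev: int, date: int, sig: int, pf: int, payload: bytes = bytes(16)) -> bytes:
    """Hypothetical helper producing synthetic (not real) update records."""
    return HEADER.pack(1, rev, date, sig, 0, 1, pf, len(payload),
                       HEADER.size + len(payload)) + payload

blob = fake_update(0x11, 0x01022000, 0x806C1, 0x80) + fake_update(0x22, 0x03042001, 0x12345, 0x01)
# Roughly what filtering by the running CPU's signature looks like.
for u in list_updates(blob):
    if u.signature == 0x806C1:
        print(hex(u.revision), u.date)
```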

bionade24 commented 3 years ago

Try running "iucode_tool -tr -Sl" on intel-ucode.img to see what is inside. That will tell you whether it was updated or not.

You can also try a cold boot (power cycle).

I can't, because I already went back. Thanks anyway.

baron-digit commented 3 years ago

I have had the same issue since yesterday (see the Arch Forum thread). EDIT: I downgraded to 20200616 and it works perfectly:

sudo pacman -U /var/cache/pacman/pkg/intel-ucode-20200616-1-any.pkg.tar.zst

hmh commented 3 years ago

FYI, Ubuntu removed the TGL-UP3 update from their packages due to this regression. For Debian, I released the 20201110 update without the TGL-UP3 update, with a rather explicit note in the Debian changelog pointing to this report.

esyr-rh commented 3 years ago

Can confirm TGL-Y early microcode update to revision 0x68 hangs on the following system:

bios_date 07/30/2020
bios_vendor Intel Corporation
bios_version TGLSFWI1.R00.3313.A02.2007300246
board_asset_tag Base Board Asset Tag
board_name TigerLake Y LPDDR4x T4 Crb
board_serial FZTL936000B5
board_vendor Intel Corporation
board_version 2
chassis_asset_tag Chassis Asset Tag
chassis_serial Chassis Serial Number
chassis_type 9
chassis_vendor Intel Corporation
chassis_version 0.1
product_family Tiger Lake Client System
product_name Tiger Lake Client Platform
product_serial FZTL936000B5
product_sku 0001100100070100
product_uuid 4c545a46-3339-3056-b030-4235465a544c
product_version 0.1
sys_vendor Intel Corporation

Firmware microcode version is 0x52.

mcu-administrator commented 3 years ago

This microcode update has been removed until the issue has been resolved.

esyr-rh commented 3 years ago

A new revision, 0x88, of the 06-8c-01 microcode file has been published as part of the microcode-20210608 release; it may be worth trying out.

esyr-rh commented 3 years ago

Can confirm that the system from [1] successfully updates to revision 0x88 of the 06-8c-01 microcode.

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/44#issuecomment-729888969
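After rebooting, one way to confirm that an early update was actually applied on every logical CPU, rather than merely being present in the image, is to read the microcode field of each processor block in /proc/cpuinfo. A minimal Python sketch; the function names and the expected_revision parameter are hypothetical, with 0x88 used as the example value for 06-8c-01.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Collect the distinct 'microcode' revisions reported across all CPUs."""
    revs = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revs.add(int(value.strip(), 16))  # base 16 accepts the 0x prefix
    return revs

def update_applied(cpuinfo_text: str, expected_revision: int) -> bool:
    """True only if every CPU reports exactly the expected revision."""
    return microcode_revisions(cpuinfo_text) == {expected_revision}

before = "processor : 0\nmicrocode : 0x52\n\nprocessor : 1\nmicrocode : 0x52\n"
print(update_applied(before, 0x88))
```

On a live system the same check would be fed the contents of /proc/cpuinfo; the firmware-provided 0x52 seen earlier in this thread would make it report False.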

ryantig commented 5 months ago

Is there a reason the issue hasn't been marked Closed?

Karmavil commented 3 months ago

Is there a reason the issue hasn't been marked Closed?

I got an update today on Debian bookworm whose changelog mentions this issue, in an entry from a long time ago.

paulmenzel commented 3 months ago

@Karmavil, please share the version of the intel-microcode package and a link to the changelog. The entry in the Debian changelog is from intel-microcode 3.20201110.1, from Nov 12th, 2020.

Karmavil commented 3 months ago

That's correct. The update I got today is intel-microcode (3.20240514.1~deb12u1) bookworm; urgency=medium, and scrolling down you will find that entry, as you said (intel-microcode (3.20201110.1) unstable; urgency=medium). I was just wondering why this issue is still open if there was a fix.

EDIT: OK, I get it now. It took me some extra time, but I finally got it: "INTEL-SA-00381 AND INTEL-SA-00389 MITIGATIONS ARE THEREFORE NOT INSTALLED". That's why. I thought those were models.