intel / Intel-Linux-Processor-Microcode-Data-Files

Hard lockups using microcode releases 20191115, sig=0x806ec, on i5-8365U #24

Closed vicamo closed 3 years ago

vicamo commented 4 years ago
microcode: sig=0x806ec, pf=0x80, revision=0xc6 # good
microcode: sig=0x806ec, pf=0x80, revision=0xca # hang
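To compare reports across machines, the two `microcode:` lines above can be parsed mechanically. This is a minimal Python sketch that assumes exactly the line format shown here; the helper name `parse_microcode_line` is hypothetical and not part of any kernel tooling.

```python
import re

# Matches kernel log lines of the form shown in this report, e.g.
#   microcode: sig=0x806ec, pf=0x80, revision=0xca
_LINE_RE = re.compile(
    r"microcode: sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), "
    r"revision=(0x[0-9a-fA-F]+)"
)


def parse_microcode_line(line):
    """Return (sig, pf, revision) as ints, or None if the line does not match."""
    match = _LINE_RE.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


good = parse_microcode_line("microcode: sig=0x806ec, pf=0x80, revision=0xc6 # good")
hang = parse_microcode_line("microcode: sig=0x806ec, pf=0x80, revision=0xca # hang")
print([hex(v) for v in good], [hex(v) for v in hang])
```

Only the revision differs between the two lines; the signature and platform flags are the same.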

This hangs Linux on both warm and cold boot for all kernel versions, with a fail rate that varies by version. With v5.2 or older, it is almost 100% reproducible; with v5.3 or above, it is much less likely. With the kernel boot parameters initcall_debug ignore_loglevel=1 earlyprintk=efi earlycon=efifb, the lockup mostly shows up inside lock access in console_init(), but it sometimes happens before the early console is even initialized.

On Ubuntu Bionic, the first known failing version of the intel-microcode package is 3.20191115.1ubuntu0.18.04.1. Reverting this package to any prior revision makes the hard lockup go away. Another possible temporary workaround is to disable SMP by passing nosmp to the kernel.

This is currently reproducible on Intel i5-8365U, model 142, stepping 12.
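The signature 0x806ec and "model 142, stepping 12" describe the same CPU. As a rough sketch, the standard x86 CPUID leaf 1 bit layout can be decoded as follows; the function name `decode_signature` is made up for illustration.

```python
def decode_signature(sig):
    """Split an x86 CPUID signature (leaf 1, EAX) into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model bits only apply to base families 0x6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


print(decode_signature(0x806EC))  # (6, 142, 12)
```

Model 142 is 0x8e in hexadecimal and stepping 12 is 0xc, which matches the trailing digits of the signature.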

vicamo commented 4 years ago

Also reported to Ubuntu Launchpad: https://bugs.launchpad.net/bugs/1862751

esyr-rh commented 4 years ago

(from the linked launchpad bug)
dmi.chassis.vendor: Dell Inc.
dmi.product.name: Latitude 5410
dmi.product.sku: 09C9
dmi.bios.date: 01/29/2020
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 0.0.7

vicamo commented 4 years ago

Copy from launchpad: reverting intel-ucode/06-8e-0c to its prior revision 0xC6 would fix this issue.
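The file name 06-8e-0c is the family, model, and stepping of signature 0x806ec written as two-digit hex values. A small sketch of that mapping, assuming a family 6 or family 15 part (where the extended model bits apply); `ucode_filename` is a hypothetical helper, not something shipped with the microcode package.

```python
def ucode_filename(sig):
    """Build the intel-ucode file name (family-model-stepping) for a signature.

    Assumption: the CPU is family 0x6 or 0xF, so the extended model
    bits (19:16) form the high nibble of the model number.
    """
    stepping = sig & 0xF
    model = ((sig >> 12) & 0xF0) | ((sig >> 4) & 0xF)
    family = (sig >> 8) & 0xF
    if family == 0xF:
        family += (sig >> 20) & 0xFF
    return "%02x-%02x-%02x" % (family, model, stepping)


print("intel-ucode/" + ucode_filename(0x806EC))  # intel-ucode/06-8e-0c
```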

anthonywong commented 4 years ago

@intel folks Just wondering if this issue has been received and acknowledged, and if a microcode fix will be coming?

whpenner commented 4 years ago

We are looking into the issue. I'll send out an update as soon as I have more.

hmh commented 4 years ago

Another similar issue has been closed, with the update from 0xca to 0xd6 present in the newest release (20200609).

@vicamo, can you please test if microcode revision 0xd6 (present on microcode 20200609) fixes your issue?

paulmenzel commented 4 years ago

Instead of nosmp, maxcpus=1 or acpi=off or noapic worked for me in another case.

vicamo commented 4 years ago

> @vicamo, can you please test if microcode revision 0xd6 (present on microcode 20200609) fixes your issue?

This can still be reproduced with 20200609 release.

microcode: microcode updated early to revision 0xd6, date = 2020-04-23
microcode: sig=0x806ec, pf=0x80, revision=0xd6
microcode: Microcode Update Driver: v2.2.

hmh commented 4 years ago

There might be a separate issue on 20200609.

Report on Debian downstream: https://bugs.debian.org/962757

From the Debian bug report: 20191115-20200520 works (rev 0xca), which is the opposite of the data in this report. 20200609 (rev 0xd6) hangs on boot.

There might be multiple issues causing hangs with recent Linux kernels and microcode for signature 0x806ec.

esyr-rh commented 4 years ago

There's also an observation that disabling Turbo helps with hangs during microcode update to revision 0xca on another Dell system (Dell M7720) with a 06-9e-0a/0x906ea CPU (Intel(R) Xeon(R) E-2186M): https://bugzilla.redhat.com/show_bug.cgi?id=1846097

whpenner commented 4 years ago

An issue was identified with the microcode update 0xCA for cpuid 0x806Ex and 906Ex products. Please update to 0xD6 or later.
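The recommendation above can be written as a simple check. This sketch encodes only the stated guidance (signatures 0x806Ex and 0x906Ex, revision 0xD6 or later); the constant and function names are hypothetical, and other comments in this thread report hangs with 0xd6 as well, so a passing result is not proof that a system is unaffected.

```python
# Signature prefixes named in the comment above ("0x806Ex and 906Ex").
AFFECTED_PREFIXES = (0x806E, 0x906E)
# Minimum revision named in the comment above.
MIN_REVISION = 0xD6


def meets_recommendation(sig, revision):
    """Return None if sig is outside the named families, else revision >= 0xD6."""
    if (sig >> 4) not in AFFECTED_PREFIXES:
        return None
    return revision >= MIN_REVISION


print(meets_recommendation(0x806EC, 0xCA))  # False
print(meets_recommendation(0x806EC, 0xD6))  # True
```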

paulmenzel commented 4 years ago

> An issue was identified with the microcode update 0xCA for cpuid 0x806Ex and 906Ex products. Please update to 0xD6 or later.

Where can 0xD6 be found?

esyr-rh commented 4 years ago

On Tue, Jun 16, 2020 at 09:58:06AM -0700, Paul Menzel wrote:

> > An issue was identified with the microcode update 0xCA for cpuid 0x806Ex and 906Ex products. Please update to 0xD6 or later.
>
> Where can 0xD6 be found?

Either in the previous microcode-20200520 release[1][2][3] or the current microcode-20200616 release[4][5][6].

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20200520
[2] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200520/intel-ucode/06-4e-03
[3] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200520/intel-ucode/06-5e-03
[4] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20200616
[5] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200616/intel-ucode/06-4e-03
[6] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200616/intel-ucode/06-5e-03

esyr-rh commented 4 years ago

On Tue, Jun 16, 2020 at 09:58:06AM -0700, Paul Menzel wrote:

> > An issue was identified with the microcode update 0xCA for cpuid 0x806Ex and 906Ex products. Please update to 0xD6 or later.
>
> Where can 0xD6 be found?

Oops, sorry, this is about 06-[89]e-0X and not 06-4e-03/06-5e-03; these are available in microcode-20200609 onwards[0][1][2][3][4][5][6][7][8][9].

[0] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/releases/tag/microcode-20200609
[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-8e-09
[2] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-8e-0a
[3] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-8e-0b
[4] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-8e-0c
[5] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-9e-09
[6] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-9e-0a
[7] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-9e-0b
[8] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-9e-0c
[9] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/microcode-20200609/intel-ucode/06-9e-0d

hmh commented 4 years ago

@whpenner: we have reports of update 0xd6 for signature 0x806ec also causing hangs, here: https://bugs.debian.org/962757

Should we open a separate issue?

stevebeattie commented 4 years ago

@hmh I think so. I also have a privately received report of a failure with 0x806ec / WHL-U i5-8365U and 0xd6.

whpenner commented 4 years ago

Yes, we should keep the 806Ex and 906Ex items separate from SKL, so a new issue would be better for tracking.

esyr-rh commented 4 years ago

The 06-8e-0c microcode has been updated to revision 0xde in the microcode-20201110 release. Does the newer microcode revision help?

hmh commented 3 years ago

@vicamo, can you help us check whether the microcode update in the latest microcode release fixes this issue?