intel / Intel-Linux-Processor-Microcode-Data-Files

Hard lockups using microcode releases 20191115 and 20191112 #23

Closed jariruusu closed 4 years ago

jariruusu commented 4 years ago

Laptop computer: Dell Latitude 5580, i5-7200U CPU @ 2.50GHz Kaby Lake Mobile, id 806E9, family 6 model 142 stepping 9 pf 0x80
Latest BIOS from Dell, 1.16.0 07/03/2019
Linux kernel version 4.14.164 (stable mainline + local patches)
BIOS microcode revision 0xb4 is stable, no issues observed.
Microcode release 20191112, revision 0xc6 is unstable, hard lockups.
Microcode release 20191115, revision 0xca is unstable, hard lockups.
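For readers who want to check how the id 806E9 relates to "family 6 model 142 stepping 9", here is a minimal sketch of the usual CPUID signature decoding. The function name `decode_cpuid_signature` is made up for this illustration, and the bit layout is the standard x86 one (extended model applies to families 6 and 15, extended family only to family 15).

```python
def decode_cpuid_signature(sig: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into family, model and stepping."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # Extended family is only added when the base family field is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # Extended model is only used for base family 6 or 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


print(decode_cpuid_signature(0x806E9))
# prints {'family': 6, 'model': 142, 'stepping': 9}
```

The platform flags value (pf 0x80) is not part of this signature and is not decoded here.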

hmh commented 4 years ago

Hard lockups when?

jariruusu commented 4 years ago

No issues on booting, cold or warm. Hard lockups after 1-2 days of normal use.

ashok-raj commented 4 years ago

Could you also add /proc/cpuinfo from the system? Is it possible to try with a more upstream kernel? Do you suspend/resume at all during the 2 days, or is it operating non-stop? Just want to understand if there is something else that could be triggering it.

jariruusu commented 4 years ago

Could you also add /proc/cpuinfo from the system?

More information here: http://www.elisanet.fi/jariruusu/916/issue23-1.tar.gz
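As a side note for anyone reproducing this, the loaded microcode revision can be read from the `microcode` field of /proc/cpuinfo on x86 Linux. The sketch below is only an illustration: the helper name `microcode_revisions` and the sample text are invented here and are not taken from the tarball above.

```python
import re


def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the set of microcode revisions found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        # x86 kernels print one line per logical CPU, e.g. "microcode : 0xb4".
        match = re.match(r"microcode\s*:\s*(0x[0-9a-fA-F]+)", line)
        if match:
            revisions.add(int(match.group(1), 16))
    return revisions


# Made-up sample in the /proc/cpuinfo layout; on a real system, pass
# open("/proc/cpuinfo").read() instead.
sample = (
    "processor\t: 0\n"
    "microcode\t: 0xb4\n"
    "processor\t: 1\n"
    "microcode\t: 0xb4\n"
)
print([hex(rev) for rev in sorted(microcode_revisions(sample))])
```

More than one value in the returned set would mean the logical CPUs are not all running the same revision.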

Is it possible to try with a more upstream kernel?

That is possible, but the problem here is unstable microcode.

Do you suspend/resume at all during the 2 days?

No suspend/resume, it is disabled in my kernel configuration.

Dell BIOS folks may be able to provide more information. Since Dell is a significant customer of Intel, Intel employees may have contacts there.

I have learned that Dell removes BIOS versions from their download pages if Dell learns that a version of BIOS has severe stability issues. They don't remove those that have exploitable vulnerabilities, just those that crash or lock up unexpectedly.

I have archived download links to old BIOS versions of this computer. I just checked whether the links still work. Yes, the old stable ones are still available, and the unstable ones are gone. One example of a removed unstable version is BIOS 1.8.1 (microcode revision 0x7C), which attempted spectre_v2 mitigations but resulted in programs crashing. Later microcode versions were better.

Dell has released 3 BIOS builds using the same version number, 1.16.0. The dates below are the "published date", not the BIOS build date.

2019-07-01 BIOS 1.15.1 has microcode 0xb4, stable, available
2019-07-24 BIOS 1.16.0 has unknown microcode version, NOT AVAILABLE
2019-09-20 BIOS 1.16.0 has unknown microcode version, NOT AVAILABLE
2019-10-22 BIOS 1.16.0 has microcode 0xb4, stable, available

So, Dell pulled the plug on two of them, presumably because they were unstable. Is it a cosmic coincidence that two newer microcode revisions (0xc6 and 0xca) were released for this processor after the stable revision 0xb4, and that I have tested both and found them unstable?

I have archived these BIOS updates. Here are the MD5 sums for them:

d9218217b691872852a65fd37d55406e Latitude_5X80_Precision_3520_1.16.0.exe
33b161caa01e70863c74f5af64947647 Latitude_5X80_Precision_3520_1.16.0.exe
1e28074ecdb5863df309ad9feadaeae1 Latitude_5X80_Precision_3520_1.16.0.exe

I only installed the last of these, the one which has microcode 0xb4.
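Because all three files share one name, the MD5 sum is the only way to tell them apart. A minimal sketch of such a check follows; the `KNOWN_BUILDS` table, its labels and the function names are assumptions made for this illustration, with the hashes copied from the list above.

```python
import hashlib

# Hashes copied from the list above. The labels are illustrative only.
KNOWN_BUILDS = {
    "d9218217b691872852a65fd37d55406e": "1.16.0, listed first, not installed",
    "33b161caa01e70863c74f5af64947647": "1.16.0, listed second, not installed",
    "1e28074ecdb5863df309ad9feadaeae1": "1.16.0, listed last, microcode 0xb4",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_build(path: str) -> str:
    """Map a downloaded BIOS file to one of the archived builds, if known."""
    return KNOWN_BUILDS.get(md5_of_file(path), "unknown file")
```

For example, `identify_build("Latitude_5X80_Precision_3520_1.16.0.exe")` would return the label of the matching build, or "unknown file" if the download matches none of the three.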

Would it be possible for some Intel employee to contact the Dell BIOS folks and ask them why Dell decided to pull the plug on those two BIOS versions?

jariruusu commented 4 years ago

For a few weeks I have been using microcode revision 0xd2 from https://github.com/platomav/CPUMicrocodes on my laptop computer (sig 0x806E9 fam 6 model 142 step 9 pf 0x80). That microcode has been stable, no issues observed. Now that Intel has released newer microcode 0xd6 for that processor, this stability issue with older microcodes 0xc6 and 0xca can be closed.