intel / Intel-Linux-Processor-Microcode-Data-Files

microcode-20200609 release: at least 06-4e-03 hangs users' systems #31

Open vicamo opened 4 years ago

vicamo commented 4 years ago

Per reports in https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1882890, the update with sig 0x000406e3, pf_mask 0xc0, 2020-04-27, rev 0x00dc, size 104448 hangs the user's system in a similar way to #24, but on a different CPU. It is also reported that the previous revision of the same file (sig=0x406e3, pf=0x80, revision=0xd6) works just fine.
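The fields quoted above (signature, pf_mask, date, revision, size) come from the 48-byte header that precedes each update in an intel-ucode file. The sketch below walks a file such as 06-4e-03 and decodes those headers. It assumes the header layout documented in Intel's Software Developer's Manual (nine little-endian 32-bit fields followed by 12 reserved bytes) and does not verify checksums or extended signature tables. `platform_matches` approximates the rule the Linux loader applies: an update is only used when the CPU's platform bit (the `pf=0x80` in the kernel log) is set in the update's `pf_mask`, which is why a 0xc0 mask covers a pf=0x80 machine. All names here are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Iterator

# Assumed layout: nine little-endian u32 fields, then 12 reserved bytes (48 total).
HEADER = struct.Struct("<9I12x")


@dataclass
class UcodeHeader:
    signature: int
    pf_mask: int
    revision: int
    date: str
    total_size: int


def iter_headers(blob: bytes) -> Iterator[UcodeHeader]:
    """Yield the header of every update concatenated in an intel-ucode file."""
    offset = 0
    while offset + HEADER.size <= len(blob):
        (hdr_ver, rev, date, sig, _checksum, _loader_ver,
         pf_mask, data_size, total_size) = HEADER.unpack_from(blob, offset)
        if hdr_ver != 1:
            raise ValueError(f"unexpected header version {hdr_ver} at offset {offset}")
        if data_size == 0:
            total_size = 2048  # legacy encoding: 0 means 2000 data bytes + header
        if total_size < HEADER.size:
            raise ValueError(f"bad total size {total_size} at offset {offset}")
        # The date is stored as hex digits 0xMMDDYYYY, e.g. 0x04272020.
        ymd = f"{date & 0xFFFF:04x}-{date >> 24:02x}-{(date >> 16) & 0xFF:02x}"
        yield UcodeHeader(sig, pf_mask, rev, ymd, total_size)
        offset += total_size


def platform_matches(pf_mask: int, cpu_pf: int) -> bool:
    """cpu_pf is 1 << platform_id (the kernel's "pf=0x80"); the update
    applies if that bit is set in its pf_mask, or if both are zero."""
    return (pf_mask & cpu_pf) != 0 or (pf_mask == 0 and cpu_pf == 0)
```

On a real system one would pass the bytes of `/lib/firmware/intel-ucode/06-4e-03` (opened in `"rb"` mode) to `iter_headers` and print each header's fields.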

andyhhp commented 4 years ago

The differences between 0xd6 and 0xdc were quite minimal, to work around an SGX issue.

Does disabling SGX in the BIOS work around the hang using 0xdc?

eguaj commented 4 years ago

As requested by @vicamo, here is the output from lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Stepping:              3
CPU MHz:               700.042
CPU max MHz:           2800.0000
CPU min MHz:           400.0000
BogoMIPS:              4800.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

It's on a Lenovo T460 laptop.

eguaj commented 4 years ago

> The differences between 0xd6 and 0xdc were quite minimal, to work around an SGX issue.
>
> Does disabling SGX in the BIOS work around the hang using 0xdc?

Disabling SGX does not seem to help: SGX was initially set to "Software controlled", so I changed it to "Disabled", but the laptop still hangs when booting the new initrd image.

andyhhp commented 4 years ago

Thanks for testing. I think this needs Intel to investigate now.

pjssilva commented 4 years ago

Same thing on my system with a similar CPU. I had to uninstall the intel-microcode package to boot. Here is the output of lscpu:


Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           78
Model name:                      Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
Stepping:                        3
CPU MHz:                         500.006
CPU max MHz:                     3000.0000
CPU min MHz:                     400.0000
BogoMIPS:                        4999.90
Virtualization:                  VT-x
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        512 KiB
L3 cache:                        3 MiB
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Vulnerable: No microcode
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d

esyr-rh commented 4 years ago

On Wed, Jun 10, 2020 at 06:51:03AM -0700, Paulo J. S. Silva wrote:

> Same thing on my system with a similar CPU. I had to uninstall the intel-microcode package to boot.

Have you tried using revision 0xd6 (from [1], for example) or 0xda (from [2], for example) of the microcode?

[1] https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/raw/437f382b1be4412b9d03e2bbdcda46d83d581242/intel-ucode/06-4e-03
[2] https://github.com/platomav/CPUMicrocodes/raw/master/Intel/cpu406E3_platC0_ver000000DA_2020-01-09_PRD_1527A73F.bin

pjssilva commented 4 years ago

I used the Ubuntu package intel-microcode (3.20200609.0ubuntu0.20.04.0). I don't have it on my system anymore to check. From the description of the package in https://launchpad.net/ubuntu/+source/intel-microcode/3.20200609.0ubuntu0.20.04.0 I guess 0x00d6. Do you want me to reinstall it and check?

esyr-rh commented 4 years ago

On Wed, Jun 10, 2020 at 07:55:50AM -0700, Paulo J. S. Silva wrote:

> I used the Ubuntu package intel-microcode (3.20200609.0ubuntu0.20.04.0). I don't have it on my system anymore to check. From the description of the package in https://launchpad.net/ubuntu/+source/intel-microcode/3.20200609.0ubuntu0.20.04.0 I guess 0x00d6. Do you want me to reinstall it and check?

3.20200609.0ubuntu0.20.04.0 has revision 0xdc of the 06-4e-03 microcode file; 3.20191115.1ubuntu3 has 0xd6.

pjssilva commented 4 years ago

So it is 0xdc (I found the reference in the middle of the description). As I said, I can test whatever you want. Just point me to the file and to instructions on how to apply it without an Ubuntu package. Thanks!

hmh commented 4 years ago

Did you also update the Linux kernel? It would be nice to know if the regression is related to the microcode update itself, or to the combination of the two...

pjssilva commented 4 years ago

From my /var/log/apt/history.log, it looks like the linux-image package was also updated, to 5.4.0-37. This is the kernel I am using right now; I did not need to uninstall it. I only uninstalled intel-microcode and its dependencies (meta packages that install the latest kernel and friends). Both were unattended upgrades (because they were marked as security updates).

hmh commented 4 years ago

Would it trouble you too much to test the intel-microcode package with an older kernel? With an old kernel, the microcode enables the SRBDS mitigation in transparent mode; it is supposed to "just work", even if it might end up enabled in a configuration that does not need it...

This ensures the issue is not being caused by some bug in the new kernel code.

hmh commented 4 years ago

If you do test with an older kernel, please look in /proc/cpuinfo to ensure you are indeed running microcode revision 0xdc, i.e. the new one for your processor.
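To confirm which revision is actually running, /proc/cpuinfo can be read directly. A minimal sketch, assuming the usual x86 `key : value` layout with `cpu family`, `model`, `stepping` and `microcode` fields (the last may be absent in some VMs). It only looks at the first processor block, and `cpuinfo_microcode` is a hypothetical helper name:

```python
def cpuinfo_microcode(text: str) -> tuple[str, int]:
    """Return (intel-ucode file name, microcode revision) for the first CPU.

    Assumes the x86 /proc/cpuinfo layout: "key<tabs>: value" lines, one
    blank-line-separated block per logical CPU.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # end of the first processor block
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    family = int(fields["cpu family"])
    model = int(fields["model"])
    stepping = int(fields["stepping"])
    revision = int(fields["microcode"], 16)  # e.g. "0xdc" -> 220
    return f"{family:02x}-{model:02x}-{stepping:02x}", revision
```

Called as `cpuinfo_microcode(open("/proc/cpuinfo").read())`, it would be expected to report file 06-4e-03 with revision 0xdc on the affected laptops while the new package is active, and 0xd6 after a revert.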

pjssilva commented 4 years ago

Do you mean trying a mainline kernel? Which version? I can use the mainline PPA to do that easily. How do I install the Intel microcode without a deb package? Just point me to the docs; I can manage myself.

esyr-rh commented 4 years ago

On Wed, Jun 10, 2020 at 08:04:22AM -0700, Paulo J. S. Silva wrote:

> So it is 0xdc (I found the reference in the middle of the description). As I said, I can test whatever you want. Just point me to the file and to instructions on how to apply it without an Ubuntu package.

Please try [1] for the 0xd6 revision. For the 0xda revision, you can replace "/lib/firmware/intel-ucode/06-4e-03" with [2] (don't forget to back up the original version) and regenerate the initramfs with the "update-initramfs -u" command.
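The manual swap described here (back up, replace, then regenerate the initramfs) can be sketched in Python. This is an illustration only: `swap_microcode` and the backup directory are hypothetical, the real paths need root, and the sketch does not regenerate the initramfs itself. The backup is kept outside /lib/firmware/intel-ucode on the assumption that initramfs hooks may pick up every file in that directory.

```python
import shutil
from pathlib import Path


def swap_microcode(target: Path, replacement: Path, backup_dir: Path) -> Path:
    """Overwrite one intel-ucode file, keeping a copy of the original.

    The backup goes to backup_dir (deliberately outside the firmware
    directory) and is only taken once, so repeated swaps never overwrite
    the packaged original. Returns the backup path.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / target.name
    if not backup.exists():
        shutil.copy2(target, backup)  # preserve the packaged revision
    shutil.copyfile(replacement, target)
    return backup
```

For example, as root: `swap_microcode(Path("/lib/firmware/intel-ucode/06-4e-03"), Path("cpu406E3_platC0_ver000000DA_2020-01-09_PRD_1527A73F.bin"), Path("/root/ucode-backup"))`, followed by `update-initramfs -u`. Restoring is the same call with the backup file as the replacement.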

Note that one can always avoid early microcode loading during boot by supplying the "dis_ucode_ldr" kernel command-line option.
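To check whether the current boot was started with that option, the kernel command line (exposed in /proc/cmdline) can be inspected. A simplified sketch; `parse_cmdline` and `microcode_loader_disabled` are hypothetical helpers, and splitting on whitespace ignores quoted values that contain spaces:

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Map kernel parameters to their values; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first "=" only
        params[key] = value if sep else None
    return params


def microcode_loader_disabled(cmdline: str) -> bool:
    """True if dis_ucode_ldr is present on the command line."""
    return "dis_ucode_ldr" in parse_cmdline(cmdline)
```

Called as `microcode_loader_disabled(open("/proc/cmdline").read())`, this tells a tester whether a given boot ran with the loader disabled, which matters when comparing hangs across boots.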

Thank you.

[1] https://launchpad.net/ubuntu/focal/amd64/intel-microcode/3.20191115.1ubuntu3
[2] https://github.com/platomav/CPUMicrocodes/raw/master/Intel/cpu406E3_platC0_ver000000DA_2020-01-09_PRD_1527A73F.bin

hmh commented 4 years ago

@pjssilva: no, please use any older Ubuntu kernel that works well on your machine for the purposes of this testing.

hmh commented 4 years ago

You can also use Debian's intel-microcode packages, they should be compatible enough with Ubuntu.

here: http://deb.debian.org/debian/pool/non-free/i/intel-microcode/

You can use the ones without "~deb" in their name.

stevebeattie commented 4 years ago

@pjssilva FYI, I'm preparing packages that revert 06-4e-03 to revision 0xd6; they should be available for testing from https://launchpad.net/~sbeattie/+archive/ubuntu/lp1882890/ shortly. I would appreciate confirmation that they resolve the issue for you before publishing to the Ubuntu archive.

Also, the previous Ubuntu kernel in 20.04 is 5.4.0-33; it should be available to install via:

sudo apt install linux-image-5.4.0-33-generic linux-modules-5.4.0-33-generic linux-modules-extra-5.4.0-33-generic linux-headers-5.4.0-33-generic

stevebeattie commented 4 years ago

Also, as a data point, we did successfully test the combination of the 0xdc microcode and an SRBDS-aware kernel on a system with an i5-6200U before publishing the 20200609 update.

pjssilva commented 4 years ago

I am in the middle of my testing spree. What I learned so far:

0xd6 seems to work always.

As for 0xda:

With 5.4.0-37-generic: it fails a lot, but not always. I managed to boot into GNOME after first booting in rescue mode (to see if the right). Sometimes it boots OK, but most of the time it does not. It might be easier to boot after a full shutdown (instead of a reboot).

With 5.4.0-33-generic: it seems to fail less, but it still fails. It usually gets stuck after "loading initial ramdisk" (or something like that). I have the feeling that it does not fail if I do a full shutdown first. I will try that now (three full shutdowns in a row).

pjssilva commented 4 years ago

0xda did not pass the three-cold-boot test with 5.4.0-33-generic. So I can say that boot hangs with both 5.4.0-33 and 5.4.0-37, on both warm and cold boots. I don't see a pattern.

@stevebeattie I will try the packages in your ppa now. I will let you know what I find out.

Malizor commented 4 years ago

Hi,

Same issue here, and I confirm the fix/rollback proposed by @stevebeattie does solve the problem.

My hardware: ThinkPad T460s with an i5-6200U, microcode: sig=0x406e3, pf=0x80, revision=0xd6, running Ubuntu 20.04. (When the original update was installed, boot hung regardless of which previous kernel I tried.)

esyr-rh commented 4 years ago

On Wed, Jun 10, 2020 at 10:23:38AM -0700, Paulo J. S. Silva wrote:

> I am in the middle of my testing spree. What I learned so far:
>
> 0xd6 seems to work always.
>
> As for 0xda:
>
> With 5.4.0-37-generic: it fails a lot, but not always. I managed to boot into GNOME after first booting in rescue mode (to see if the right). Sometimes it boots OK, but most of the time it does not. It might be easier to boot after a full shutdown (instead of a reboot).
>
> With 5.4.0-33-generic: it seems to fail less, but it still fails. It usually gets stuck after "loading initial ramdisk" (or something like that). I have the feeling that it does not fail if I do a full shutdown first. I will try that now (three full shutdowns in a row).

Thank you very much for the testing!

pjssilva commented 4 years ago

@stevebeattie, like @Malizor, I confirm that your package solves the problem.

hmh commented 4 years ago

Well, looks like we (distros) will be forced to (re-)publish the security fix without 06-4e-03 :-(

hmh commented 4 years ago

I just noticed that 0x506e3 might have the same problem, but we have had no reports (either positive or negative) so far.

rokezu commented 4 years ago

Just wanted to report that on an Asus UX305-CA with an Intel Core m3-6Y30 (sig=0x406e3), running Arch Linux, the system refuses to boot right after updating the intel-ucode package. I had to edit the GRUB entry at boot time to remove the intel-ucode image from the initrd line, and then it boots again. This was a full system upgrade, running the 5.6.15-arch1-1 kernel. Could this be a QA issue at Intel?

mirekingr commented 4 years ago

Same issue here, boot freezes on Ubuntu 20.04/focal

Dell Latitude 5591
Intel Core i7-8850H, sig=0x000906ea
intel-microcode_3.20200609.0ubuntu0.20.04.2

esyr-rh commented 4 years ago

On Thu, Jun 11, 2020 at 09:49:51AM -0700, Mirek Ingr wrote:

> Intel Core i7-8850H, sig=0x000906ea

This is a different CPUID; it should probably be reported separately.
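For reference, the intel-ucode file name is the CPUID signature decoded into family-model-stepping (in hex), which is why 0x000906ea (06-9e-0a) is a separate file from 0x000406e3 (06-4e-03). A sketch of the standard CPUID leaf 1 decoding, where the extended model is folded in for base families 6 and 0xF and the extended family only for 0xF; the function names are hypothetical:

```python
def decode_signature(sig: int) -> tuple[int, int, int]:
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF
    # Extended fields only apply to particular base families.
    family = base_family + (ext_family if base_family == 0xF else 0)
    if base_family in (0x6, 0xF):
        model += ext_model << 4
    return family, model, stepping


def ucode_filename(sig: int) -> str:
    """Name of the matching file under intel-ucode/, e.g. "06-4e-03"."""
    family, model, stepping = decode_signature(sig)
    return f"{family:02x}-{model:02x}-{stepping:02x}"
```

The decoded model 0x4e is 78 with stepping 3, matching the `Model: 78` / `Stepping: 3` lines in the lscpu outputs in this thread; 0x506e3 decodes to 06-5e-03 (model 94), the Xeon E3-1270 v5 family reported below.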

hmh commented 4 years ago

@mirekingr: that's a different issue, and it is not new to 20200609 either if I got things right from Ubuntu's bug tracking system.

esyr-rh commented 4 years ago

On Thu, Jun 11, 2020 at 10:52:31AM -0700, Henrique de Moraes Holschuh wrote:

> it is not new to 20200609 either if I got things right from Ubuntu's bug tracking system.

Are you referring to [1], or something else?

[1] https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1862751

stevebeattie commented 4 years ago

Mirek's bug report is https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1882943

stevebeattie commented 4 years ago

Just to keep this issue up to date: in Ubuntu, we reverted this specific microcode from the updates we pushed to our users.

hmh commented 4 years ago

So did Debian: we have not included the 0x406e3 microcode update in our 20200609 intel-microcode security update.

paulmenzel commented 4 years ago

Does booting Linux with maxcpus=1 get you a non-crashing system?

Since the microcode update, other crashes have been reported too:

  1. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883065
  2. https://lkml.org/lkml/2020/6/11/474

sthen commented 4 years ago

Data point: I'm running OpenBSD on 06-4e-03 with the new microcode and am not seeing crashes/hangs. I am using multiple cores but am not using SMT.

fdutheil commented 4 years ago

Hi, adding a reference to the Gentoo ticket tracking the same issue with rev 0xda: https://bugs.gentoo.org/722768

msmeissn commented 4 years ago

There is also one report at SUSE: https://bugzilla.suse.com/show_bug.cgi?id=1172856 (Dell Inc. Latitude E7470 laptop)

Foxboron commented 4 years ago

Arch Linux downstream bug report: https://bugs.archlinux.org/task/66978

whpenner commented 4 years ago

We are debugging the issue and will provide an update when we have more information.

kenchin80 commented 4 years ago

Arch Linux downstream bug report: https://bugs.archlinux.org/task/66988 (Dell Latitude E7470 laptop)

whpenner commented 4 years ago

Intel identified an issue when OS loading microcode update revision 0xDC for cpuid 406E3 and 506E3. The microcode update has been reverted to revision 0xD6. This issue does not affect the microcode update when loaded from BIOS.

superm1 commented 4 years ago

@whpenner if it doesn't happen when loaded from BIOS, does that mean it is actually a timing problem with the early microcode loader implementation in the kernel?

whpenner commented 4 years ago

@superm1 The issue I mention above is unrelated to the microcode loader implementation (early or late).

superm1 commented 4 years ago

OK, thanks for the confirmation.

jirireischig commented 4 years ago

Same problem on my system with Xeon CPU. Here is the output of lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz
Stepping:              3
CPU MHz:               3790.283
CPU max MHz:           4000.0000
CPU min MHz:           800.0000
BogoMIPS:              7200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear spec_ctrl intel_stibp flush_l1d

paulmenzel commented 3 years ago

What is the status? The Ubuntu report says Fix Released.

superm1 commented 3 years ago

> What is the status? The Ubuntu report says Fix Released.

Ubuntu reverted it, if you look at the end of that bug report you linked.

paulmenzel commented 3 years ago

> > What is the status? The Ubuntu report says Fix Released.
>
> Ubuntu reverted it, if you look at the end of that bug report you linked.

Yes, but so did upstream in release 20200616, didn’t they? So, should this issue be closed too?

hmh commented 3 years ago

Please don't.

IMHO, we need some way to track the faulty/regression-inducing microcode updates that need further revisions from Intel before operating systems can use them. That is what makes sense to track as GitHub issues here.

IMO, downgrading the microcode revision in the "update package" deserves a note here that the problem has been worked around (by reverting to an older microcode, and thus reintroducing all the supposedly fixed issues, including security issues), not closing the bug.