intel / Intel-Linux-Processor-Microcode-Data-Files


Are any of the fixes for RPL-E/HX/S in release 2024-10-29 relevant for O.S. loading? #87

Closed hmh closed 1 week ago

hmh commented 3 weeks ago

Intel has been extremely clear that erratum RPL061 (the internal voltage request issue) can only be fixed through a firmware update since it must be loaded from FIT to be effective.

So, this question is about the other functional fixes present in 0x12b (presumably for errata RPL059 and RPL060?). Are they relevant for operating-system loading of microcode updates, or do they also require that the MCU be loaded from firmware?

teoberi commented 3 weeks ago

Dell has updated the microcode in the latest BIOS version. I'm starting to lose faith in microcode updates loaded at the operating system level. I would like Intel to explain this clearly so that we know how to proceed.

whpenner commented 2 weeks ago

@hmh @teoberi, Hi

Let me see if I can clear this up a bit with a bit of background.

Microcode Updates (MCUs) are a cumulative set of changes to settings and processor microcode, and may also contain other firmware (FW) updates used by the processor. What the MCU addresses varies between processors and generally depends on the processor generation and the OEM/ODM or market needs for that processor.

The other source for firmware updates is the Integrated Firmware Image (IFWI) stored on the main board. Over time, the location for a specific FW element may move from the MCU to the IFWI or vice versa. Since the source of the various FW elements can vary between products, the ability and characteristics of the update process will vary.

Another factor has to do with the specific hardware (HW) design of the feature being updated. Some updates may be a simple change in a HW setting, a change in the behavior of microcode or FW, or, as in this case, HW may need to reset using updated settings to determine and configure runtime values. This is one reason for loading the MCU at reset (called FIT loading). More detail can be found here: https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/secure-coding/loading-microcode-os.html

The value of loading the MCU from the OS depends on which MCU was loaded from the IFWI at FIT. Loading a newer MCU from the OS is intended to work around issues as best it can, but for the reasons stated above, it may not be possible to work around all known issues. So, this comes down to a tradeoff for the user between a solution that may not be complete but is easy to implement, and an update that may require specific knowledge or effort to perform but is a more complete workaround. Furthermore, not all OEMs regularly update their IFWI, so the OS-loaded MCU may be the only option for those systems.

For the Intel® Core™ 13th and 14th Gen desktop processor Vmin Shift Instability issue, the changes required were complex, requiring changes in both the MCU and IFWI and requiring HW to reset using updated settings, thus not something that was possible with an OS loaded MCU update. Intel has been clear that an IFWI update was required for this issue. While the OS loaded MCU was not able to address this specific issue, it has been valuable for the many other issues resolved using that method.
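As a reading aid, the distinction above can be sketched as a tiny model. This is only an illustration under stated assumptions: `MicrocodeState`, `running_revision`, and `reset_dependent_fix_active` are hypothetical names, the revision `0x123` is an arbitrary example of an older firmware-loaded MCU, and the model ignores the IFWI changes that this particular issue also required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicrocodeState:
    """Hypothetical model, not an Intel or kernel data structure."""
    fit_revision: int  # revision the firmware (IFWI) loaded at reset via FIT
    os_revision: int   # revision shipped in the OS microcode package


def running_revision(state: MicrocodeState) -> int:
    # An OS loader only applies an MCU that is newer than the running one,
    # so the revision in effect is the larger of the two.
    return max(state.fit_revision, state.os_revision)


def reset_dependent_fix_active(state: MicrocodeState, fix_revision: int) -> bool:
    # Assumption: a fix that needs HW to reset with updated settings only
    # takes effect if the firmware itself loaded that revision or a newer one.
    return state.fit_revision >= fix_revision


# Older firmware, newer OS package (illustrative revisions).
old_bios = MicrocodeState(fit_revision=0x123, os_revision=0x12B)
print(hex(running_revision(old_bios)))
print(reset_dependent_fix_active(old_bios, fix_revision=0x12B))
```

With these example values the running revision is `0x12b`, yet the reset-dependent fix is reported as inactive, which mirrors the point that a current revision number alone does not imply the voltage fix is in effect.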

teoberi commented 2 weeks ago

Much clearer. If Intel specifies the recommended load method for each microcode update, everything is OK (it could also be an additional column in the Release Notes tables).

teoberi commented 2 weeks ago

@whpenner -> Who is right here, and who doesn't yet know that they are wrong?

https://www.phoronix.com/news/Intel-RPL-Microcode-Voltage

https://www.tomshardware.com/pc-components/cpus/intel-rolls-out-linux-kernel-microcode-fix-for-affected-13th-14th-generation-processors

This strengthens my conviction that what I wrote in the previous post (https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/87#issuecomment-2456405977) would help a lot if it were put into practice!

whpenner commented 2 weeks ago

@teoberi, @hmh: Oh, I completely missed hmh's question. RPL059 and RPL060 both indicate a Workaround of 'None identified', so those issues do not have a workaround in the MCU. I will check to see if there are any other workarounds in the MCU, but I wasn't aware of anything else in 0x12b over 0x129.

teoberi commented 2 weeks ago

@whpenner is the information presented in Michael Larabel's article (phoronix.com) correct?

> Intel off their typical second Tuesday of the month patch regiment today posted new CPU microcode just for 13th Gen "Raptor Lake" and 14th Gen "Raptor Lake Refresh" processors for Linux systems. Notable with the updated Raptor Lake CPU microcode is the internal voltage handling fix for that well known problem plaguing many Raptor Lake owners plus two other fixes. Intel this evening published their Intel CPU Microcode 20241029 release for Linux users which just contains updated Raptor Lake 13th/14th Gen CPU microcode.
>
> As part of motherboard BIOS updates the Intel CPU microcode update has already been rolling out that way but for Linux users who don't routinely update their BIOS but allow for Linux package updates with Intel CPU microcode included, you can now have the correction this way -- assuming your CPU isn't already negatively impacted by this bug, in which case an RMA is likely necessary.
>
> The updated Intel Raptor Lake CPU microcode for Linux users to apply via the boot process or late loading can be obtained via GitHub.

How does this fit with what you explained in your post here?

hmh commented 1 week ago

> @teoberi, @hmh: Oh, I completely missed hmh's question. RPL059 and RPL060 both indicate a Workaround of 'None identified', so those issues do not have a workaround in the MCU. I will check to see if there are any other workarounds in the MCU, but I wasn't aware of anything else in 0x12b over 0x129.

Thanks for the information!

hmh commented 1 week ago

> Let me see if I can clear this up a bit with a bit of background.
>
> [...]

Thank you for the detailed explanation!

BTW: I suggest including some of it in the microcode README, for far better discoverability than a GitHub issue (which I am about to close...).

hmh commented 1 week ago

> @whpenner is the information presented in Michael Larabel's article (phoronix.com) correct?

It is poorly worded. You need a firmware update to get any of the "fix the voltage issue" benefits from either 0x129 or 0x12b, and that's it. And if you're going to do a firmware update, you want 0x12b or newer (didn't look at 20241112 yet to check if there's a newer one).
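For anyone who wants to see which revision is currently running on Linux, a minimal sketch follows. It assumes an x86 system where `/proc/cpuinfo` exposes a `microcode` field; the helper names `parse_microcode_revisions` and `all_at_least` are made up for illustration. The comparison against `0x12b` is only meaningful on the affected Raptor Lake parts, and the reported revision does not say whether the MCU came from firmware (FIT) or from the OS, so a passing check does not by itself confirm the voltage fix is effective.

```python
import re
from pathlib import Path

MIN_REVISION = 0x12B  # revision discussed in this thread


def parse_microcode_revisions(cpuinfo_text: str) -> set[int]:
    """Return the distinct microcode revisions found in /proc/cpuinfo text."""
    pattern = re.compile(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)\s*$", re.MULTILINE)
    return {int(m.group(1), 16) for m in pattern.finditer(cpuinfo_text)}


def all_at_least(revisions: set[int], minimum: int = MIN_REVISION) -> bool:
    # An empty set (field missing, e.g. on non-x86) is treated as unknown.
    return bool(revisions) and min(revisions) >= minimum


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        revs = parse_microcode_revisions(path.read_text())
        print("revisions:", sorted(hex(r) for r in revs))
        print("at least 0x12b:", all_at_least(revs))
```

To tell whether that revision was loaded by firmware or by the OS, compare it with the MCU version listed in the BIOS release notes, or check the kernel log for a message about an early microcode update.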