intel / ipmctl


Create region configuration goal failed: Error 18 #180

Closed starcorn2020 closed 2 years ago

starcorn2020 commented 2 years ago

Hi guys,

I am having an issue controlling PMem with ipmctl.

My environment:
  SUT: whitley princeton platform
  OS: CentOS8
  CPU: Intel 8360Y
  Memory: 2 DIMMs, 2 PMem modules

The error message is as follows.

  Create region configuration goal failed: Error 18 - No usable PMem modules due to all PMem modules being one of the 
  following:
  - PMem modules unmanageable
  - PMem modules non-functional
  - PMem modules have a population issue

Commands I tried:

ipmctl show -a -region

---ISetID=0x0000000000000000---
   SocketID=0x0000
   PersistentMemoryType=AppDirect
   Capacity=506.000 GiB
   FreeCapacity=506.000 GiB
   HealthState=Error
   DimmID=0x0011, 0x1211

ipmctl show -memoryresources

MemoryType   | DDR         | PMemModule  | Total
 Volatile     | 128.000 GiB | 0.000 GiB   | 128.000 GiB
 AppDirect    | -           | 0.000 GiB   | 0.000 GiB
 Cache        | 0.000 GiB   | -           | 0.000 GiB
 Inaccessible | 0.000 GiB   | 507.469 GiB | 507.469 GiB
 Physical     | 128.000 GiB | 507.469 GiB | 635.469 GiB
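
For readers triaging similar output, the table above can be checked mechanically. The following is a minimal sketch, assuming the pipe-separated layout shown in this thread; `parse_memory_resources` is a hypothetical helper, not part of ipmctl, and other ipmctl versions may format the table differently.

```python
def parse_memory_resources(table_text):
    """Parse the pipe-separated table into {row: {column: GiB or None}}."""
    # Keep only lines that contain column separators (skips blank/ruler lines).
    lines = [line for line in table_text.splitlines() if "|" in line]
    header = [cell.strip() for cell in lines[0].split("|")]
    table = {}
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        values = {}
        for name, cell in zip(header[1:], cells[1:]):
            # "-" marks a combination that does not apply.
            values[name] = None if cell == "-" else float(cell.split()[0])
        table[cells[0]] = values
    return table


# Sample copied from the output quoted in this thread.
sample = """\
MemoryType   | DDR         | PMemModule  | Total
 Volatile     | 128.000 GiB | 0.000 GiB   | 128.000 GiB
 AppDirect    | -           | 0.000 GiB   | 0.000 GiB
 Cache        | 0.000 GiB   | -           | 0.000 GiB
 Inaccessible | 0.000 GiB   | 507.469 GiB | 507.469 GiB
 Physical     | 128.000 GiB | 507.469 GiB | 635.469 GiB
"""

resources = parse_memory_resources(sample)
# True when every GiB of PMem capacity is reported as inaccessible.
all_pmem_inaccessible = (
    resources["Inaccessible"]["PMemModule"] == resources["Physical"]["PMemModule"]
)
print(all_pmem_inaccessible)
```

Here the whole 507.469 GiB of PMem capacity is inaccessible, which matches the failure to create a goal.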

dmesg | grep -i "error"

[    2.846719] ERST: Error Record Serialization Table (ERST) support is initialized.
[    2.894295] tpm tpm0: tpm_try_transmit: send(): error -5
[    5.282783] nfit ACPI0012:00: Error found in NVDIMM nmem0 flags: map_fail
[    5.373521] nfit ACPI0012:00: Error found in NVDIMM nmem1 flags: map_fail

I think it's a FW issue, but I have no idea how to solve it.
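
As a side note on the dmesg output above, the per-module flags can be pulled out with a short script. This is a rough sketch only: the regular expression is modelled on the two nfit lines quoted in this thread, `nvdimm_error_flags` is a hypothetical helper name, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the nfit lines quoted in this thread (an assumption,
# not a documented kernel log format).
NFIT_ERROR = re.compile(
    r"nfit \S+: Error found in NVDIMM (?P<device>\S+) flags: (?P<flags>.+)$"
)


def nvdimm_error_flags(dmesg_text):
    """Return {device: [flag, ...]} for every nfit NVDIMM error line."""
    found = {}
    for line in dmesg_text.splitlines():
        match = NFIT_ERROR.search(line)
        if match:
            flags = match.group("flags").split()
            found.setdefault(match.group("device"), []).extend(flags)
    return found


# Sample copied from the dmesg output quoted in this thread.
sample_dmesg = """\
[    2.894295] tpm tpm0: tpm_try_transmit: send(): error -5
[    5.282783] nfit ACPI0012:00: Error found in NVDIMM nmem0 flags: map_fail
[    5.373521] nfit ACPI0012:00: Error found in NVDIMM nmem1 flags: map_fail
"""

print(nvdimm_error_flags(sample_dmesg))
```

Both modules report `map_fail`, and the unrelated tpm line is ignored.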

StevenPontsler commented 2 years ago

What model of PMems are installed in the system? What was the goal command you entered?

Please post the results of the following commands:

ipmctl show -a -dimm
ipmctl show -topology

What is the history? It looks like you are trying to create a goal, but all the PMem storage already appears to be part of a region, and that region is in error. Have the DIMMs been moved since that region was created?

starcorn2020 commented 2 years ago

  1. The PMem modules are Intel® Optane™ Persistent Memory 200 Series.

  2. After installing the PMem modules, I tried to create both App Direct and Memory Mode goals from the command line. Both commands fail with the same error:

ipmctl create -goal MemoryMode=100
ipmctl create -goal PersistentMemoryType=AppDirect

  3. The output of the commands you asked for follows. The PMem modules were borrowed from a warehouse, so I don't know their previous settings.

ipmctl show -a -dimm

---DimmID=0x0001---
   Capacity=253.734 GiB
   LockState=Disabled
   SVNDowngrade=Disabled
   SecureErasePolicy=No Master Passphrase
   S3ResumeOptIn=UnsecureS3
   FwActivateOptIn=Disabled
   HealthState=Healthy
   HealthStateReason=None
   FWVersion=02.01.00.1426
   FWAPIVersion=02.03
   FWActiveAPIVersion=02.03
   InterfaceFormatCode=0x0301 (Non-Energy Backed Byte Addressable)
   ManageabilityState=Manageable
   PopulationViolation=Yes
   PhysicalID=0x0026
   DimmHandle=0x0001
   DimmUID=8089-a2-2015-00002c89
   SocketID=0x0000
   MemControllerID=0x0000
   ChannelID=0x0000
   ChannelPos=1
   MemoryType=Logical Non-Volatile Device
   Manufacturer=Intel
   VendorID=0x8089
   DeviceID=0x0040
   RevisionID=0x0000
   SubsystemVendorID=0x8089
   SubsystemDeviceID=0x097b
   SubsystemRevisionID=0x0001
   DeviceLocator=CPU0_ChA_DIMM1
   ManufacturingInfoValid=1
   ManufacturingLocation=0xa2
   ManufacturingDate=20-15
   SerialNumber=0x00002c89
   PartNumber=NMB1XBD256GQS
   BankLabel=NODE 0
   DataWidth=64 b
   TotalWidth=72 b
   Speed=3200 MT/s
   FormFactor=DIMM
   ManufacturerID=0x8089
   ControllerRevisionID=A1, 0x0001
   MemoryCapacity=0.000 GiB
   AppDirectCapacity=0.000 GiB
   UnconfiguredCapacity=0.000 GiB
   InaccessibleCapacity=253.734 GiB
   ReservedCapacity=0.000 GiB
   PackageSparingCapable=1
   PackageSparingEnabled=1
   PackageSparesAvailable=1
   IsNew=0
   AveragePowerReportingTimeConstant=1000 ms
   ViralPolicy=0
   ViralState=0
   AvgPowerLimit=15000 mW
   MemoryBandwidthBoostFeature=0x1
   MemoryBandwidthBoostMaxPowerLimit=18000 mW
   MemoryBandwidthBoostAveragePowerTimeConstant=15000 ms
   MaxAveragePowerLimit=15000 mW
   MaxMemoryBandwidthBoostMaxPowerLimit=18000 mW
   MaxMemoryBandwidthBoostAveragePowerTimeConstant=120000 ms
   MemoryBandwidthBoostAveragePowerTimeConstantStep=1000 ms
   MaxAveragePowerReportingTimeConstant=12000 ms
   AveragePowerReportingTimeConstantStep=100 ms
   AveragePower=3861 mW
   Average12vPower=2616 mW
   Average1_2vPower=1245 mW
   LatchedLastShutdownStatus=PM ADR Command Received, DDRT Power Fail Command Received, PMIC 12V/DDRT 1.2V Power Loss (PLI), Controller's FW State Flush Complete, Write Data Flush Complete, Extended Flush Not Complete
   UnlatchedLastShutdownStatus=PMIC 12V/DDRT 1.2V Power Loss (PLI), Controller's FW State Flush Complete, Write Data Flush Complete, Extended Flush Not Complete
   ThermalThrottleLossPercent=0
   LastShutdownTime=Thu Nov 04 19:14:04 UTC 2021
   ModesSupported=Memory Mode, App Direct
   SecurityCapabilities=Encryption, Erase
   MasterPassphraseEnabled=0
   ConfigurationStatus=Failed - Unsupported
   SKUViolation=0
   ARSStatus=Completed
   OverwriteStatus=Unknown
   AitDramEnabled=1
   BootStatus=Success
   BootStatusRegister=0x00000004_981d00f0
   LatchSystemShutdownState=1
   PreviousPowerCycleLatchSystemShutdownState=0
   ExtendedAdrEnabled=0
   PpcExtendedAdrEnabled=0
   ErrorInjectionEnabled=0
   MediaTemperatureInjectionEnabled=0
   SoftwareTriggersEnabled=0
   SoftwareTriggersEnabledDetails=None
   PoisonErrorInjectionsCounter=0
   PoisonErrorClearCounter=0
   MediaTemperatureInjectionsCounter=0
   SoftwareTriggersCounter=0
   MaxControllerTemperature=60 C
   MaxMediaTemperature=63 C
   MixedSKU=0
---DimmID=0x1001---
   Capacity=253.734 GiB
   LockState=Disabled
   SVNDowngrade=Disabled
   SecureErasePolicy=No Master Passphrase
   S3ResumeOptIn=UnsecureS3
   FwActivateOptIn=Disabled
   HealthState=Healthy
   HealthStateReason=None
   FWVersion=02.01.00.1426
   FWAPIVersion=02.03
   FWActiveAPIVersion=02.03
   InterfaceFormatCode=0x0301 (Non-Energy Backed Byte Addressable)
   ManageabilityState=Manageable
   PopulationViolation=Yes
   PhysicalID=0x0036
   DimmHandle=0x1001
   DimmUID=8089-a2-2015-00002c58
   SocketID=0x0001
   MemControllerID=0x0000
   ChannelID=0x0000
   ChannelPos=1
   MemoryType=Logical Non-Volatile Device
   Manufacturer=Intel
   VendorID=0x8089
   DeviceID=0x0040
   RevisionID=0x0000
   SubsystemVendorID=0x8089
   SubsystemDeviceID=0x097b
   SubsystemRevisionID=0x0001
   DeviceLocator=CPU1_ChA_DIMM1
   ManufacturingInfoValid=1
   ManufacturingLocation=0xa2
   ManufacturingDate=20-15
   SerialNumber=0x00002c58
   PartNumber=NMB1XBD256GQS
   BankLabel=NODE 4
   DataWidth=64 b
   TotalWidth=72 b
   Speed=3200 MT/s
   FormFactor=DIMM
   ManufacturerID=0x8089
   ControllerRevisionID=A1, 0x0001
   MemoryCapacity=0.000 GiB
   AppDirectCapacity=0.000 GiB
   UnconfiguredCapacity=0.000 GiB
   InaccessibleCapacity=253.734 GiB
   ReservedCapacity=0.000 GiB
   PackageSparingCapable=1
   PackageSparingEnabled=1
   PackageSparesAvailable=1
   IsNew=0
   AveragePowerReportingTimeConstant=1000 ms
   ViralPolicy=0
   ViralState=0
   AvgPowerLimit=15000 mW
   MemoryBandwidthBoostFeature=0x1
   MemoryBandwidthBoostMaxPowerLimit=18000 mW
   MemoryBandwidthBoostAveragePowerTimeConstant=15000 ms
   MaxAveragePowerLimit=15000 mW
   MaxMemoryBandwidthBoostMaxPowerLimit=18000 mW
   MaxMemoryBandwidthBoostAveragePowerTimeConstant=120000 ms
   MemoryBandwidthBoostAveragePowerTimeConstantStep=1000 ms
   MaxAveragePowerReportingTimeConstant=12000 ms
   AveragePowerReportingTimeConstantStep=100 ms
   AveragePower=3663 mW
   Average12vPower=2418 mW
   Average1_2vPower=1245 mW
   LatchedLastShutdownStatus=PM ADR Command Received, DDRT Power Fail Command Received, PMIC 12V/DDRT 1.2V Power Loss (PLI), Controller's FW State Flush Complete, Write Data Flush Complete, Extended Flush Not Complete
   UnlatchedLastShutdownStatus=PMIC 12V/DDRT 1.2V Power Loss (PLI), Controller's FW State Flush Complete, Write Data Flush Complete, Extended Flush Not Complete
   ThermalThrottleLossPercent=0
   LastShutdownTime=Thu Nov 04 19:14:04 UTC 2021
   ModesSupported=Memory Mode, App Direct
   SecurityCapabilities=Encryption, Erase
   MasterPassphraseEnabled=0
   ConfigurationStatus=Failed - Unsupported
   SKUViolation=0
   ARSStatus=Completed
   OverwriteStatus=Unknown
   AitDramEnabled=1
   BootStatus=Success
   BootStatusRegister=0x00000004_981d00f0
   LatchSystemShutdownState=1
   PreviousPowerCycleLatchSystemShutdownState=0
   ExtendedAdrEnabled=0
   PpcExtendedAdrEnabled=0
   ErrorInjectionEnabled=0
   MediaTemperatureInjectionEnabled=0
   SoftwareTriggersEnabled=0
   SoftwareTriggersEnabledDetails=None
   PoisonErrorInjectionsCounter=0
   PoisonErrorClearCounter=0
   MediaTemperatureInjectionsCounter=0
   SoftwareTriggersCounter=0
   MaxControllerTemperature=57 C
   MaxMediaTemperature=62 C
   MixedSKU=0
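
The full attribute dump is long, so a short filter helps to spot the fields that matter here. The following is a minimal sketch, assuming the `---DimmID=...---` / `Key=Value` layout shown above. `parse_dimm_attributes` and `blocking_attributes` are hypothetical helpers, and the choice of which attributes to flag is an assumption based on the wording of the Error 18 message (unmanageable, non-functional, population issue), not on ipmctl's own logic.

```python
def parse_dimm_attributes(text):
    """Parse `ipmctl show -a -dimm` style output into {dimm_id: {key: value}}."""
    dimms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("---DimmID=") and line.endswith("---"):
            # "---DimmID=0x0001---" -> "0x0001"
            current = line.strip("-").split("=", 1)[1]
            dimms[current] = {}
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            dimms[current][key] = value
    return dimms


def blocking_attributes(dimms):
    """Return {dimm_id: [attribute, ...]} for values that look problematic."""
    report = {}
    for dimm_id, attrs in dimms.items():
        reasons = []
        if attrs.get("ManageabilityState") != "Manageable":
            reasons.append("ManageabilityState=%s" % attrs.get("ManageabilityState"))
        if attrs.get("PopulationViolation") == "Yes":
            reasons.append("PopulationViolation=Yes")
        if attrs.get("ConfigurationStatus", "").startswith("Failed"):
            reasons.append("ConfigurationStatus=%s" % attrs["ConfigurationStatus"])
        if reasons:
            report[dimm_id] = reasons
    return report


# Abbreviated sample taken from the output quoted in this thread.
sample_dimms = """\
---DimmID=0x0001---
   HealthState=Healthy
   ManageabilityState=Manageable
   PopulationViolation=Yes
   ConfigurationStatus=Failed - Unsupported
---DimmID=0x1001---
   HealthState=Healthy
   ManageabilityState=Manageable
   PopulationViolation=Yes
   ConfigurationStatus=Failed - Unsupported
"""

for dimm_id, reasons in blocking_attributes(parse_dimm_attributes(sample_dimms)).items():
    print(dimm_id, reasons)
```

For both modules this flags `PopulationViolation=Yes` and `ConfigurationStatus=Failed - Unsupported`, while `HealthState` and `ManageabilityState` look fine.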

ipmctl show -topology

 DimmID | MemoryType                  | Capacity    | PhysicalID | DeviceLocator
 0x0001 | Logical Non-Volatile Device | 253.688 GiB | 0x0026    | CPU0_ChA_DIMM1
 0x1001 | Logical Non-Volatile Device | 253.688 GiB | 0x0036    | CPU1_ChA_DIMM1
 N/A    | DDR4                        | 64.000 GiB  | 0x0025    | CPU0_ChA_DIMM0
 N/A    | DDR4                        | 64.000 GiB  | 0x0035    | CPU1_ChA_DIMM0

nolanhergert commented 2 years ago

There is a chance that the BIOS is incorrectly marking the PMem modules as being in population violation. Please try this:

ipmctl delete -f -pcd
ipmctl create -f -goal (as normal)

The first command clears the PCD (Platform Configuration Data) stored on each PMem module. The PCD stores goal configuration information and also indicates whether that module is in population violation. By clearing it, we give the goal configuration process a "fresh start".

The -f shouldn't be necessary on the goal creation command, but it's useful for brevity.

If there is no region created after rebooting, then please provide the output of ipmctl show -pcd. Thanks!
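
The two steps above could be scripted roughly as follows. This is only a sketch: the ipmctl commands are the ones quoted in this thread, while `recovery_commands`, `run`, and the default goal string are illustrative choices. It defaults to a dry run that only prints the commands, because clearing the PCD discards the stored configuration and should only be done on modules whose contents are not needed.

```python
import shlex
import subprocess


def recovery_commands(goal="PersistentMemoryType=AppDirect"):
    """Return the command sequence suggested above as argv lists."""
    return [
        ["ipmctl", "delete", "-f", "-pcd"],         # clear the stored PCD
        ["ipmctl", "create", "-f", "-goal", goal],  # request a new goal
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


# Dry run: nothing is executed, the commands are only printed.
run(recovery_commands())
```

A reboot is still needed after the goal is created, and `ipmctl show -pcd` is the follow-up to collect if no region appears.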

starcorn2020 commented 2 years ago

Hi nolanhergert and StevenPontsler,

The PMem modules work now, after help from my colleagues. The problem was caused by an invalid topology. Thanks for your help.

BR, starcorn2020

StevenPontsler commented 2 years ago

Thanks for jumping in and helping, Nolan.

@starcorn2020 - is there more to do, or can the issue be closed?

starcorn2020 commented 2 years ago

@StevenPontsler Yes, thank you for your help.