ibm-openbmc / dev

Product Development Project Mgmt and Tracking
16 stars 2 forks source link

use openpower-occ-control for OCC temps and power #2150

Closed spinler closed 3 years ago

spinler commented 4 years ago
  1. Install and start HwmonTempSensor and PSUSensor on Rainier
  2. Fill in the entity manage configs with the temp/power info so the apps will use them
  3. Remove any phosphor-hwmon config files we already have for those.

This would have to be done after all of the other dbus-sensors changes we need are done

spinler commented 3 years ago

I think with all the one off things we need to do for the OCC, OCC sensors should just be in a standalone dbus-sensors application.

It would need to do things like:

spinler commented 3 years ago

Will just use this for OCC dbus-sensors functionality. If we end up sticking with phosphor-hwmon, we can drop it.

spinler commented 3 years ago

If someone picks this up, there is a bit more research I'll need to do first, so let me know.

lxwinspur commented 3 years ago

@spinler IPS is interested to do this epic :)

I think the first thing is to switch to Entity-manager and Dbus-sensors repos. But IPS does not have Rainier machine, so how do we test it?

Also, Could you split this epic into multiple tasks? Thanks :)

spinler commented 3 years ago

We've actually changed positions on this a little bit, as this would still have us stuck hardcoding the 4 byte IDs that that OCC sends down in the poll response to know what the temperature values mean. The new idea is to just have hostboot use PLDM to send down those sensor IDs, and then modify the openpower-occ-control application, which already deals with PLDM calls for OCC active and also access the OCC sysfs, to start reading the sensors and putting them on D-Bus.

The issue is we haven't figured out model that in PLDM yet, though we have hostboot on board with the idea.

spinler commented 3 years ago

The plan now is that hostboot will put the PLDM entity ID and entity instance numbers into the 4 byte OCC ID fields for the temperature sensors, like 0xAAAABBBB.

DIMM sensors: entity ID 66 Core sensors: 135 The container ID for the DIMMs is 0x0 = system. For the cores, it is tentatively the container ID of the parent chip.

The OCC device driver makes the following hwmon sysfs files for each temperature:

tempX_input = temperature value, in degrees C tempX_label = The ID value tempX_fru_type: contains the FRU type field which is needed in the case of DIMMs to say which part of the DIMM the temp is for.

The possible fru types for the core and dimm sensors for P10, from the OCC spec: 0 = processor core 1 = internal mem ctlr 2 = DIMM 3 = mem ctlr + DIMM 7 = PMIC 8 = mem ctlr external sensor

So now, when the OCC becomes active and the hwmon driver is started, openpower-occ-control can

  1. Find all of the tempX_input/label/fru_type files in sysfs
  2. Use the value in the label file - the entity ID and instance, plus the value of the fru_type file to build a D-Bus sensor name: a. entity ID 66, instance 0 = dimm0, etc
    b. with the fru_type:

    1 = dimm0_intmb_temp 2 = dimm0_dram_temp 7 = dimm0_dram_extmb_temp. 3 = dimm0_pmic_temp 8 = dimm0_extmb_temp c. entity ID 135, instance N = procX_coreN_temp, where X is the processor chip number, known because there is a separate hwmon directory for every processor(occ) and the code already uses that to find the directory in the first place.

  3. Read all of the temps once per second, and put the results in the Value property a. /xyz/openbmc_project/sensors/temperature/dimm0_temp, etc

There are also power sensors that need to be handled the same way. Those can be identified purely by their fru_type. The Power Sensors section in the OCC spec contains the full list, there's too many to put here. Witherspoon/mihawk extra already have examples of the names to use.

Other notes:

When the OCC isn't active, the sysfs files will not be present. So, before the first boot, no sensors will be on D-Bus. After the system is up with sensors created and it powers off and sysfs disappears, the sensors should remain on D-Bus. The sensor value can be set to NaN. In addition, there is an xyz.openbmc_project.State.Decorator.Availability d-bus interface with a bool Available property. This property can then track when the sensor value is valid, so it would be false when powered off or OCC otherwise not active.

When there is a file I/O reading something in sysfs, that means the device driver returns an error. The error cases are:

  1. if an EAGAIN is encountered - Set the Available property on the Availability interface to false until it clears
  2. if a corresponding _fault file exists, and has a nonzero value in it, set the sensor value to NaN and set the Functional property on the OperationalStatus interface to false (otherwise OperationsStatus.Functional = true)
  3. Any other read errors, do same as 2.
lxwinspur commented 3 years ago

@spinler All implementations are to replace phosphor-hwmon, right?

Read all of the temps once per second, and put the results in the Value property a. /xyz/openbmc_project/sensors/temperature/dimm0_temp, etc

So, If the Dimm's fru_type value is 2, it is object path is the following: /xyz/openbmc_project/sensors/temperature/dimm0_dram_temp Right?

Also, Has this function been implemented on the hostboot side(use PLDM model)?

lxwinspur commented 3 years ago

For the service name, we need to use xyz.openbmc_project.Hwmon-1043099980.Hwmon1 instead of org.open_power.OCC.Control?

Sorry, Maybe I am wrong, please correct me if I am wrong, Thanks a lot :)

spinler commented 3 years ago

@lxwinspur Yep, this will replace phosphor-hwmon. You're correct about the dimm0_dram_temp name, and keeping the same org.open_power busname is fine.

Hostboot has this targeted for April, and only on Rainier and probably Witherspoon Tacoma. To test this code before they are done, or if you don't have one of those systems, I think you could either a) update the MRW to change the IPMI_IDs on those OCC sensors to be what you want and build hostboot with it, or b) just make a throwaway internal function that maps the IPMI IDs coming down to the new ID values and call it right after you read the label files.

lxwinspur commented 3 years ago

@spinler

About this issue, Can we start now?

spinler commented 3 years ago

@lxwinspur Sure. There is a commit in progress https://gerrit.openbmc-project.xyz/c/openbmc/openpower-occ-control/+/41176 that may be useful.

lxwinspur commented 3 years ago

@spinler @mzipse

@ChicagoDuan of the IPS Team has worked on it and started coding.

Also, @spinler Could you kindly explain why use openpower-occ-control for OCC temps and power? Because I remembered that we control for OCC temps and power with phosphor-hwmon before, compared with the previous function, what are the benefits? Maybe we missed something.

Thanks a lot :)

spinler commented 3 years ago

If you're asking why we're using these PLDM IDs instead of standard hardcoded phosphor-hwmon config file IDs, it's because we're now supporting 4 different systems in the same flash image. The largest one having something like 500 different sensor values that someone would need to type into both our config files and the MRW XML that hostboot would get the values from. And, the people typing them in would have to make sure they are the same across all the systems since the config files would all be at the same spot based on the FSI locations.

If you're asking why we want them in the OCC app, aside from it totally changing the behavior of phosphor-hwmon just for the OCC, we've seen failures both in the labs and the field of getting read failure error logs because phosphor-hwmon can't quite keep up with the OCC changing states. Moving it into the OCC app allows it to keep better in sync, because the app is already keeping an eye on the OCC state and can start/stop sensor monitoring as it sees the states change.

Additionally, we've gotten complaints from the field about how the OCC sensors all go off of d-bus when the system powers off, instead of just showing not available.

spinler commented 3 years ago

I don't know if it will be useful, but @smccarney is currently writing sensor monitoring code for the VRMs that has much of the same behavior as the OCC monitoring will. Not that code would be literally shared, but he might have some design ideas.

lxwinspur commented 3 years ago

@spinler

Thank you very much for providing us with a lot of useful information, that will be very helpful. Also, @ChicagoDuan is working on it, He will ping you if he has any questions.

Thanks again Matt

spinler commented 3 years ago

Now that the PLDM PDR design for cores is mostly defined, we know that:

So since we know from the fru_type anyway if the temp is for a core or a DIMM, we may need to rearrange the uint32 ID to instead have the DCM instance ID, parent processor instance ID, and core instance ID. This still needs to be run by hostboot.

lxwinspur commented 3 years ago

@spinler

The plan now is that hostboot will put the PLDM entity ID and entity instance numbers into the 4 byte OCC ID fields for the temperature sensors, like 0xAAAABBBB.

We want to know which pldm command we need to monitor? sensorEvent? https://github.com/openbmc/openpower-occ-control/blob/master/pldm.hpp#L66

spinler commented 3 years ago

You won't need monitor any PLDM commands. The label file will contain the value that hostboot told the OCC to use for that sensor. The number in there will contain what you need to know to name the sensor.

That being said, we just had a meeting with the hostboot team today, and they proposed a new numbering scheme. I need to make sure I have it right and then I'll let you know.

lxwinspur commented 3 years ago

@spinler Thanks for your mail. I am working on this task and have already started coding At present, I have a couple of questions that I need to confirm with you:

OCC Sensor Ids
- First byte is type
- Second byte is reserved
- Third+Fourth bytes are instance id

DIMM : D000iiii
- D0 = DIMM Sensor
- iiii = DIMM position relative to the backplane
       = Equal to PLDM entity id within PDR
           = Equal to Hostboot's ATTR_??  (TBD)

CORE : C000iiii
- C0 = Small Core DIMM Sensor
- iiii = Physical small core position relative to the parent processor chip
       = Equal to PLDM entity id within PDR
           = Equal to Hostboot's ATTR_CHIP_UNIT_POS
  1. Are all sensor IDs and entity IDs stored in the temp_label file or temp_input?
  2. If the value of the dimm2 sensor is equal to 47, should its value be like this? (D0020047)? Note that it is 47 instead of 47000?
  3. For temp_fru_type, my understanding is to distinguish Core & Dimm, but for Dimm type, where do you get it?
        > 1 = dimm0_intmb_temp
    > 2 = dimm0_dram_temp
    > 7 = dimm0_dram_extmb_temp。
    > 3 = dimm0_pmic_temp
    > 8 = dimm0_extmb_temp
  4. Where is a reference for all types of values? Because I don’t know which type is the value of temp_fru_type equal to 6?
spinler commented 3 years ago
  1. temp_input holds the actual temperature values. temp_label is the file with the IDs
  2. I think it would be D0000047. I'm not sure where the 2 came from?
  3. Those are documented in the OCC spec. I'll check on its availability. In the meantime, here's a snippet:

FRU Type - 1 byte. Indicates what type of FRU the temperature is for: 0x00: Processor (core temperature) 0x01: Internal Memory Controller 0x02: DIMM 0x03: Memory Controller + DIMM. Thermal sensor is located to cover both DRAM and MC. 0x04: GPU core 0x05: GPU memory 0x06: VRM Vdd 0x07: PMIC 0x08: Memory Controller external sensor

lxwinspur commented 3 years ago

@spinler

temp_input holds the actual temperature values. temp_label is the file with the IDs

We assume that the temperature value of dimm2 is 47000: tempX_input = 47000 tempX_lable = D002iiii

Note:

D0 = DIMM Sensor
02 = instance ID

Correct?

Because:

OCC Sensor Ids
- First byte is type      - D
- Second byte is reserved
- Third+Fourth bytes are instance id - 02

But I do not understand what the value of iiii is?

Also, for the tempX_fru_type:

FRU Type - 1 byte. Indicates what type of FRU the temperature is for:
0x00: Processor (core temperature)
0x01: Internal Memory Controller
0x02: DIMM
0x03: Memory Controller + DIMM. Thermal sensor is located to cover both DRAM and MC.
0x04: GPU core
0x05: GPU memory
0x06: VRM Vdd
0x07: PMIC
0x08: Memory Controller external sensor

I have some doubts here. We judge whether it is DImm or Core. It is judged by the first byte of tempX_label (D or C) So, if tempX_lable = D002iiii and tempX_fru_type = 0, what is the name of the D-Bus sensors?

Thanks a lot

spinler commented 3 years ago

I think I see the confusion because of something I didn't mention. This definition:

OCC Sensor Ids
- First byte is type
- Second byte is reserved
- Third+Fourth bytes are instance id

DIMM : D000iiii
- D0 = DIMM Sensor
- iiii = DIMM position relative to the backplane
       = Equal to PLDM entity id within PDR
           = Equal to Hostboot's ATTR_??  (TBD)

is for the uint32_t version of the ID, not a string. since that is what hostboot gives to the OCC, and what the OCC actually sends down in the poll response. The device driver then converts that into the string found in the label file.

So the uint32_t value is 0xD0000002, and the label would be "D00000002" (Though @eddiejames can confirm.)

spinler commented 3 years ago

For the other question, the reason the FRU type is embedded in the label is because the IDs are also what the OCC uses to callout FRUs for errors, so when hostboot gets those errors from the OCC it needs to be able to know the FRU type just from the ID since that is all it has, unlike for us where we also have the fru_type field of the poll response.

I think if the fru type embedded in the label field doesn't agree with the fru_type file, that probably implies that the label value is garbage and can't be trusted to build a sensor name out of. Maybe just trace that to the journal if it's a fru type we care about?

I don't think I mentioned this, but the OCC also sends down Vdd VRM temperatures in the poll response, and at this time those sensors do not need to be on d-bus, because the phosphor-regulators app is already handling that.

lxwinspur commented 3 years ago

@spinler @mzipse

So the uint32_t value is 0xD0000002, and the label would be "D00000002" (Though @eddiejames can confirm.)

About this issue, I have pushed a new commit to verify the value of the tempX_label file https://gerrit.openbmc-project.xyz/c/openbmc/openpower-occ-control/+/43734

Could you review it? Thanks a lot :)

ChicagoDuan commented 3 years ago

Get the temperature of processor and memory: ref: https://gerrit.openbmc-project.xyz/c/openbmc/openpower-occ-control/+/44256/4

spinler commented 3 years ago

@marthabroyles What all power sensor function IDs are valid in P10? All but the GPU ones?

marthabroyles commented 3 years ago

Correct all but the GPU ones are valid.

spinler commented 3 years ago

Regarding the power sensors, they can also be identified by the label files. The values are in microwatts. There are two types:

  1. Total power. It contains 'system' in the label file. It must be called 'total_power' because there is code looking for that.
    power15_input:883000000
    power15_label:system
  2. All the others. These label files have 3 numbers, of which we only care about the middle one: <sensor id>_<function id>_<apss channel>
    power5_input:6000000
    power5_label:0_10_5
Function ID Sensor Name
1 p0_mem_power
2 p1_mem_power
3 p2_mem_power
4 p3_mem_power
5 p0_power
6 p1_power
7 p2_power
8 p3_power
9 p0_cache_power
10 p1_cache_power
11 p2_cache_power
12 p3_cache_power
13 io_a_power
14 io_b_power
15 io_c_power
16 fans_a_power
17 fans_b_power
18 storage_a_power
19 storage_b_power
23 mem_cache_power
25 p0_mem_0_power
26 p0_mem_1_power
27 p0_mem_2_power

@marthabroyles @eddiejames Keep me honest.

spinler commented 3 years ago

I should add - the power values are only on one OCC, and not all systems will necessarily have all of them configured, so only a subset can come back in the poll response.

ChicagoDuan commented 3 years ago

I should add - the power values are only on one OCC, and not all systems will necessarily have all of them configured, so only a subset can come back in the poll response.

Thanks Matt. How can we determine which OCC will have power sensors? Or is it always in the OCC1?

spinler commented 3 years ago

@marthabroyles Can you answer that?

marthabroyles commented 3 years ago

It typically is the first OCC but does not need to be as this is hw dependent. The OCC connected to the APSS which is the chip that these power readings are read from is the OCC that will return the power readings in the poll response, this is the primary (aka controller) OCC. In the OCC poll response data the msb of the first byte of data indicates if this OCC is the "controller" if that bit is set then the power data will be present in that same poll response. I do not know how/if this is stored on the BMC. The BMC needs to know for other reasons which OCC is the "controller".

spinler commented 3 years ago

Thanks. @ChicagoDuan That OCC is referred to as the master in the code, and there is an API to check that.

ChicagoDuan commented 3 years ago

@ChicagoDuan That OCC is referred to as the master in the code, and there is an API to check that.

Thanks.

spinler commented 3 years ago

Hi @ChicagoDuan, we just had a meeting this morning where it was requested the code put the Vdd VRM temps on D-Bus after all. Could you add that? Up above, it is listed as a 0x06.

ChicagoDuan commented 3 years ago

Hi @ChicagoDuan, we just had a meeting this morning where it was requested the code put the Vdd VRM temps on D-Bus after all. Could you add that? Up above, it is listed as a 0x06.

Sure. I will add it.

ChicagoDuan commented 3 years ago

Hi @ChicagoDuan, we just had a meeting this morning where it was requested the code put the Vdd VRM temps on D-Bus after all. Could you add that? Up above, it is listed as a 0x06.

What is the sensorID of VDD VRM temps ?

cjcain commented 3 years ago

Sensor ID should not matter since there should only be a single Vdd VRM temp for an OCC

ChicagoDuan commented 3 years ago

Thanks. I've added Vdd VRM temp to the code.

lxwinspur commented 3 years ago

@spinler @mzipse Can we close this issue?

spinler commented 3 years ago

Merged.