open-power / serverwiz

Apache License 2.0
4 stars 18 forks source link

ServerWiz allows duplicate IPMI_SENSOR_IDs #29

Closed ghost closed 7 years ago

ghost commented 7 years ago

We've had at least two places where XML has had duplicate sensor IDs.

OPAL will warn on this on boot, with an entry in the log (which will soon be picked up by FWTS as a firmware bug):

[   84.176082494,3] DT: dt_attach_root failed, duplicate sensor@9

This is caused by having XML that has the duplicate. e.g.:

https://github.com/supermicro/p8dtu-xml/blob/P8DTU_release/p8dtu.xml#L1657

<globalSetting>
    <id>/sys-0/boot_count_sensor</id>
    <property>
    <id>IPMI_SENSOR_ID</id>
    <value>0x09</value>
    </property>
</globalSetting>

https://github.com/supermicro/p8dtu-xml/blob/P8DTU_release/p8dtu.xml#L5357

<globalSetting>
    <id>/sys-0/node-0/motherboard-0/proc_socket-1/module-1/proc/occ-0/occ_active_sensor</id>
    <property>
    <id>INSTANCE_ID</id>
    <value>occ_active_sensor</value>
    </property>
    <property>
    <id>IPMI_SENSOR_ID</id>
    <value>0x09</value>
    </property>
</globalSetting>

NOTE: identical sensor IDs 0x09

This is something that could be (hopefully) easily enforced in ServerWiz so that these kind of bugs could be caught much much much earlier.

ghost commented 7 years ago

See also https://github.com/open-power/op-build/issues/751

Cc @pridhiviraj

nkskjames commented 7 years ago

Serverwiz is just a dumb topology editing tool. The ipmi sensor id is just another one of the gazillion attributes that it allows the user to enter. It doesn't know the rules associated with the given topology. Also different consumers have different rules (BMC, FSP, HB). So currently the strategy is the consumer processing tool checks the rules. Pros are serverwiz can stay dumb and not maintain a huge set of potentially complicated rules. Cons are the feedback is a long loop to know something is broken. So if we stay in the current methodology, processMrw.pl in hostboot should be checking that.

ghost commented 7 years ago

@nkskjames makes sense, thanks.

I've opened https://github.com/open-power/hostboot/issues/75 to track it.