oxidecomputer / hubris

A lightweight, memory-protected, message-passing kernel for deeply embedded systems.
Mozilla Public License 2.0

BMR491 reports NaN for power #1581

Open mkeeter opened 11 months ago

mkeeter commented 11 months ago

While investigating https://github.com/oxidecomputer/hardware-gimlet/issues/1988 , we noticed that the BMR491 power sensor (112 0x70 power 4 F - 0x67 bmr491 V12_SYS_A2) is reading NaN. That seems weird!

cbiffle commented 8 months ago

I assume this is not obviously reproducible. :\

Poking at the pmbus code now just to see if I find anything obvious.

cbiffle commented 8 months ago

If you recall, @mkeeter, which actual quantity was reading as NaN? The BMR491 has various voltage and current outputs, but does not have a power output per se (as in something measured in watts), so I want to make sure I'm looking at the right thing.

I realize it's been a minute.

cbiffle commented 8 months ago

Alright, caught up with Matt on this in chat.

I think this is a bug but not a Gimlet bug. Here's what it looks like is happening.

  1. app/gimlet/base.toml defines one power sensor channel on the BMR491.
  2. The BMR491 RON in pmbus defines zero power sensor channels.
  3. Accordingly, the sensors task allocates space for a power sensor whose readings are never delivered.
  4. The sensors task uses f32::NAN as an initialization value for the data_value array, and leaks that at the API boundary.
  5. For reasons I'm still investigating, humility sensors apparently doesn't work on a released image, so the engineers were using humility readvar.
  6. This exposed the sentinel value.
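The following minimal Rust sketch models the failure mode in the list above and shows one possible way to avoid leaking the sentinel. All names in it (`Reading`, `SensorTable`, `post`, `get`) are hypothetical and are not the sensors task's real API. The only detail taken from the thread is that `data_value` is initialized with `f32::NAN`. The sketch assumes a fixed-size table indexed by sensor ID, as a `no_std` task might use.

```rust
// Hypothetical model of the NaN-sentinel leak; not the Hubris sensors API.

/// State of one sensor slot, keeping "never delivered" distinct from a value.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reading {
    /// No driver has posted a reading for this sensor yet.
    NoData,
    /// Most recent value posted by a driver.
    Value(f32),
}

/// Fixed-size table of readings, one slot per configured sensor ID.
struct SensorTable<const N: usize> {
    readings: [Reading; N],
}

impl<const N: usize> SensorTable<N> {
    fn new() -> Self {
        Self { readings: [Reading::NoData; N] }
    }

    /// Record a reading; panics if `id` is out of range.
    fn post(&mut self, id: usize, value: f32) {
        self.readings[id] = Reading::Value(value);
    }

    /// Returns `None` for slots that never received a reading.
    fn get(&self, id: usize) -> Option<f32> {
        match self.readings[id] {
            Reading::Value(v) => Some(v),
            Reading::NoData => None,
        }
    }
}

fn main() {
    // Illustrative setup: three slots are configured, but the driver only
    // ever posts to the first two (slot 2 plays the orphaned power channel).
    let mut data_value = [f32::NAN; 3];
    data_value[0] = 12.0;
    data_value[1] = 4.5;
    // With a NaN sentinel, the orphaned slot looks like a (bad) measurement.
    assert!(data_value[2].is_nan());

    let mut table = SensorTable::<3>::new();
    table.post(0, 12.0);
    table.post(1, 4.5);
    assert_eq!(table.get(0), Some(12.0));
    // With an explicit state, the orphaned slot reports "no data" instead.
    assert_eq!(table.get(2), None);
}
```

Keeping "no data yet" as its own variant means a configured but never-delivered channel surfaces as `None` instead of a NaN that looks like a measurement. Translating the sentinel at the API boundary, while keeping NaN internally, would be another option.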

Fortunately this means it's a lower-severity issue. I'd argue it's still a bug, or possibly three smaller bugs in a trenchcoat. The things this makes me want to go investigate are:

cbiffle commented 8 months ago

humility sensors, for the record, uses the hiffy generic Idol call interface. So it should work on a release image with a dongle attached, and is not expected to work on a release image over IP. To get sensor data over the management network, we'd need to either use a control-plane-oriented service or add to the gimlet-inspector for debugging.

mkeeter commented 1 month ago


The problem of humility sensors not working over the network was fixed with a net-friendly backend in https://github.com/oxidecomputer/humility/pull/491, but the underlying issue of configured sensors that don't exist in the driver and are never polled remains!
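The remaining problem is a mismatch between what the board config declares and what the driver can report. One way it could be caught is a check, at build time or at task startup, that flags configured sensor kinds the driver never reports. The Rust sketch below is hypothetical: `Kind` and `unreported` are illustrative names, not part of Hubris, and the input lists are invented to mirror the BMR491 case (a power channel in the config, no power reading from the driver), not copied from `app/gimlet/base.toml`.

```rust
// Hypothetical consistency check; not an existing Hubris build step.

/// Sensor kinds a board config might declare for a device.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Voltage,
    Current,
    Power,
}

/// Returns configured kinds that the device's driver never reports,
/// i.e. slots that would be allocated but never filled in.
fn unreported(configured: &[Kind], driver_reports: &[Kind]) -> Vec<Kind> {
    configured
        .iter()
        .copied()
        .filter(|k| !driver_reports.contains(k))
        .collect()
}

fn main() {
    // Illustrative lists: the config has a power channel, while the driver
    // reports voltages and currents but no power.
    let configured = [Kind::Voltage, Kind::Current, Kind::Power];
    let driver_reports = [Kind::Voltage, Kind::Current];

    let missing = unreported(&configured, &driver_reports);
    for kind in &missing {
        eprintln!("bmr491: configured {kind:?} sensor is never reported");
    }
    assert_eq!(missing, vec![Kind::Power]);
}
```

Failing the build on a non-empty result would turn this class of mismatch into a configuration error instead of a NaN discovered at runtime.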