hubblo-org / scaphandre

⚡ Energy consumption metrology agent. Let "scaph" dive and bring back the metrics that will help you make your systems and applications more sustainable !
Apache License 2.0

Detect and correct overflows of the RAPL microjoule counter #280

Open TheElectronWill opened 1 year ago

TheElectronWill commented 1 year ago

Problem

The RAPL energy counter is incremented and can overflow. Currently, this overflow is not handled.

As a result, the energy measurements are "slightly" (potentially a lot?) wrong. Fixing that might also fix other issues where users complain about "wrong" power usage.

Solution

Instead of ignoring the value, the overflow should be corrected. Quoting @uggla:

If previous_microjoules !=0 then we could probably do microjoule = (u64::MAX - previous_microjoules) + last_microjoules

Alternatives

Additional context

See https://github.com/powercap/powercap/issues/3#issuecomment-636256230

TheElectronWill commented 1 year ago

The LIKWID tool does take the overflows into account. Some info here: https://github.com/RRZE-HPC/likwid/issues/13

connorimes commented 1 year ago

I saw this issue linked from the powercap project. FYI I think your proposed solution won't work correctly for two reasons:

  1. While the MSR is 64 bits, only 32 bits are used for energy values.
  2. Those 32 bits are encoded using status units from the MSR_RAPL_POWER_UNIT register. See Section 15.10 of the Intel Software Developer's Manual, Volume 3, March 2023 edition. The standard configuration encodes using the formulation 1/2^ESU, but some processors are different (particularly some Intel Atom CPUs) as are some domains like DRAM and PSYS on some processors which might have fixed ESU values for those domains that differ from the unit register.
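To make the two points above concrete, here is a minimal Rust sketch of the standard decoding only. It assumes the default 1/2^ESU formulation with the ESU field in bits 12:8 of MSR_RAPL_POWER_UNIT, so it does not cover the Atom, DRAM, or PSYS exceptions mentioned above. The function names are hypothetical and are not taken from scaphandre.

```rust
/// Extracts the Energy Status Unit (ESU) exponent from a raw
/// MSR_RAPL_POWER_UNIT value. Assumption: ESU lives in bits 12:8,
/// so the result is always in 0..=31.
fn energy_status_unit(power_unit_msr: u64) -> u32 {
    ((power_unit_msr >> 8) & 0x1f) as u32
}

/// Converts a raw energy status MSR value to microjoules.
/// Only the low 32 bits carry the counter; the upper bits are masked off.
/// One counter tick is worth 1/2^esu joules (standard formulation only).
fn raw_to_microjoules(energy_status_msr: u64, esu: u32) -> u64 {
    let raw_ticks = energy_status_msr & 0xffff_ffff;
    // raw_ticks < 2^32, so multiplying by 1_000_000 cannot overflow a u64.
    (raw_ticks * 1_000_000) >> esu
}
```

With ESU = 14, one tick is about 61 µJ, and a raw value of `1 << 14` decodes to exactly 1,000,000 µJ (1 J).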

I've found that detecting overflow in RAPL can be a challenge. At a minimum, you need to compute the actual max energy value that the MSR can report and use that value when accounting for overflow, e.g., as done here [1] (full disclosure: my code). I'm not entirely convinced that this always works as expected though, even if you don't "miss" an overflow: I've seen quirky behavior in the past that resulted in overestimating power consumption. It could be that the register isn't really guaranteed to reach its max logical value before it actually turns over, but this approach is at least logically correct modulo bad register behavior. I haven't conducted a rigorous experiment in a long time though, so I'm not sure how prevalent problems might be.

[1] https://github.com/energymon/energymon/blob/38ef1e6d2d69abf1e3496832369663918d9e56d4/msr/energymon-msr.c#L174-L182
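For readers who prefer Rust, the idea can be sketched roughly as follows. This is not a port of the linked C code. It assumes a 32-bit counter, the standard 1/2^ESU unit, and at most one wraparound between two reads. All names are hypothetical. The `naive_wrap_delta_uj` function reproduces the `u64::MAX` formula quoted in the issue, only to show how far off it lands.

```rust
/// Width of the energy counter inside the MSR (assumption: 32 bits).
const COUNTER_BITS: u32 = 32;

/// Energy in microjoules at which the counter wraps back to zero,
/// assuming the standard unit of 1/2^esu joules per tick (esu <= 31).
fn max_energy_uj(esu: u32) -> u64 {
    // 2^32 * 1_000_000 is about 4.3e15, which fits in a u64.
    ((1u64 << COUNTER_BITS) * 1_000_000) >> esu
}

/// Energy consumed between two readings (both in microjoules),
/// assuming the counter wrapped at most once in between.
fn energy_delta_uj(prev_uj: u64, curr_uj: u64, esu: u32) -> u64 {
    if curr_uj >= prev_uj {
        curr_uj - prev_uj
    } else {
        // Wraparound: energy up to the wrap point, plus energy since zero.
        max_energy_uj(esu).saturating_sub(prev_uj) + curr_uj
    }
}

/// The formula quoted in the issue, applied to a wrapped reading
/// (curr_uj < prev_uj). It treats the counter as wrapping at u64::MAX.
fn naive_wrap_delta_uj(prev_uj: u64, curr_uj: u64) -> u64 {
    (u64::MAX - prev_uj).saturating_add(curr_uj)
}
```

With ESU = 14 the wrap point is 262,144,000,000 µJ (about 262 kJ), so a wrapped pair such as `prev = 100`, `curr = 40` yields 262,143,999,940 µJ here, while the naive formula yields a value close to `u64::MAX`.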

Cheers.

TheElectronWill commented 1 year ago

That's right, 64 bits is too much for the MSR counter:

image

I haven't seen aberrant values when correcting the overflows just after reading the counter, but I'll check that again :)

edit: of course, using the MSR directly requires taking into account the "quirks" of some platforms. That's what the Linux kernel does for perf and powercap (scaphandre uses powercap on Linux, for now). These interfaces return 64-bit values because they perform the unit conversion. I'll have to check the overflows in that case. Thanks for the info!
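For the powercap case, a possible approach is sketched below. It assumes that each zone directory (for example `/sys/class/powercap/intel-rapl:0`) exposes `energy_uj` and `max_energy_range_uj`, that the counter restarts from zero once it passes `max_energy_range_uj`, and that readings are frequent enough to see at most one wraparound between two samples. `EnergyTracker`, `read_u64`, and `sample` are hypothetical names, not scaphandre's actual code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Accumulates energy across counter wraparounds for one powercap zone.
struct EnergyTracker {
    max_range_uj: u64,    // value read once from max_energy_range_uj
    last_uj: Option<u64>, // previous energy_uj reading, if any
    total_uj: u64,        // energy accumulated since the first reading
}

impl EnergyTracker {
    fn new(max_range_uj: u64) -> Self {
        EnergyTracker { max_range_uj, last_uj: None, total_uj: 0 }
    }

    /// Feeds a new energy_uj reading and returns the running total.
    /// The first reading only sets the baseline and adds nothing.
    fn update(&mut self, reading_uj: u64) -> u64 {
        if let Some(prev) = self.last_uj {
            let delta = if reading_uj >= prev {
                reading_uj - prev
            } else {
                // Assumed single wraparound at max_range_uj.
                self.max_range_uj.saturating_sub(prev) + reading_uj
            };
            self.total_uj += delta;
        }
        self.last_uj = Some(reading_uj);
        self.total_uj
    }
}

/// Reads a sysfs file that contains a single unsigned integer.
fn read_u64(path: &Path) -> io::Result<u64> {
    fs::read_to_string(path)?
        .trim()
        .parse::<u64>()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Reads energy_uj from a powercap zone directory and updates the tracker.
fn sample(zone: &Path, tracker: &mut EnergyTracker) -> io::Result<u64> {
    let reading = read_u64(&zone.join("energy_uj"))?;
    Ok(tracker.update(reading))
}
```

A caller would read `max_energy_range_uj` once with `read_u64`, build the tracker from it, then call `sample` on every tick. Keeping the state in a struct means a wrapped reading is corrected rather than dropped.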