sosy-lab / cpu-energy-meter

A tool for measuring energy consumption of Intel CPUs
BSD 3-Clause "New" or "Revised" License

Fix DRAM measurement unit #14

Closed PhilippWendler closed 6 years ago

PhilippWendler commented 7 years ago

According to the Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families Volume 2 of 2, Registers Datasheet, Chapter 5.3.2, the unit of the DRAM measurements is always 15.3µJ. This document applies to Xeon v3 CPUs (Haswell), but the Linux kernel also uses this unit for newer server CPUs including Skylake (intel-rapl.c, rapl.c).

For desktop CPUs, however, the normal energy unit (the one read from MSR_RAPL_POWER_UNIT) is used.
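To make the two cases concrete, here is a minimal Python sketch of how a raw RAPL counter value becomes joules. It assumes the layout of MSR_RAPL_POWER_UNIT (MSR 0x606) described in the Intel Software Developer's Manual, where bits 12:8 hold the energy status unit (ESU) and one counter tick equals 1/2^ESU J. It also treats the fixed server DRAM unit of 15.3µJ as 2^-16 J (about 15.26µJ, which rounds to 15.3µJ). The function names and the sample register value 0xA0E03 are illustrative only and are not taken from cpu-energy-meter.

```python
# Minimal sketch: turning raw RAPL energy counter values into joules.
# Assumption: MSR_RAPL_POWER_UNIT (0x606) stores the energy status unit (ESU)
# in bits 12:8, and one counter tick equals 1 / 2**ESU joules.

# Assumed fixed DRAM unit on affected server CPUs: 2**-16 J (~15.3 uJ).
FIXED_SERVER_DRAM_UNIT_J = 2.0 ** -16


def energy_unit_from_power_unit_msr(raw_msr: int) -> float:
    """Return the default energy unit (J per tick) encoded in MSR_RAPL_POWER_UNIT."""
    esu = (raw_msr >> 8) & 0x1F  # extract bits 12:8
    return 1.0 / (1 << esu)


def counter_to_joules(ticks: int, unit_j: float) -> float:
    """Scale a raw energy counter reading by the given energy unit."""
    return ticks * unit_j


if __name__ == "__main__":
    raw = 0xA0E03  # hypothetical register value with ESU = 14
    default_unit = energy_unit_from_power_unit_msr(raw)
    print(f"default unit:    {default_unit:e} J")
    print(f"fixed DRAM unit: {FIXED_SERVER_DRAM_UNIT_J:e} J")
```

With ESU = 14 the default unit is 2^-14 J, so the same raw DRAM counter is worth four times as much energy under the default unit as under the fixed 2^-16 J unit. Choosing the wrong unit therefore shows up as a factor-of-4 error.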

TBunk commented 6 years ago

Fixed in commit https://github.com/sosy-lab/cpu-energy-meter/commit/25442af77a9015d2ffbc652f18d7381980f49f86

PhilippWendler commented 6 years ago

I compared measurements with 8086283368a5886cb642ca8e6294f6ae8f16c748 (2017-01-06) against 949566cbf12ca0be2d7bcb9867a93e26c48a9685 (2018-01-15). On our Xeon CPU E3-1230 v5 (Skylake-DT) the DRAM measurements with these two versions were pretty much the same, although I would have expected a difference due to these changes.

Why do we get the same result?

TBunk commented 6 years ago

Unfortunately, I currently cannot fix this because I do not have access to such a CPU. The error is most likely caused by none of the cases in rapl.c#rapl_dram_energy_units_probe(double) matching this CPU (in particular, case CPU_INTEL_SKYLAKE_X).
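The idea behind a probe like rapl_dram_energy_units_probe can be sketched as a lookup keyed on the CPU family and model: use the fixed DRAM unit only for models known to need it, and fall back to the unit from MSR_RAPL_POWER_UNIT otherwise. This Python sketch is hypothetical and does not reproduce the actual switch in rapl.c. The names `dram_energy_unit` and `FIXED_DRAM_UNIT_MODELS` are invented for illustration. The model set contains only the two server families mentioned in this thread (Haswell Xeon v3, family 6 model 0x3F, and Skylake-X, family 6 model 0x55) and is not a complete list.

```python
# Hypothetical sketch of a DRAM-unit probe keyed on (family, model).
# Only the server models named in this thread are listed; a real
# implementation would need the complete set from Intel's documentation.

# Assumed fixed DRAM unit on affected server CPUs: 2**-16 J (~15.3 uJ).
FIXED_SERVER_DRAM_UNIT_J = 2.0 ** -16

FIXED_DRAM_UNIT_MODELS = {
    (6, 0x3F),  # Haswell server (Xeon E5 v3)
    (6, 0x55),  # Skylake-X / Skylake server
}


def dram_energy_unit(family: int, model: int, default_unit_j: float) -> float:
    """Pick the DRAM energy unit (J per tick) for the given CPU."""
    if (family, model) in FIXED_DRAM_UNIT_MODELS:
        return FIXED_SERVER_DRAM_UNIT_J
    # Any other CPU: keep the unit read from MSR_RAPL_POWER_UNIT.
    return default_unit_j
```

Under this sketch, a CPU reporting family 6 and model 94 (0x5E) would not match any entry and would keep the default unit.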

Could you log in to the machine and run lscpu on the console? This would be really helpful to me. The important values are CPU family and Model, and by extension Vendor ID and Model name.

PhilippWendler commented 6 years ago
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz
Stepping:              3
CPU MHz:               899.937
CPU max MHz:           3800.0000
CPU min MHz:           800.0000
BogoMIPS:              6816.61
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

Note that I am not saying that the measurement on this CPU is definitely wrong, I am just saying that I would have expected the measurement to change due to this issue, and it did not. Either the measurement or the expectation is wrong.

TBunk commented 6 years ago

I just had a look at the official documentation, and according to it the current measurement is indeed wrong. The Skylake CPUs are defined in intel-family.h as follows:

#define CPU_INTEL_SKYLAKE_MOBILE        0x406E0     // Family 6 Model 78 (0x4e)
#define CPU_INTEL_SKYLAKE_DESKTOP       0x506E0     // Family 6 Model 94 (0x5e)
#define CPU_INTEL_SKYLAKE_X             0x50650     // Family 6 Model 85 (0x55)
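The constants above appear to be CPUID leaf-1 EAX signatures with the stepping bits (3:0) cleared, which is why they differ from the decimal model numbers in the comments. In the CPUID layout, the displayed model for family 6 is (extended model << 4) | model, taken from bits 19:16 and 7:4. The following Python sketch decodes such a signature. The function name `decode_signature` is our own and is not part of cpu-energy-meter.

```python
def decode_signature(sig: int) -> tuple[int, int]:
    """Return (family, model) from a CPUID leaf-1 EAX signature."""
    base_family = (sig >> 8) & 0xF  # bits 11:8
    family = base_family
    if base_family == 0xF:
        family += (sig >> 20) & 0xFF  # extended family, bits 27:20
    model = (sig >> 4) & 0xF  # bits 7:4
    if base_family in (0x6, 0xF):
        model |= ((sig >> 16) & 0xF) << 4  # extended model, bits 19:16
    return family, model


if __name__ == "__main__":
    for name, sig in [
        ("SKYLAKE_MOBILE", 0x406E0),
        ("SKYLAKE_DESKTOP", 0x506E0),
        ("SKYLAKE_X", 0x50650),
    ]:
        print(name, decode_signature(sig))
```

Decoding 0x506E0 yields family 6, model 94. These are the same values lscpu reports above for the Xeon E3-1230 v5, so that CPU falls under SKYLAKE_DESKTOP rather than SKYLAKE_X.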

Note that the Xeon CPUs (X) are commonly marketed as server CPUs. I have double-checked that these values are correct; however, I deliberately excluded the first two entries in the list above from using 15.3µJ as the energy unit. ~This was the mistake here: although your CPU is model 94 (SKYLAKE_DESKTOP above), it still needs to be measured with 15.3µJ as the energy unit. This was not clear from the linked document alone.~

Edit: Actually, I had looked at the wrong rows. The documentation in fact says nothing about model 94 requiring 15.3µJ as the energy unit.

PhilippWendler commented 6 years ago

Why do you think that the 15.3µJ unit applies to all Skylake CPUs and not only to SKYLAKE_X?

According to the PDF you linked, the 15.3µJ unit applies to

TBunk commented 6 years ago

For reference, the official documents for the Intel Xeon processor E3, E5, and E7 families can be found at https://www.intel.com/content/www/us/en/processors/xeon/xeon-technical-resources.html

On that page, the documents for your specific Xeon processor can be found under the section Intel® Xeon® Processor E3 Family (more specifically, under the row Intel® Xeon® processor E3-1200 v5 product family). The datasheet consists of two parts, volume 1 and volume 2.

Unfortunately, neither these two volumes nor any of the other Xeon E3 family documents linked on that page say which unit should be used for the DRAM energy measurements.

PhilippWendler commented 6 years ago

For the record, on the above CPU, cpu-energy-meter now prints:

[DEBUG] GenuineIntel processor found.
[DEBUG] Processor is from family 6 and uses model 0x506E0.
[DEBUG] Using the default unit for measuring the rapl DRAM values: 6.103516e-05
[DEBUG] Measured the following unit multipliers:
[DEBUG] RAPL_ENERGY_UNIT: 6.103516e-05 J
[DEBUG] RAPL_DRAM_ENERGY_UNIT: 6.103516e-05 J
[DEBUG] Interval time of msr probes set to 1637s, 399999618ns:

We agree that this is correct according to the Intel Software Developer's Manual.
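The logged default unit can be checked by hand. 6.103516e-05 J is 2^-14 J, which corresponds to an energy status unit of 14. Assuming the fixed server unit of 15.3µJ is 2^-16 J, applying it on this CPU would have scaled every DRAM reading by a factor of 1/4. That is the size of difference the comparison between the two commits would have shown if the unit had changed:

```latex
\text{RAPL\_DRAM\_ENERGY\_UNIT} = 2^{-14}\,\mathrm{J} = 6.103515625 \times 10^{-5}\,\mathrm{J} \approx 6.103516 \times 10^{-5}\,\mathrm{J}

\frac{2^{-16}\,\mathrm{J}}{2^{-14}\,\mathrm{J}} = \frac{1}{4}, \qquad 2^{-16}\,\mathrm{J} \approx 15.26\,\mu\mathrm{J} \approx 15.3\,\mu\mathrm{J}
```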