sosy-lab / cpu-energy-meter

A tool for measuring energy consumption of Intel CPUs
BSD 3-Clause "New" or "Revised" License
321 stars 29 forks source link

Skylake server CPUs du not support PP1 domain #15

Closed PhilippWendler closed 6 years ago

PhilippWendler commented 7 years ago

All server CPUs do not support the PP1 domain, including Skylake, but for the latter the domain is currently enabled, resulting in a measurement that is always zero.

TBunk commented 6 years ago

We are currently not affected by this issue, as our code does not support the skylake server (as of yet). For the moment, we only support the Skylake signature types for the mobile- and deskop versions (model 78 and 94, respectively), whereas the server cpu uses the 85 as model number. I will include the skylake server cpu in a future update.

TBunk commented 6 years ago

Fixed in commit https://github.com/sosy-lab/cpu-energy-meter/commit/e43c19d38867c2a01de2787b27774296e085d230. The available domains are now automatically detected during runtime, by directly checking the processor as to whether the domains are actually supported.

PhilippWendler commented 6 years ago

When I run the tool on older CPUs, I still get all domains in the output, some of them with value 0:

+--------------------------------------+
| CPU-Energy-Meter            Socket 0 |
+--------------------------------------+
Duration                  2.161589 sec
Package                   2.098450 Joule
Core                      0.372070 Joule
Uncore                    0.000000 Joule
DRAM                      3.191040 Joule
PSYS                      0.000000 Joule

(Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz)

Either the output or the detection logic seems wrong?

TBunk commented 6 years ago

The detection logic is working as intended, however, my understanding for how the registers are read seemed to be the fault here. I (apparently mistakenly) assumed that the fread-method returns an error (i.e., value 0) when the cpu does not support a specific msr, whereas in practice the msr still exists and contains an arbitrary value that does not change over time.

My solution to this is to just hide the output if the specific msr does not change its value over the course of time. Does this seem reasonable to you, @PhilippWendler ?

Edit: On a side-note, if one of the specific msr's (i.e., package, dram, core, uncore, psys) does in fact not exist, and an error is returned when trying to retrieve the value for the first time, the implementation then works as intended and disables the msr for the rest of its execution.

PhilippWendler commented 6 years ago

Is there no other way to detect which domains are supported, e.g., via a flag or a feature bit somewhere? How did the original Intel code handle this detection of domains?

TBunk commented 6 years ago

The previous implementation of the domain detection can be found here. To me, that seemed confusing at best, and inefficient and error prone at worst - that is why I have changed that in the first place.

I have not looked yet for a flag or feature bit, though, I think I would have noticed that in the official manuals. I will have a look at it again nevertheless.

PhilippWendler commented 6 years ago

But we should not read arbitrary MSRs for CPUs which are undefined, it is probably not even guaranteed that the value in the MSR stays constant and we could present wrong values.

TBunk commented 6 years ago

In commit 14ae54d8c36743b94f2c52e065898b62bbe6ddb6 I've changed the code such that the measuring of a msr is disabled if the respective value isn't changed within a time span of 100 milliseconds.

Can you check it out and tell me if that works for you, @PhilippWendler ?

PhilippWendler commented 6 years ago

Regardless of whether it works, this is a problem. If we do as implemented, we wait a specific time for each domain. This time is thus lost from the measurements, and can be up to 0.5s currently, which is too much. The time between starting the tool and measuring for the first time needs to be as short as possible. If we need to rely on whether the value stays constant across a period of time, then we can just as well always measure all domains and omit values that are 0 at the end, there is no difference in effect to what we have now.

TBunk commented 6 years ago

Regardless of whether it works, this is a problem. If we do as implemented, we wait a specific time for each domain. This time is thus lost from the measurements, and can be up to 0.5s currently, which is too much.

The functionality was implemented with threads (five in this case; one for each domain), so the additional time needed for doing the initialization is at most 0,1sec. The latter value of 100ms was admittedly just a placeholder value, because I run short of time yesterday. I've now looked up the time at which the values are updated in their registers, and in the official manual (Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B, sect. 14.9.3 - 14.9.5, page 14-33ff.) it is stated that the respective MSRs are updated every ~1msec. The additional waiting time in the implementation could therefore be safely reduced to something like 10ms.

For the record, I've just measured the time for how long the program takes for its initialization in the previous commit (i.e., from the beginning of the main-method until the method-call do_print_energy_info(), which contains the main-loop for periodically scanning the msr values). For 10 executions of the program, the results are as follows (units in seconds):

1.) prep_duration_seconds=0.012963
2.) prep_duration_seconds=0.016724
3.) prep_duration_seconds=0.017750
4.) prep_duration_seconds=0.015458
5.) prep_duration_seconds=0.022603
6.) prep_duration_seconds=0.024122
7.) prep_duration_seconds=0.023337
8.) prep_duration_seconds=0.019622
9.) prep_duration_seconds=0.014068
10.) prep_duration_seconds=0.013913

The mean time for the initialization of the program seems to be around 20ms on my working machine. I agree that even an additional time of 10ms for the initialization is quite huge nevertheless.

TBunk commented 6 years ago

If we need to rely on whether the value stays constant across a period of time, then we can just as well always measure all domains and omit values that are 0 at the end, there is no difference in effect to what we have now.

As for the approach, I thought we agreed on looking up the values that don't change over the course of time, and simply disable them afterwards.

I would have done that after the first time the energy consumption is measured, as this had meant that no additional time for the initialization was required. However, you told me that you rather had the probing for msr values disabled even before the first measurement of the consumed energy in the MSRs are made, so that's how I came up with my current implementation.

I'll next rewrite the code for this issue with my idea in mind, so that no additional time will be needed for the initialization of the program. Maybe we've talked past each other on the approach, however, If you have any further suggestions, please let me know.

Edit: fixing typos

PhilippWendler commented 6 years ago

The best would be to determine which domains are supported without ever reading from the respective MSRs. If we cannot do this, we need to fall back to reading them and discarding values that do not change. However, if we use the latter strategy, it doesn't matter if we read the MSRs twice or more often.

So what is the benefit in doing an extra read just to determine whether to disable a domain? As far as I see, we can simply attempt to measure all domains and in the end discard all zero values, this will have exactly the same effect.

TBunk commented 6 years ago

Unfortunately, there is no way to determine the supported RAPL-domains beforehand. Moreover, the model-types of the Intel cpu's can neither be used to manually dictate which domains should actually be measured. For example, we've checked the 'Skylake desktop'-cpu (model 0x506E0), which is used in both server- and client-cpu's, and found out that some of them support the pp1- and psys-domain, whereas others of the same model do not.

This has led to the current implementation, where we simply read the respective MSRs (that weren't disabled before due to an error, for example) and omit the null-values in the final output.