tukl-msd / DRAMPower

Fast and accurate DRAM power and energy estimation tool

Review clock cycle counters when exiting self-refresh. #40

Closed efzulian closed 6 months ago

efzulian commented 8 years ago

The idea here is to review the counters of clock cycles in self-refresh power-up mode for all possible self-refresh exit contexts.

Both scenarios with and without DLL should be considered.

Other counters like "latest_pre_cycle" (which are assigned in the same context) should also be considered in the analysis.

Some code:

if (memSpec.memArchSpec.dll == false) {
  spup_cycles     += memSpec.memTimingSpec.XS - spup_pre;
  latest_pre_cycle = max(timestamp, timestamp +
                         memSpec.memTimingSpec.XS - spup_pre -
                         memSpec.memTimingSpec.RP);
} else {
  spup_cycles     += memSpec.memTimingSpec.XSDLL -
                     memSpec.memTimingSpec.RCD - spup_pre;
  latest_pre_cycle = max(timestamp, timestamp +
                         memSpec.memTimingSpec.XSDLL - memSpec.memTimingSpec.RCD -
                         spup_pre - memSpec.memTimingSpec.RP);
}
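
To make the two branches easier to compare during the review, the update can be restated as a standalone function. This is only a sketch: `Timing`, `Counters` and `on_self_refresh_exit` are hypothetical names that do not exist in DRAMPower, and the code reproduces the quoted arithmetic as-is without claiming that it is correct for every exit context.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical subset of memSpec.memTimingSpec (values in clock cycles).
struct Timing {
    int64_t XS;
    int64_t XSDLL;
    int64_t RCD;
    int64_t RP;
};

// Hypothetical holder for the two counters under review.
struct Counters {
    int64_t spup_cycles = 0;
    int64_t latest_pre_cycle = 0;
};

// Mirrors the quoted snippet: the branches differ only in the exit latency.
void on_self_refresh_exit(Counters& c, const Timing& t, bool dll,
                          int64_t timestamp, int64_t spup_pre) {
    assert(timestamp >= 0);  // assumption: timestamps are non-negative cycle counts
    const int64_t exit_latency = dll ? (t.XSDLL - t.RCD) : t.XS;
    const int64_t power_up = exit_latency - spup_pre;
    c.spup_cycles += power_up;
    c.latest_pre_cycle = std::max(timestamp, timestamp + power_up - t.RP);
}
```

Written this way, the only difference between the two cases is the exit latency (XS versus XSDLL - RCD), which is the quantity to check for each exit context.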

Additionally, there are useful comments in pull request #39.

It is also important to evaluate the relevance of these counters (spup_cycles and latest_pre_cycle). It seems that they are used to calculate some energy components which are not part of the total trace / pattern energy (total_energy). E.g.: latest_pre_cycle --> idle_pre_update() --> idlecycles_pre --> energy.idle_energy_pre --> cout

Removing irrelevant code chunks / variables (if any) would benefit code maintainability.

Sv3n commented 7 years ago

I've had some time to think about this issue, and the bigger picture related to DRAMPower as a whole, so this is going to be a long post, that we can hopefully partially reuse as part of the readme in the future.

It is about time we properly spec what DRAMPower actually calculates, and trim where possible. This touches issues #26, #40 and #41.

DRAMPower has 30+ energy-related outputs (listed in MemoryPowerModel::power_calc), 16 cycle counters (listed in CommandAnalysis::clearStats), and a handful of timestamps that are tracked along the way. This seems excessive.

The last 9 of the energy outputs relate to IO / termination. Let's ignore those for now to limit the scope of what we are doing here.

CommandAnalysis (the trace parsing phase)

Commands consume a certain amount of active energy to execute. Executing a command can change the power-state of the memory. In each power state, a specific background power is consumed.

Goals:

  1. Count the number of executed commands of each kind (ACT, PRE, RD, WR, REF, the power-down family, SREF).
  2. Count the number of cycles spent in distinct power states.

Now, calculating the total energy usage is conceptually simple:

  1. Multiply the number of executed commands of each type by its respective active energy cost, and sum everything to get the total active energy.
  2. For each power state, multiply the number of cycles spent in it by the power of that state, and sum everything to get the total background energy.
  3. Active + background is total energy.
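
The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not DRAMPower code: `TraceCounts`, `EnergyModel` and the three functions are hypothetical names, and the per-command energies and per-state powers are assumed to be already available in consistent units.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical result of the trace parsing phase.
struct TraceCounts {
    std::map<std::string, int64_t> commands;      // e.g. "ACT" -> number of ACTs
    std::map<std::string, int64_t> state_cycles;  // e.g. "precharged" -> cycles
};

// Hypothetical per-device parameters.
struct EnergyModel {
    std::map<std::string, double> energy_per_command;  // active energy per command
    std::map<std::string, double> power_per_state;     // background power per state
    double clk_period = 1.0;                           // duration of one cycle
};

// Step 1: sum over command types of count * active energy.
double active_energy(const TraceCounts& c, const EnergyModel& m) {
    double total = 0.0;
    for (const auto& kv : c.commands)
        total += static_cast<double>(kv.second) * m.energy_per_command.at(kv.first);
    return total;
}

// Step 2: sum over power states of cycles * cycle time * background power.
double background_energy(const TraceCounts& c, const EnergyModel& m) {
    assert(m.clk_period > 0.0);
    double total = 0.0;
    for (const auto& kv : c.state_cycles)
        total += static_cast<double>(kv.second) * m.clk_period * m.power_per_state.at(kv.first);
    return total;
}

// Step 3: active + background.
double total_energy(const TraceCounts& c, const EnergyModel& m) {
    return active_energy(c, m) + background_energy(c, m);
}
```

The point of the split is that the parsing phase only has to produce the two count tables; everything energy-related lives in one place.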

_Action points_ I would like to see the structure of this procedure reflected in the code, i.e.:

In CommandAnalysis.cc and friends:

In MemoryPowerModel.cc and friends:

Power states

There are 5 (non-overlapping) power states to distinguish. In each of these states, a specific background current is consumed:

  1. Precharged: all banks are closed, and the memory is not in one of the power-down states. The memory consumes IDD2N here.
  2. Active: at least one bank is open, while not in a power-down state. The memory consumes IDD3N here.
  3. Precharged power-down: after PDE command is given while we were precharged. If fast-exit mode: IDD2P1, if slow-exit mode: IDD2P0.
  4. Active power-down: after PDE command is given while we were active. The memory consumes IDD3P.
  5. Self-refresh: IDD6.
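
A minimal sketch of this list as code, assuming a hypothetical `PowerState` enum and `Currents` struct (neither exists in DRAMPower under these names). It only encodes the five states and the current attributed to each, including the fast/slow-exit choice for precharged power-down.

```cpp
#include <cassert>

enum class PowerState {
    Precharged,
    Active,
    PrechargedPowerDown,
    ActivePowerDown,
    SelfRefresh
};

// Hypothetical bundle of the datasheet currents named in the list.
struct Currents {
    double idd2n, idd3n, idd2p0, idd2p1, idd3p, idd6;
};

// Derive the (non-overlapping) state from three observations.
PowerState classify(int open_banks, bool powered_down, bool self_refresh) {
    assert(open_banks >= 0);
    if (self_refresh) return PowerState::SelfRefresh;
    if (powered_down)
        return open_banks > 0 ? PowerState::ActivePowerDown
                              : PowerState::PrechargedPowerDown;
    return open_banks > 0 ? PowerState::Active : PowerState::Precharged;
}

// Background current consumed in each state.
double background_current(PowerState s, bool fast_exit, const Currents& i) {
    switch (s) {
    case PowerState::Precharged:          return i.idd2n;
    case PowerState::Active:              return i.idd3n;
    case PowerState::PrechargedPowerDown: return fast_exit ? i.idd2p1 : i.idd2p0;
    case PowerState::ActivePowerDown:     return i.idd3p;
    case PowerState::SelfRefresh:         return i.idd6;
    }
    return 0.0;  // not reached for valid enum values
}
```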

If we see fast and slow-exit as two distinct cases, then we should need only 6 cycle counters. After we implement #26, we can go back to 5. I've referred to this image before:

Power state machine (https://imgur.com/a/txM7o)

Mapping to existing code

(I'm looking at MemoryPowerModel.cc, line 150+ here):

These 5 energy values we calculate seem to (approximately) correspond to the active power of individual commands: (act_energy, pre_energy, read_energy, write_energy, ref_energy). Working backwards from the currents, the mapping to background currents per power state is probably:

  1. Precharged: precycles --> energy.pre_stdby_energy
  2. Active: actcycles --> energy.act_stdby_energy
  3. Precharged power-down: f_pre_pdcycles + s_pre_pdcycles --> energy.*_pre_pd_energy
  4. Active power-down: f_act_pdcycles + s_act_pdcycles --> energy.*_act_pd_energy. Note: I don't think that the distinction between fast and slow active power down modes is a real thing.
  5. Self-refresh: sref_cycles_idd6?? I am not completely sure what is happening in engy_sref():

((idd6 * sref_cycles_idd6)
 + ((idd5 - idd3n) * (sref_ref_act_cycles + spup_ref_act_cycles
                      + sref_ref_pre_cycles + spup_ref_pre_cycles)))
    * vdd * clk;

(idd6 * sref_cycles_idd6) is a background energy component, while the second term is an active component. So I think that self-refreshes are not counted in ref_energy.
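
To make that reading explicit, here is a sketch that evaluates the same expression but returns the two terms separately. The counter names are taken from the expression above; `SrefCounters`, `SrefEnergy` and `split_sref_energy` are hypothetical and not part of DRAMPower.

```cpp
#include <cassert>
#include <cstdint>

// Counters that appear in the engy_sref() expression.
struct SrefCounters {
    int64_t sref_cycles_idd6;
    int64_t sref_ref_act_cycles;
    int64_t spup_ref_act_cycles;
    int64_t sref_ref_pre_cycles;
    int64_t spup_ref_pre_cycles;
};

// Hypothetical result type separating the two terms.
struct SrefEnergy {
    double background;  // idd6 term
    double active;      // (idd5 - idd3n) term
    double total() const { return background + active; }
};

SrefEnergy split_sref_energy(const SrefCounters& c, double idd6, double idd5,
                             double idd3n, double vdd, double clk) {
    assert(clk > 0.0);
    const int64_t refresh_cycles = c.sref_ref_act_cycles + c.spup_ref_act_cycles +
                                   c.sref_ref_pre_cycles + c.spup_ref_pre_cycles;
    const double background = idd6 * static_cast<double>(c.sref_cycles_idd6) * vdd * clk;
    const double active = (idd5 - idd3n) * static_cast<double>(refresh_cycles) * vdd * clk;
    return SrefEnergy{background, active};
}
```

`total()` equals the original expression; the split only makes it visible which part would belong with the other background terms.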

What is next?

I see two options:

  1. Keep on renovating the current code base. Restructure it, fix bugs, and define what all our outputs are supposed to mean.
  2. Come up with a list of outputs we would like to generate, and instrument DRAMPower to create exactly those. Compare to existing outputs, see how different they are. Review new implementations, and remove the old code when we are happy with the new stuff.

My personal preference is option 2: I think we currently have a complex code base that calculates more outputs than most of our users are interested in, and interpreting what they mean is much harder than it needs to be. Maintenance is not fun. We can do better, so I think an overhaul is required. Even though we have released a few versions with useful upgrades, I think a new release with a real overhaul of the core code would make this project much more viable.

I'll stop here for now, since I think it is good to give an opportunity for feedback first. If this is the direction we are moving into, then I would be happy to set up the class-infrastructure I have in mind.

efzulian commented 7 years ago

I believe that engy_sref() is intended to return the self-refresh active energy without the background energy.

Some text extracted from Karthik Chandrasekar's PhD thesis, 2.5.4 Self-Refresh Mode Transition:

The IDD6 current is consumed for the time period spent in the self-refresh mode as defined in the trace (nSR), which excludes the time spent in finishing the explicit auto-refresh (as depicted in Figure 2.13). The auto-refresh consumes IDD5 − IDD3N over one refresh period (nRFC) from the start of the self-refresh. IDD2N current is consumed when exiting the self-refresh state for the nXSDLL exit period.

Briefly, the time spent in self-refresh is broken into three parts:

  1. explicit auto-refresh (the duration is tRFC, the current is IDD5 − IDD3N)
  2. self-refresh mode (the duration is determined on exit, the current is IDD6)
  3. self-refresh power up (the duration depends on the presence of DLL since it may need a new lock, the current is IDD2N)
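
As a sketch of how these three parts would add up, assuming the number of cycles in each part is already known (how to derive them per exit scenario is exactly what is under review). `SelfRefreshCycles`, `SelfRefreshCurrents` and `self_refresh_energy` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cycle counts for the three parts listed above.
struct SelfRefreshCycles {
    int64_t refresh;       // explicit auto-refresh
    int64_t self_refresh;  // self-refresh mode
    int64_t power_up;      // self-refresh power up
};

struct SelfRefreshCurrents {
    double idd5, idd3n, idd6, idd2n;
};

// Sum of current * cycles per part, scaled by supply voltage and cycle time.
double self_refresh_energy(const SelfRefreshCycles& n, const SelfRefreshCurrents& i,
                           double vdd, double clk) {
    assert(clk > 0.0);
    const double charge = (i.idd5 - i.idd3n) * static_cast<double>(n.refresh) +
                          i.idd6 * static_cast<double>(n.self_refresh) +
                          i.idd2n * static_cast<double>(n.power_up);
    return charge * vdd * clk;
}
```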

Furthermore, he (Karthik in his thesis) explores different situations regarding the arrival time of the self-refresh exit command (all scenarios are described in the issue #39).

I think you are right: self-refreshes are not counted in ref_energy. There is sref_energy for that.

efzulian commented 7 years ago

About the changes, my vote goes to option 2. Of course I volunteer to help.

In my opinion, the first thing to do would be a clean-up that removes everything unnecessary, reducing the chance that people extend it. I see that you are making lots of improvements in the refact_ca branch.

Sv3n commented 7 years ago

Note to self: ignore the "SRE" edge, and the little state-bubble it flows into in the state diagram (it's a simplification at best, the figure can be improved). The general rule that DRAMPower follows when it comes to modeling SREF is:

On SREN:

On SRX:

We always end up in an IDD2N state.