travisdowns / uarch-bench

A benchmark for low-level CPU micro-architectural features
MIT License

Implement "delta" measurement #10

Open travisdowns opened 7 years ago

travisdowns commented 7 years ago

Currently we just measure the absolute time of the code under test like so:

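```cpp
// Times one call of the benchmark METHOD with the given loop count.
// CLOCK and METHOD are provided by the enclosing benchmark code.
static int64_t time_method(size_t loop_count) {
    auto t0 = CLOCK::now();
    METHOD(loop_count);
    auto t1 = CLOCK::now();
    return t1 - t0;  // includes clock and call overhead, not just the loop
}
```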

The downside of this approach is that the measured time includes the cost of one CLOCK::now() call, as well as the fixed overhead of invoking METHOD(loop_count): at least a call and a ret, and sometimes a small amount of setup.

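A better approach is to time the loop with two different values of loop_count and use the difference in time to calculate the performance. Because the fixed overheads above are the same in both runs, they cancel out when the two times are subtracted, and dividing the time difference by the difference in loop counts gives the time per iteration. The test/jump overhead of the loop inside the benchmark is still included, but it is small and sometimes zero.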
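A minimal C++ sketch of loop-count delta measurement follows. It is not the uarch-bench implementation. It assumes std::chrono::steady_clock in place of the framework's CLOCK, and spin_loop, time_once, min_time, delta_per_iteration and measure_delta are hypothetical names. Taking the minimum over several repetitions is an added choice to reduce noise; the issue does not specify it.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical code under test: one volatile store per iteration,
// so the compiler cannot remove the loop.
void spin_loop(std::size_t loop_count) {
    volatile std::size_t sink = 0;
    for (std::size_t i = 0; i < loop_count; i++) {
        sink = i;
    }
}

// Absolute time of one call in nanoseconds, in the style of time_method().
// The result includes clock and call overhead as well as the loop itself.
std::int64_t time_once(void (*method)(std::size_t), std::size_t loop_count) {
    auto t0 = std::chrono::steady_clock::now();
    method(loop_count);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

// Fastest of `reps` calls, to filter out interrupts and other one-off noise.
std::int64_t min_time(void (*method)(std::size_t), std::size_t loop_count, int reps) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int r = 0; r < reps; r++) {
        std::int64_t t = time_once(method, loop_count);
        if (t < best) best = t;
    }
    return best;
}

// Per-iteration cost from two measurements of the same method: the fixed
// overhead appears in both t1 and t2, so it cancels in the subtraction.
double delta_per_iteration(std::int64_t t1, std::size_t n1,
                           std::int64_t t2, std::size_t n2) {
    assert(n2 > n1);
    return static_cast<double>(t2 - t1) / static_cast<double>(n2 - n1);
}

// Times the same method at two loop counts and returns ns per iteration.
double measure_delta(void (*method)(std::size_t), std::size_t n1,
                     std::size_t n2, int reps = 5) {
    std::int64_t t1 = min_time(method, n1, reps);
    std::int64_t t2 = min_time(method, n2, reps);
    return delta_per_iteration(t1, n1, t2, n2);
}
```

For example, if the fixed overhead were 50 ns and each iteration cost 2 ns, runs of 100 and 1100 iterations would take 250 ns and 2250 ns, and delta_per_iteration(250, 100, 2250, 1100) returns 2.0: the 50 ns appears in both measurements and drops out.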

travisdowns commented 6 years ago

Note that two-method delta measurement has been implemented in commit 7667eacd333d4dcefaa2530f4e1228227c01dfef. It reports each result as the difference between the benchmark method and a "base" method, which defaults to the empty benchmark dummy_bench.
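As a rough illustration of the two-method idea, here is a C++ sketch, not the code from that commit. It assumes std::chrono::steady_clock for timing; store_loop is a hypothetical benchmark, empty_base is a hypothetical stand-in for dummy_bench (its real signature may differ), and min_time_ns, delta_vs_base and measure_vs_base are hypothetical helpers. Both methods run at the same loop count, and the base time is subtracted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical benchmark under test: one volatile store per iteration,
// which the compiler is not allowed to remove.
void store_loop(std::size_t loop_count) {
    volatile std::size_t sink = 0;
    for (std::size_t i = 0; i < loop_count; i++) {
        sink = i;
    }
}

// Hypothetical stand-in for an empty base benchmark like dummy_bench:
// it ignores loop_count and does no work.
void empty_base(std::size_t /*loop_count*/) {}

// Fastest of `reps` timed calls, in nanoseconds.
std::int64_t min_time_ns(void (*method)(std::size_t), std::size_t loop_count, int reps) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int r = 0; r < reps; r++) {
        auto t0 = std::chrono::steady_clock::now();
        method(loop_count);
        auto t1 = std::chrono::steady_clock::now();
        std::int64_t t =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (t < best) best = t;
    }
    return best;
}

// Per-iteration cost relative to the base: the clock and call/ret overhead
// is paid by both methods, so subtracting the base time removes it.
double delta_vs_base(std::int64_t t_bench, std::int64_t t_base, std::size_t loop_count) {
    assert(loop_count > 0);
    return static_cast<double>(t_bench - t_base) / static_cast<double>(loop_count);
}

// Times the benchmark and the base at the same loop count.
double measure_vs_base(void (*bench)(std::size_t), void (*base)(std::size_t),
                       std::size_t loop_count, int reps = 5) {
    std::int64_t t_bench = min_time_ns(bench, loop_count, reps);
    std::int64_t t_base = min_time_ns(base, loop_count, reps);
    return delta_vs_base(t_bench, t_base, loop_count);
}
```

For example, delta_vs_base(2250, 50, 1100) returns 2.0 ns per iteration.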

Leaving this open, since we should still implement loop-count-based deltas as described above: this runs the same method twice, with different loop counts. That approach has both advantages and disadvantages compared with the base method. The primary advantage is that the loop-based method probably does a better job of removing per-benchmark overhead such as setup, which dummy_bench cannot remove unless you write a specific dummy for each test.
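To see why, consider a simple linear model (an assumption that ignores noise). A benchmark run with loop count $n$ takes $T_B(n) = O + S + nc$, where $O$ is the fixed clock and call/ret overhead, $S$ is the benchmark's own setup cost, and $c$ is the cost per iteration. An empty base method such as dummy_bench pays only $O$, so $T_D = O$. The two kinds of delta then give:

$$
\frac{T_B(n) - T_D}{n} = c + \frac{S}{n},
\qquad
\frac{T_B(n_2) - T_B(n_1)}{n_2 - n_1} = \frac{(n_2 - n_1)\,c}{n_2 - n_1} = c.
$$

The base delta leaves a residual $S/n$ that only shrinks as $n$ grows, while the loop-count delta removes $S$ entirely, provided the setup cost does not itself depend on $n$.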