[FR] Lightweight way to pause/resume timer in benchmarks

kgorking commented 3 years ago

Using PauseTiming/ResumeTiming can make some benchmarks take a very long time to complete. Given the following example, if the call to A() takes longer than B(), then the runtime of the benchmarks can increase to ludicrous lengths.

void A();
void B();

void example(benchmark::State &state) {
    for (auto const _ : state) {
        state.PauseTiming();
        A();
        state.ResumeTiming();

        B();
    }
}
BENCHMARK(example);

To fix this, I have implemented a pair of functions (BeginIgnoreTiming/EndIgnoreTiming) that can be used to ignore the timing cost of code executed between them. It keeps timing info around for when CreateRunReport() is called, where the duration ignored is subtracted from the cpu- and real-time timings, so it only affects the printed timing values.

This allows benchmarks to run as if there were no calls to pause the timings, while getting the timings of the benchmark I am actually interested in (B() in the above example).

I have included the results of some runs from a personal project of mine. The benchmark is of a similar format to the A/B example from above.

With no pause/resume. Results are not useable.
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
ranged_system_with_components/4096           1.04 us         1.05 us       640000
ranged_system_with_components/32768          2.92 us         2.95 us       248889
ranged_system_with_components/262144         18.0 us         18.0 us        37333
ranged_system_with_components/2097152         298 us          298 us         2358
ranged_system_with_components/16777216       2372 us         2352 us          299
* Benchmark took 4716 milliseconds

With PauseTiming/resumeTiming
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
ranged_system_with_components/4096          0.574 us        0.688 us      1294865
ranged_system_with_components/32768         0.616 us        0.670 us      1120000
ranged_system_with_components/262144        0.831 us        0.625 us      1000000
ranged_system_with_components/2097152       0.630 us        0.900 us       746667
ranged_system_with_components/16777216      0.810 us         1.34 us       793297
* Benchmark took 3366572 milliseconds (56 minutes)

With new lightweight pause/resume
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
ranged_system_with_components/4096          0.579 us        0.534 us       497778
ranged_system_with_components/32768         0.614 us        0.659 us       213333
ranged_system_with_components/262144        0.819 us         1.26 us        37333
ranged_system_with_components/2097152       0.601 us        0.000 us         2358
ranged_system_with_components/16777216      0.749 us        0.000 us          299
* Benchmark took 4819 milliseconds

LebedevRI commented 3 years ago

So how does it work?

kgorking commented 3 years ago

It works just like the normal pause/resume functions, but the timings recorded are subtracted from the normal timings when the report is generated. You can see the diff here: https://github.com/google/benchmark/compare/master...kgorking:lightweight_timer_suspension

LebedevRI commented 3 years ago

I see, so no magic.The effect is really surprising to me. By any chance, do you have an understanding where the PauseTiming/ResumeTiming manages to waste so much time? Just by looking at that diff of yours, i would have expected it to have as bad of an impact.

kgorking commented 3 years ago

The pause/resume functions modify the values of the timer used to calculate the number of iterations that should be run. In my example from above, the benchmarking code thinks that an iteration takes 0.810us when it is in reality taking 2372us, and then calculates that 793297 iterations should be run to hit a certain time threshold.

The benchmark runner thinks it will take 0.810us * 793297 = ~0.64 seconds, when it actually takes 2372us * 793297 = ~31.4 minutes.

LebedevRI commented 3 years ago

Hm, so with BeginIgnoreTiming/EndIgnoreTiming, if the time spent with paused timing increases, the iteration count will decrease?

kgorking commented 3 years ago

The iteration count will be the same as if there was no pause/resume

LebedevRI commented 3 years ago

I see. Let me explain my take on this.

Let's suppose we have a benchmark:

a;

/*
// pause
b;
// unpause
c;
*/

We want to measure how much time is spent in a. What we roughly do, is repeat that snippet N times (where N is `wanted total runtime / (time it took to do Y iterations / Y iterations)), and then divide the total measured time by the iteration count. Thus, we got a not-so-noisy measurement. Pretty easy.

Now, let's suppose we have a benchmark:

a;
// pause
b;
// unpause
c;

We want to measure how much time is spent in a and c, but not b. Let's pretend each one of a, b and c take the same amount of time, If we use existing PauseTiming/ResumeTiming, we'll do N/2 iterations, and if we don't do an pausing of the measurements, we end up doing N/3 iterations.

I think it is really important to notice that the less iterations we do, the bigger stdev the measurement will have. And this will be especially important if the ratio between the paused and unpaused sections is different by many times like in your case i guess.

So i, personally, find this approach to be fundamentally wrong. But perhaps i'm just wrong.

dmah42 commented 3 years ago

I don't think it's fundamentally wrong, and it is in fact how we used to do the pause/resume when the library was first written. However, your (@LebedevRI) point stands, which is that we want to take into account the pause when calculating iterations to ensure that the bits we are measuring get enough coverage.

Now ideally we wouldn't use the iteration growth stuff and instead would run while monitoring stddev and run until stddev < magic_number || max_iterations. if we did that, then i think a lighter weight pause/resume would make sense. But i'd also guess that it wouldn't have any impact on the end-to-end runtime.

So i'm happy that you've found a way to make your benchmarks run for less time, but i don't think your results are as statistically useful as they could be.

https://github.com/google/benchmark/issues/1051 is a related bit of work by which we can reduce the variance of benchmarks and balance execution cost.

LebedevRI commented 3 years ago

Now ideally we wouldn't use the iteration growth stuff and instead would run while monitoring stddev and run until stddev < magic_number || max_iterations. if we did that, then i think a lighter weight pause/resume would make sense.

I'm guessing what we'd want to monitor is relative standard deviation, and i would guess around 1% would be the threshold to target. But, how would that work, practically speaking? Would that bring us back to https://github.com/google/benchmark/pull/710 ?

dmah42 commented 3 years ago

Practically, i think it would have to, yes. it's one of the many reasons why i haven't pursued it.

LebedevRI commented 3 years ago

Since we now clearly have a second potential use-case, should i try to resurrect that patch?

google / benchmark

[FR] Lightweight way to pause/resume timer in benchmarks #1087