kgorking opened 3 years ago
So how does it work?
It works just like the normal pause/resume functions, but the timings recorded are subtracted from the normal timings when the report is generated. You can see the diff here: https://github.com/google/benchmark/compare/master...kgorking:lightweight_timer_suspension
I see, so no magic. The effect is really surprising to me. By any chance, do you have an understanding of where PauseTiming/ResumeTiming manages to waste so much time? Just by looking at that diff of yours, I would not have expected it to have as bad of an impact.
The pause/resume functions modify the values of the timer used to calculate the number of iterations that should be run. In my example from above, the benchmarking code thinks that an iteration takes 0.810us when it is in reality taking 2372us, and then calculates that 793297 iterations should be run to hit a certain time threshold.
The benchmark runner thinks it will take 0.810us * 793297 = ~0.64 seconds, when it actually takes 2372us * 793297 = ~31.4 minutes.
Hm, so with BeginIgnoreTiming/EndIgnoreTiming, if the time spent with paused timing increases, the iteration count will decrease?
The iteration count will be the same as if there were no pause/resume.
I see. Let me explain my take on this.
Let's suppose we have a benchmark:

```
a;
/*
// pause
b;
// unpause
c;
*/
```
We want to measure how much time is spent in `a`. What we roughly do is repeat that snippet N times (where N is `wanted total runtime / (time it took to do Y iterations / Y iterations)`), and then divide the total measured time by the iteration count. Thus, we get a not-so-noisy measurement. Pretty easy.
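That iteration-count formula can be written out as a tiny helper; this is a simplified illustration of the calculation described above, not the library's actual calibration code (all names here are invented):

```cpp
// Derive the iteration count N from a short calibration run, as
// described above: N = wanted total runtime / (time per iteration).
long compute_iterations(double wanted_total_runtime_s,
                        double calibration_time_s,
                        long calibration_iterations) {
    // Per-iteration time observed during the calibration run.
    double per_iter_s = calibration_time_s / calibration_iterations;
    return static_cast<long>(wanted_total_runtime_s / per_iter_s);
}
```

For example, if 1024 calibration iterations took 0.25 s, then an iteration costs ~244us, and a 1-second budget yields 4096 iterations.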
Now, let's suppose we have a benchmark:

```
a;
// pause
b;
// unpause
c;
```

We want to measure how much time is spent in `a` and `c`, but not `b`.
Let's pretend each one of `a`, `b` and `c` takes the same amount of time. If we use the existing PauseTiming/ResumeTiming, we'll do N/2 iterations, and if we don't do any pausing of the measurements, we end up doing N/3 iterations.
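Those counts fall out of dividing a fixed *measured-time* budget by how many of the three equal-cost sections the timer actually sees per iteration; a toy illustration (names invented, assuming each section costs one time unit):

```cpp
// Iterations the runner will do for a fixed measured-time budget,
// given how much measured time one iteration contributes.
long iterations_for_budget(double measured_budget, double measured_per_iter) {
    return static_cast<long>(measured_budget / measured_per_iter);
}

// With PauseTiming/ResumeTiming, only a+c (2 units) count toward the
// budget, so we do budget/2 iterations; with no pausing, all of a+b+c
// (3 units) count, so we do budget/3 iterations.
```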
I think it is really important to notice that the fewer iterations we do, the larger the stddev of the measurement will be. And this will be especially important if the ratio between the paused and unpaused sections differs by many times, as I guess it does in your case.
So I, personally, find this approach to be fundamentally wrong. But perhaps I'm just wrong.
I don't think it's fundamentally wrong, and it is in fact how we used to do the pause/resume when the library was first written. However, your (@LebedevRI) point stands, which is that we want to take into account the pause when calculating iterations to ensure that the bits we are measuring get enough coverage.
Now, ideally we wouldn't use the iteration-growth approach and instead would run while monitoring stddev, running until `stddev < magic_number || max_iterations`. If we did that, then I think a lighter-weight pause/resume would make sense. But I'd also guess that it wouldn't have any impact on the end-to-end runtime.
So I'm happy that you've found a way to make your benchmarks run for less time, but I don't think your results are as statistically useful as they could be.
https://github.com/google/benchmark/issues/1051 is a related bit of work by which we can reduce the variance of benchmarks and balance execution cost.
> Now ideally we wouldn't use the iteration growth stuff and instead would run while monitoring stddev and run until `stddev < magic_number || max_iterations`. If we did that, then I think a lighter-weight pause/resume would make sense.
I'm guessing what we'd want to monitor is the relative standard deviation, and I would guess around 1% would be the threshold to target. But how would that work, practically speaking? Would that bring us back to https://github.com/google/benchmark/pull/710 ?
Practically, I think it would have to, yes. It's one of the many reasons why I haven't pursued it.
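The stopping rule being discussed could look roughly like this sketch (purely illustrative; this is not an existing benchmark API, and the threshold and cap are placeholders):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stop once the relative standard deviation of the collected
// per-iteration samples drops below a threshold (e.g. 1%), or once a
// hard cap on the number of samples is hit.
bool should_stop(const std::vector<double>& samples,
                 double rel_stddev_threshold, std::size_t max_samples) {
    if (samples.size() >= max_samples) return true;  // safety cap
    if (samples.size() < 2) return false;            // not enough data yet

    double mean = 0.0;
    for (double s : samples) mean += s;
    mean /= samples.size();

    double var = 0.0;  // sample variance (Bessel-corrected)
    for (double s : samples) var += (s - mean) * (s - mean);
    var /= (samples.size() - 1);

    return std::sqrt(var) / mean < rel_stddev_threshold;
}
```

A runner would call this after each batch of iterations and keep going until it returns true.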
Since we now clearly have a second potential use-case, should I try to resurrect that patch?
Using PauseTiming/ResumeTiming can make some benchmarks take a very long time to complete. Given the following example, if the call to A() takes longer than B(), then the runtime of the benchmarks can increase to ludicrous lengths.
To fix this, I have implemented a pair of functions (BeginIgnoreTiming/EndIgnoreTiming) that can be used to ignore the timing cost of code executed between them. It keeps the timing info around for when CreateRunReport() is called, where the ignored duration is subtracted from the CPU- and real-time timings, so it only affects the printed timing values.
This allows benchmarks to run as if there were no calls to pause the timings, while getting the timings of the benchmark I am actually interested in (B() in the above example).
I have included the results of some runs from a personal project of mine. The benchmark is of a similar format to the A/B example from above.