thomthom closed this issue 5 months ago
The first repetition picks the iteration count; the other repetitions just reuse that same iteration count.
> I also assumed that mean/median etc. would report the mean/median of all repetitions, not the number of repetitions.
It is important to note that within a single repetition, we do not record the individual times each iteration took, only the accumulated total time of all iterations in that repetition.
So the aggregate statistics (over repetitions) correctly report that they were calculated from the repetition count (10, in your case), not from the total number of iterations across all repetitions.
So everything is working as intended.
I realize I've missed some key fundamentals of this tool. Thank you for the insight and your time to clarify how it works.
I was running benchmarks while working on performance improvements. I'd set my benchmarks to a min run time of 2 seconds, which seemed fine at the time. Only later, when I added parallelization to my code, did the results start to vary more between runs. I guess that's because, as the code started using more of the available CPU cores, I began seeing more noise from the rest of the system.
That led me to try out repetitions. But all along I was focusing on the iteration count.
Do I understand correctly that iteration counts are not the best metric to compare between runs, or between builds? That I should instead focus on the median run time of the benchmark itself, and use that as a baseline to compare performance improvements? (Seeing how the iteration count gets locked in by the first repetition, that metric seems like an unstable one to compare.)
(What you generally want to look at is the two Time columns.)
Unless you specifically know that you need something different, I'd recommend taking a look at https://github.com/google/benchmark/blob/b7ad5e04972d3bd64ca6a79c931c88848b33e588/docs/tools.md and trying the benchmark/tools/compare.py script.
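Conceptually, the kind of comparison compare.py prints boils down to the relative change in a time statistic (e.g. the median over repetitions) between a baseline and a contender build. The sketch below is an assumption-laden simplification of that idea, not the script's actual code; a negative result means the contender is faster.

```python
# Simplified illustration of a median-based comparison -- NOT compare.py's code.
from statistics import median


def relative_change(baseline_times, contender_times):
    """Relative change of the contender's median time vs. the baseline's."""
    base = median(baseline_times)
    cont = median(contender_times)
    return (cont - base) / base


# Example: per-repetition times where the median drops from 2.0s to 1.5s.
old = [2.1, 2.0, 1.9, 2.0]
new = [1.5, 1.6, 1.5, 1.4]
relative_change(old, new)  # -> -0.25, i.e. a 25% improvement
```

Comparing medians like this is robust to the occasional outlier repetition, which is exactly the noise the extra repetitions are meant to absorb.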
Google Benchmark 1.8.3
I'm working with some benchmarks where I'm seeing a good amount of variance between runs. So I tried to use the --benchmark_repetitions flag to reduce the noise. But the data I'm seeing confuses me: every repetition is reported with the exact same number of iterations, which is highly suspicious, as I never get the exact same value across individual runs:
System: Which OS, compiler, and compiler version are you using:
Expected behavior: Maybe I misunderstood how to use this flag or how to read the data. But what I expected was that each repetition would report different values, similar to when I ran the benchmark manually multiple times.
I also assumed that mean/median etc. would report the mean/median of all repetitions, not the number of repetitions.