google / benchmark

A microbenchmark support library
Apache License 2.0

[BUG] Reported iterations per run when using repetitions always the same value? #1752

Closed: thomthom closed this issue 5 months ago

thomthom commented 5 months ago

GoogleBenchmark 1.8.3

I'm working with some benchmarks where I'm seeing a good amount of variation between runs, so I tried to use the --benchmark_repetitions flag to reduce the noise. But the data I'm seeing confuses me:

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01 --benchmark_repetitions=10
2024-02-03T12:41:00+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
------------------------------------------------------------------------------------------
Benchmark                                                Time             CPU   Iterations
------------------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.63 ms         3.39 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.60 ms         3.43 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.59 ms         5.71 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.59 ms         3.79 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.66 ms         3.64 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.60 ms         5.81 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.63 ms         3.51 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.60 ms         5.19 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.64 ms         3.81 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000              8.57 ms         5.86 ms          747
BM_TraceBitmap_Seq/plan01/min_time:2.000_mean         8.61 ms         4.41 ms           10
BM_TraceBitmap_Seq/plan01/min_time:2.000_median       8.60 ms         3.80 ms           10
BM_TraceBitmap_Seq/plan01/min_time:2.000_stddev      0.027 ms         1.08 ms           10
BM_TraceBitmap_Seq/plan01/min_time:2.000_cv           0.31 %         24.49 %            10
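For context, a benchmark producing names like BM_TraceBitmap_Seq/plan01/min_time:2.000 is typically registered with the MinTime setter. A minimal sketch, assuming a hypothetical function body (the real fixture and the "plan01" variant are not shown in this issue):

#include <benchmark/benchmark.h>

// Hypothetical stand-in for the real benchmark body; the actual
// fixture and the "plan01" argument variant are assumptions.
static void BM_TraceBitmap_Seq(benchmark::State& state) {
  for (auto _ : state) {
    // ... trace the bitmap ...
  }
}

// MinTime(2.0) produces the "/min_time:2.000" suffix seen in the
// names above. Repetitions could also be requested here via
// ->Repetitions(10) instead of the --benchmark_repetitions flag.
BENCHMARK(BM_TraceBitmap_Seq)->MinTime(2.0);

BENCHMARK_MAIN();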

Every repetition is reported with the exact same number of iterations, which is highly suspicious. I never get the exact same value across individual runs:

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01
2024-02-03T12:45:26+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000       8.67 ms         4.15 ms          640

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01
2024-02-03T12:45:33+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000       8.64 ms         3.49 ms          779

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01
2024-02-03T12:45:42+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000       8.73 ms         6.16 ms          543

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01
2024-02-03T12:45:49+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000       8.70 ms         4.57 ms          943

>traceup_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01
2024-02-03T12:46:00+01:00
Running traceup_benchmarks.exe
Run on (16 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 32768 KiB (x1)
-----------------------------------------------------------------------------------
Benchmark                                         Time             CPU   Iterations
-----------------------------------------------------------------------------------
BM_TraceBitmap_Seq/plan01/min_time:2.000       8.68 ms         5.23 ms          597

System
Which OS, compiler, and compiler version are you using:

Expected behavior
Maybe I misunderstood how to use this flag or how to read the data, but what I expected to see was that each repetition reported different values, similar to when I ran the benchmark manually multiple times.

I also assumed that the mean/median etc. rows would report the mean/median of all repetitions, not the number of repetitions.

LebedevRI commented 5 months ago

The first repetition picks the iteration count; the other repetitions just keep using that same iteration count.

> I also assumed that the mean/median etc. rows would report the mean/median of all repetitions, not the number of repetitions.

It is important to note that within a single repetition, we do not record the individual times each iteration took, only the overall accumulated time that all iterations of a repetition took.

So the aggregate statistics (over repetitions) are correctly saying that they were calculated from the repetition count (10, in your case), not from the total number of iterations over all repetitions.

So everything is working as intended.
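To make the aggregate semantics concrete, here is a small self-contained sketch (not part of the library) that recomputes the aggregates from the ten per-repetition Time values in the report above; the 10 in the Iterations column is simply the sample size of these statistics:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
  // The ten per-repetition Time values (ms) from the report above.
  std::vector<double> t = {8.63, 8.60, 8.59, 8.59, 8.66,
                           8.60, 8.63, 8.60, 8.64, 8.57};
  const size_t n = t.size();  // 10 -> "Iterations" column of the aggregate rows

  double sum = 0;
  for (double v : t) sum += v;
  const double mean = sum / n;

  std::sort(t.begin(), t.end());
  const double median = (t[n / 2 - 1] + t[n / 2]) / 2;  // n is even

  double sq = 0;
  for (double v : t) sq += (v - mean) * (v - mean);
  const double stddev = std::sqrt(sq / (n - 1));  // sample stddev

  // Prints mean=8.61 and median=8.60, matching the report; stddev and cv
  // agree up to the rounding already present in the displayed inputs.
  std::printf("mean=%.2f median=%.2f stddev=%.3f cv=%.2f%%\n",
              mean, median, stddev, 100.0 * stddev / mean);
}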

thomthom commented 4 months ago

I realize I've missed some key fundamentals of this tool. Thank you for the insight and for taking the time to clarify how it works.

I was running benchmarks while working on performance improvements. I'd set my benchmarks to have a minimum run time of 2 seconds, which seemed fine at the time. Only later, when I added parallelization to my code, did the results between runs become more varied. I guess that's because, as the code started using more CPU across multiple cores, I began seeing more noise from the rest of the system.

That led me to try out repetitions. But all along I was focusing on the iteration count.

Do I understand it correctly that iteration counts are not the best thing to compare between runs, or between builds? That instead I should focus on the median run time of the benchmark itself, and use that as a baseline to compare performance improvements? (Seeing how the iteration count gets locked to the first repetition, that metric seems like an unstable one to compare.)

LebedevRI commented 4 months ago

(What you generally want to look at is the two Time columns.) Unless you specifically know that you need something different, I'd recommend taking a look at https://github.com/google/benchmark/blob/b7ad5e04972d3bd64ca6a79c931c88848b33e588/docs/tools.md and trying the benchmark/tools/compare.py script.
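Per that tools.md page, compare.py takes a baseline binary and a contender binary, runs both, and reports the per-benchmark time differences; any trailing arguments are passed through to the benchmark executables. A hypothetical invocation for this case (the binary names are illustrative):

>python compare.py benchmarks baseline_benchmarks.exe contender_benchmarks.exe --benchmark_filter=BM_TraceBitmap_Seq/plan01 --benchmark_repetitions=10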