google / benchmark

A microbenchmark support library
Apache License 2.0

[Q] Need help with understanding the results #1771

Open evskola opened 3 months ago

evskola commented 3 months ago

Hi there,

I wrote a benchmark for cv::cvtColor. In essence, it loops through a set of images and converts each one from BayerRG to BGR.

Run on (24 X 2112 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 2048 KiB (x12)
  L3 Unified 30720 KiB (x1)
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
BM_ConvertBayerRG8/threads:1        141345 ns       142299 ns         5600
BM_ConvertBayerRG8/threads:2        102112 ns       207203 ns         3318
BM_ConvertBayerRG8/threads:4        143496 ns       496779 ns         2988
BM_ConvertBayerRG8/threads:8        258000 ns       898438 ns          800
BM_ConvertBayerRG8EA/threads:1      169186 ns       174386 ns         4480
BM_ConvertBayerRG8EA/threads:2      129333 ns       204041 ns         3446
BM_ConvertBayerRG8EA/threads:4      175315 ns       592913 ns         2240
BM_ConvertBayerRG8EA/threads:8      287515 ns      1063756 ns          896
BM_ConvertBayerRG8VNG/threads:1   14805108 ns     14409722 ns           90
BM_ConvertBayerRG8VNG/threads:2   13547418 ns     26562500 ns           20
BM_ConvertBayerRG8VNG/threads:4    7017534 ns     25000000 ns           40
BM_ConvertBayerRG8VNG/threads:8    3459721 ns     20507813 ns           80

The BM_ConvertBayerRG8VNG results look strange to me. How can running the benchmark on 4 or 8 threads give lower times than running it on a single thread?

The images are 2046 x 2046 pixels and about 4 MB each (each converted image is about 12 MB). The set contains 33 files in total.
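The quoted sizes are consistent with 8-bit data. A quick check, assuming one byte per sample and unpadded rows (both are assumptions, since the issue gives only rounded sizes); the constant and function names are illustrative:

```cpp
#include <cstdint>

// Raw size of an image, ASSUMING 1 byte per channel sample (8-bit data)
// and no row padding. The issue itself only says "4mb" and "12mb".
constexpr std::uint64_t image_bytes(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t channels) {
    return width * height * channels;
}

// Bytes to MiB (2^20 bytes).
constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// BayerRG input: one sample per pixel.
constexpr std::uint64_t kBayerBytes = image_bytes(2046, 2046, 1);  // 4,186,116 bytes (~3.99 MiB)
// BGR output: three samples per pixel.
constexpr std::uint64_t kBgrBytes = image_bytes(2046, 2046, 3);    // 12,558,348 bytes (~11.98 MiB)
```

Under these assumptions each conversion reads roughly 4 MB and writes roughly 12 MB.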

LebedevRI commented 3 months ago

Because the time is divided by the number of threads: https://github.com/google/benchmark/blob/c64b144f42f7e17bfebd3d2220f8daac48e6365c/src/benchmark_runner.cc#L290-L292

This is a duplicate of #769 / #946. I'm not sure why things are the way they are, and I don't know whether they should be changed.
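A minimal sketch of that arithmetic, assuming (from the linked benchmark_runner.cc lines, and from the Iterations column always being a multiple of the thread count) that every thread runs the same number of iterations, per-thread wall-clock times are summed and then divided by the thread count, and the result is divided by the total iteration count of all threads. The function names are hypothetical, not part of the library. Under these assumptions the Time column for a /threads:T run is one thread's wall time divided by T times its iteration count, so multiplying Time by T recovers how long a single call took on one thread:

```cpp
#include <cstdint>

// HYPOTHETICAL model (not google/benchmark's code) of the "Time" column for
// a /threads:T run. Assumed, per the linked benchmark_runner.cc lines:
//   1. each thread's wall-clock time is added to a shared total;
//   2. that total is divided by the thread count T;
//   3. the result is divided by the Iterations column, which counts the
//      iterations of all threads together.
constexpr std::int64_t reported_time_ns(std::int64_t summed_wall_ns,
                                        std::int64_t threads,
                                        std::int64_t total_iterations) {
    return (summed_wall_ns / threads) / total_iterations;
}

// Inverse view: wall time of one call as seen by a single thread (its
// latency), recovered from a reported Time value. Assumes all threads run
// the same number of iterations and take about the same wall time.
constexpr std::int64_t per_call_latency_ns(std::int64_t reported_ns,
                                           std::int64_t threads) {
    return reported_ns * threads;
}

// BM_ConvertBayerRG8VNG rows from the table in the question.
constexpr std::int64_t kLatency1 = per_call_latency_ns(14805108, 1);  // 14,805,108 ns
constexpr std::int64_t kLatency2 = per_call_latency_ns(13547418, 2);  // 27,094,836 ns
constexpr std::int64_t kLatency4 = per_call_latency_ns(7017534, 4);   // 28,070,136 ns
constexpr std::int64_t kLatency8 = per_call_latency_ns(3459721, 8);   // 27,677,768 ns
```

With the question's numbers, every multi-threaded VNG row works out to roughly 27 to 28 ms per call on each thread, versus about 14.8 ms with one thread. Each call gets slower, but T calls run side by side, so the Time column (about 4.3x lower at 8 threads than at 1) reflects aggregate throughput rather than per-call latency. Assuming the default per-thread CPU timer, the CPU column is not divided by the thread count in the same way, which is why it does not fall as threads are added.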