google / benchmark

A microbenchmark support library
Apache License 2.0

[FR] Add normalizing factor to the results #1659

Closed: HFTrader closed this issue 10 months ago

HFTrader commented 10 months ago

Is your feature request related to a problem? Please describe.

I have a benchmark where I measure one operation on a vector, say sorting a vector of size N. Assume this operation is O(N), so the result is a list of numbers that vary with the size of the vector, which is a parameter/argument to the test.

A typical test/benchmark would be like the following:

#include <benchmark/benchmark.h>

#include <cstddef>
#include <vector>

// Minimal stand-in for the element type, which was not shown in the original snippet.
struct TestObject {
    std::size_t value = 0;
};

static void BM_StdVector_Sequential(benchmark::State& state) {
    const std::size_t PAGESIZE = state.range(1);  // unused in this variant; relevant to the paged variants
    const std::size_t VECTORSIZE = state.range(0);
    (void)PAGESIZE;
    std::vector<TestObject> pg(VECTORSIZE);
    for (TestObject& obj : pg) {
        obj.value = 1;
    }
    for (auto _ : state) {
        for (std::size_t j = 0; j < VECTORSIZE; ++j) {
            pg[j].value += 1;
        }
    }
    benchmark::DoNotOptimize(pg);
}
BENCHMARK(BM_StdVector_Sequential)
    ->Args({1024 * 1024, 65536})
    ->Args({1024 * 1024, 512 * 1024})
    ->Args({1024 * 1024, 1024 * 1024})
    ->Args({4096, 1024})
    ->Args({4096, 4096});

A typical result would be as below.

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
BM_PagedListVector_Sequential/1048576/65536       4742341 ns      4732952 ns          146
BM_PagedListVector_Sequential/1048576/524288       697148 ns       696902 ns         1005
BM_PagedListVector_Sequential/1048576/1048576      561080 ns       560928 ns         1246
BM_PagedListVector_Sequential/4096/1024              3673 ns         3672 ns       175320
BM_PagedListVector_Sequential/4096/4096              2167 ns         2167 ns       323286
BM_PagedVector_Sequential/1048576/65536            624387 ns       624144 ns         1124
BM_PagedVector_Sequential/1048576/524288           624112 ns       622993 ns         1125
BM_PagedVector_Sequential/1048576/1048576          621620 ns       621428 ns         1124
BM_PagedVector_Sequential/4096/1024                  2410 ns         2410 ns       290539
BM_PagedVector_Sequential/4096/4096                  2410 ns         2409 ns       290442
BM_StdVector_Sequential/1048576/65536               93975 ns        93941 ns         7442
BM_StdVector_Sequential/1048576/524288              94051 ns        94024 ns         7430
BM_StdVector_Sequential/1048576/1048576             94185 ns        94067 ns         7443
BM_StdVector_Sequential/4096/1024                     273 ns          273 ns      2569816
BM_StdVector_Sequential/4096/4096                     273 ns          273 ns      2564177

To interpret these numbers I'm always doing math in my head, like 4742341 / 1048576 ≈ 4.52 ns/value or 3673 / 4096 ≈ 0.90 ns/value. That lets me see the secondary effects (cache and algorithm impact) that I'm actually after.

Describe the solution you'd like

Ideally I would like to see the metric numbers already normalized by a factor that I provide. Currently the user counter interface does not allow a user-provided normalization value, only CPU time, number of threads, and number of iterations. I am not sure exactly what the proper interface would be, since I think it would require the benchmark itself to feed that value into the counter object.

Describe alternatives you've considered

  1. The current alternative is to compute all the required values in my head.

  2. I could also write a Python script to parse the output and print the normalized values.

  3. I could create a user counter and compute all the values manually, retrieving the actual CPU time from the state object and dividing by the factor I want.

Additional context

This use case is so common for me that I wonder whether it's common for other users too. Has this come up at any point?

I might also very well be missing an API or usage here. Please let me know :)

LebedevRI commented 10 months ago

I highly suspect this is already achievable through custom user counters:

state.counters["ns_per_val"] = bm::Counter{state.range(0), bm::Counter::kIsRate | bm::Counter::kInvert};
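For placement, a sketch of how that would look inside the benchmark, assuming bm is just an alias (namespace bm = benchmark;) and reusing the includes and TestObject from the snippet above:

static void BM_StdVector_Sequential(benchmark::State& state) {
    namespace bm = benchmark;
    const std::size_t VECTORSIZE = state.range(0);
    std::vector<TestObject> pg(VECTORSIZE);
    for (auto _ : state) {
        for (std::size_t j = 0; j < VECTORSIZE; ++j) {
            pg[j].value += 1;
        }
    }
    benchmark::DoNotOptimize(pg);
    // The counter's value is the element count; kIsRate divides it by the
    // measured CPU time and kInvert reports the reciprocal of that result.
    state.counters["ns_per_val"] =
        bm::Counter(state.range(0), bm::Counter::kIsRate | bm::Counter::kInvert);
}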
HFTrader commented 10 months ago

Hi Roman,

The kIsRate flag normalizes by cpu_time only (counter.cc:20):

[screenshot: the Finish() helper in src/counter.cc, where a kIsRate counter value is divided by cpu_time]
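For readers without the screenshot, the relevant logic in src/counter.cc looks roughly like the following (a paraphrase from memory, not the verbatim upstream source):

// Rough paraphrase of Finish() in src/counter.cc:
double Finish(Counter const& c, IterationCount iterations, double cpu_time,
              double num_threads) {
  double v = c.value;
  if (c.flags & Counter::kIsRate) v /= cpu_time;              // rate: divide by total CPU seconds only
  if (c.flags & Counter::kAvgThreads) v /= num_threads;
  if (c.flags & Counter::kIsIterationInvariant) v *= iterations;
  if (c.flags & Counter::kAvgIterations) v /= iterations;
  if (c.flags & Counter::kInvert) v = 1.0 / v;                // invert is applied last
  return v;
}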

LebedevRI commented 10 months ago

And?

HFTrader commented 10 months ago

Oh, never mind, I see what you did there. You passed the vector size, which will be divided by the CPU time and then inverted. Let me check; that might be it.

HFTrader commented 10 months ago

I know you're going to tell me to look into the code, but I'm puzzled now. What is the value that gets fed to Finish()?

The output below does not make sense to me. The values should be 0.68 ns/sample for 1M and 0.58 ns/sample for 4k.

[screenshot: console output where ns_per_val does not match the expected per-sample values]

LebedevRI commented 10 months ago

Right, how about:

state.counters["ns_per_val"] = bm::Counter{state.iterations() * state.range(0), bm::Counter::kIsRate | bm::Counter::kInvert};

aka

state.counters["ns_per_val"] = bm::Counter{state.range(0), bm::Counter::kIsIterationInvariantRate | bm::Counter::kInvert};

?
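For context, here is why the two spellings should be equivalent and what the reported value works out to (as far as I can tell, kIsIterationInvariantRate is defined as the combination of the two individual flags):

// Assuming namespace bm = benchmark;
//   Counter::kIsIterationInvariantRate == Counter::kIsRate | Counter::kIsIterationInvariant
// so in both forms the reported value is roughly
//   1 / ((range(0) * iterations) / total_cpu_seconds)
//     = total_cpu_seconds / (range(0) * iterations)
//     = CPU time per iteration / range(0)        i.e. time per element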

HFTrader commented 10 months ago

That did it. Perhaps add that to the documentation? I don't know why it works but it does.

[screenshot: console output with ns_per_val now showing the expected per-element times]

LebedevRI commented 10 months ago

Nice!

I was hoping that https://github.com/google/benchmark/blob/main/docs/user_guide.md#custom-counters was already explaining that clearly, but if you have a suggestion for particular wording improvements, please feel free to open a pull request...
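For what it's worth, a worked check of the arithmetic against the BM_StdVector_Sequential/1048576/65536 row in the table above (numbers copied from the table; this is plain arithmetic, not library code):

#include <cstdio>

int main() {
    // Numbers from the BM_StdVector_Sequential/1048576/65536 row above.
    const double cpu_per_iteration_s = 93941e-9;  // 93941 ns of CPU time per iteration
    const double iterations = 7442;
    const double elements = 1048576;              // state.range(0)

    // kIsIterationInvariantRate: (value * iterations) / total CPU seconds.
    const double total_cpu_s = cpu_per_iteration_s * iterations;
    const double elements_per_second = (elements * iterations) / total_cpu_s;

    // kInvert: report the reciprocal, i.e. seconds per element.
    const double s_per_element = 1.0 / elements_per_second;
    std::printf("%.4f ns per element\n", s_per_element * 1e9);  // prints about 0.0896
    return 0;
}

That matches the 93941 / 1048576 ≈ 0.09 ns/value one would otherwise compute by hand for that row.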