google / benchmark

A microbenchmark support library
Apache License 2.0

[FR] Aggregated statistics for all threads in a benchmark #1604

Open Tasemo opened 1 year ago

Tasemo commented 1 year ago

Is your feature request related to a problem? Please describe.

First issue. I would like to report the speedup (T(1) / T(n)) when using multiple threads with ThreadRange(). ComputeStatistics() exists, but its callback only receives the values for one specific thread. There seems to be no way to get all values for a single named benchmark. Please correct me if I'm wrong.

Describe the solution you'd like

Ideally, there should be a method similar to ComputeStatistics() that aggregates all values for all threads.

Describe alternatives you've considered

Maybe access to the State can be provided in ComputeStatistics(). This way, at least each run can be identified by its thread_index and its name. Values can then be stored externally.
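To make the "stored externally" alternative concrete, here is a minimal sketch of such a store. It is not part of Google Benchmark: `SpeedupTable`, `Record`, and `Speedup` are hypothetical names, and the sketch assumes that something (for example a callback that knows the benchmark name and thread count) feeds it one timing per run.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical helper, not part of Google Benchmark: keeps one timing per
// (benchmark name, thread count) and derives the speedup T(1) / T(n).
class SpeedupTable {
 public:
  // Stores (or overwrites) the timing for one run.
  void Record(const std::string& name, int threads, double seconds) {
    times_[{name, threads}] = seconds;
  }

  // Returns T(1) / T(threads). Throws std::out_of_range (from map::at)
  // if either timing has not been recorded.
  double Speedup(const std::string& name, int threads) const {
    return times_.at({name, 1}) / times_.at({name, threads});
  }

 private:
  std::map<std::pair<std::string, int>, double> times_;
};
```

With `Record("BM_Test", 1, 8.0)` and `Record("BM_Test", 4, 2.0)`, `Speedup("BM_Test", 4)` evaluates to 8.0 / 2.0. The open question in this issue is where the name and thread count would come from, since the ComputeStatistics() callback only receives the values.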

LebedevRI commented 1 year ago

I'm not sure I fully understand the problem. If the desired result is that the (same) custom user counter is accumulated from all threads, then that is the default behavior: https://github.com/google/benchmark/blob/9885aefb96effeb60c4e8c005e7b52c455458c10/test/user_counters_test.cc#L251-L285

Tasemo commented 1 year ago

Sorry, the problem is not related to custom counters (at least I think so). In the following code snippet, how can I report statistics for the whole "BM_Test" benchmark? It calls the lambda for each individual thread.

BENCHMARK(BM_Test)->ThreadRange(1, 8)->ComputeStatistics("speedup", [](const std::vector<double>& values) {
    // "values" are for one thread only, not the whole "BM_Test" benchmark
    return 0.0;  // placeholder; the callback must return a double
});

LebedevRI commented 1 year ago

Hm. Could you please try MeasureProcessCPUTime()?

Tasemo commented 1 year ago

That doesn't work, unfortunately. I think the problem is that each thread gets treated as an individual benchmark internally. The context of which original benchmark they belong to is lost. The same applies to the Range*() functions and ArgsProduct().
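One way to recover that lost context outside the library is to parse the run name, since Google Benchmark appends components such as `/threads:4` to it. The sketch below is only an illustration: `RunId` and `ParseRunName` are hypothetical names, not library API, and it assumes the base name itself contains no `/`.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type: the original benchmark a run belongs to.
struct RunId {
  std::string base_name;  // e.g. "BM_StringCopy"
  int threads;            // 1 if the name has no "/threads:" component
};

// Splits a run name such as "BM_StringCopy/repeats:2/threads:4" into the
// base benchmark name and the thread count.
RunId ParseRunName(const std::string& run_name) {
  RunId id;
  // Everything before the first '/' is taken as the base name
  // (the whole string if there is no '/').
  id.base_name = run_name.substr(0, run_name.find('/'));
  id.threads = 1;
  const std::string key = "/threads:";
  const std::string::size_type pos = run_name.find(key);
  if (pos != std::string::npos) {
    // atoi stops at the first non-digit, so trailing components are ignored.
    id.threads = std::atoi(run_name.c_str() + pos + key.size());
  }
  return id;
}
```

Runs that share a `base_name` could then be grouped, for instance as keys of a map, to compare timings across thread counts.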

LebedevRI commented 1 year ago

OK, I don't follow. Could you please post a self-contained (compilable and runnable!) example that demonstrates the problem?

Tasemo commented 1 year ago

For convenience, a project is available at https://github.com/Tasemo/benchmark-issue-1604.

Code snippet

```cpp
#include <benchmark/benchmark.h>

#include <iostream>

static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state) {
    std::string copy(x);
  }
}

double counter = 0.0;

BENCHMARK(BM_StringCopy)
    ->Repetitions(2)
    ->ThreadRange(1, 4)
    ->ComputeStatistics("test", [](const std::vector<double>& values) {
      std::cout << "Id: " << counter++ << ", Size: " << values.size() << std::endl;
      return 0.0;
    });
```

This code (linked with benchmark_main) prints the following pattern:

    Id: 0, Size: 2
    Id: 1, Size: 2
    BM_StringCopy/repeats:2/threads:1 [...]
    Id: 2, Size: 2
    Id: 3, Size: 2
    BM_StringCopy/repeats:2/threads:2 [...]
    Id: 4, Size: 2
    Id: 5, Size: 2
    BM_StringCopy/repeats:2/threads:4 [...]

The ComputeStatistics() lambda is called twice for each thread count (I guess once for CPU time and once for wall time), each time with the run times of the repetitions. I don't know the best approach, but I would like to get the run times for all threads together so that I can report, for example, how much faster they get compared to using a single thread. One approach I can think of is a function that gets called with a std::vector<std::vector<double>> or any other 2D structure after all runs related to one benchmark have finished.
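As a rough sketch of what such a 2D callback could compute, assume (hypothetically) that `runs[i]` holds the repetition times for the i-th thread count and that `runs[0]` is the single-threaded baseline. Neither `Mean` nor `Speedups` exists in Google Benchmark; they only illustrate the requested aggregation.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Arithmetic mean of one row of repetition times.
double Mean(const std::vector<double>& xs) {
  if (xs.empty()) throw std::invalid_argument("empty row");
  return std::accumulate(xs.begin(), xs.end(), 0.0) / xs.size();
}

// runs[i] holds the repetition times for the i-th thread count;
// runs[0] is assumed to be the single-threaded baseline.
// Returns mean(runs[0]) / mean(runs[i]) for every row i.
std::vector<double> Speedups(const std::vector<std::vector<double>>& runs) {
  std::vector<double> result;
  if (runs.empty()) return result;
  const double baseline = Mean(runs[0]);
  for (const auto& row : runs) {
    result.push_back(baseline / Mean(row));
  }
  return result;
}
```

For rows with means 8, 4, and 2 (threads 1, 2, and 4), this yields speedups of 1, 2, and 4. Using the mean over repetitions is one choice among several; the median would be less sensitive to outliers.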

LebedevRI commented 1 year ago

By "together" would it be enough to get the sum of per-thread times? (as in, do you really want to get separate per-thread values?)

Tasemo commented 1 year ago

No, I need the separate per-thread values in one place. I don't see another way, because the value of one run depends on the value of another. If this feature request is too big or too specific, let me know.