google / benchmark

A microbenchmark support library
Apache License 2.0

Memory reporting capabilities are unclear and undocumented. #1217

Open haberman opened 3 years ago

haberman commented 3 years ago

I am currently using google/benchmark to benchmark https://github.com/protocolbuffers/upb. I would very much like for those benchmarks to also include memory usage.

Does this project support memory benchmarking? The user guide does not mention memory usage benchmarking at all. However, I notice that the header has a RegisterMemoryManager() function, and there is a unit test verifying this functionality.

From running the test, it appears that the memory usage data is surfaced in the JSON report, but not in the CSV or console reports. Is this an oversight or intentional? Ideally this data would be shown in the CSV and console reports as well.

oontvoo commented 3 years ago

@dominichamon Any thoughts on this? (I have a similar question to @haberman, as I'm thinking about what to do with the internal BenchmakMemoryUsage().)

dmah42 commented 3 years ago

this was already added some time ago to align with the internal usage. The above note mentions RegisterMemoryManager and the idea is to use that to plug memory tracking in (because we didn't want to make assumptions about available memory management systems like tcmalloc).

outputting to json only was a matter of priority, not deliberately avoiding other outputs.

dmah42 commented 2 years ago

i'll leave this open to cover the addition of the memory reporting to CSV/Console reporters.

q-ycong-p commented 2 years ago

There is a section in the User Guide that briefly mentions support for benchmarking memory usage, but the documentation gives no concrete instructions on how to use it. Please provide details and examples.

ominil commented 2 years ago

The following is an example that I created to try the MemoryManager functionality. I'm interested to know how many bytes are allocated during a single benchmark execution.

I encountered a problem and I'm not sure if what I'm doing is right as there is no documentation.

Steps:

#include <cstdint>  // int64_t
#include <cstdlib>  // malloc
#include <memory>
#include <benchmark/benchmark.h>

class CustomMemoryManager: public benchmark::MemoryManager {
public:

    int64_t num_allocs;
    int64_t max_bytes_used;

    void Start() BENCHMARK_OVERRIDE {
        num_allocs = 0;
        max_bytes_used = 0;
    }

    void Stop(Result* result) BENCHMARK_OVERRIDE {
        result->num_allocs = num_allocs;
        result->max_bytes_used = max_bytes_used;
    }
};

std::unique_ptr<CustomMemoryManager> mm(new CustomMemoryManager());

#ifdef MEMORY_PROFILER
void *custom_malloc(size_t size) {
    void *p = malloc(size);
    mm.get()->num_allocs += 1;
    mm.get()->max_bytes_used += size;
    return p;
}
#define malloc(size) custom_malloc(size)
#endif

static void BM_memory(benchmark::State& state) {
    for (auto _ : state)
        for (int i =0; i < 10; i++) {
            benchmark::DoNotOptimize((int *) malloc(10 * sizeof(int *)));
        }
}

BENCHMARK(BM_memory)->Unit(benchmark::kMillisecond)->Iterations(17);

//BENCHMARK_MAIN();
int main(int argc, char** argv)
{
    ::benchmark::RegisterMemoryManager(mm.get());
    ::benchmark::Initialize(&argc, argv);
    ::benchmark::RunSpecifiedBenchmarks();
    ::benchmark::RegisterMemoryManager(nullptr);
}

To compile it run:

g++ src/memory_benchmark.cpp   -std=c++11 -O3 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -DMEMORY_PROFILER -o bin/benchmark_memory.exe

Then you simply run:

./bin/benchmark_memory.exe --benchmark_format=json

It is important to wrap the custom_malloc function in a conditional inclusion (e.g., #ifdef), as collecting memory statistics could distort the timing results. If you don't define MEMORY_PROFILER, the benchmark uses the default malloc and the timings stay stable.

If I run the benchmark with a single iteration, I get the expected result:

However, with 17 or more iterations, max_bytes_used is always 12800.

Can anyone tell me if what I'm doing is right? And if yes I hope this code can help other fellow developers!
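
A note for readers trying the example above: its custom_malloc adds every requested size to max_bytes_used and never subtracts, so the value behaves like a running total rather than a peak. The following is a minimal sketch of high-water-mark tracking, not part of google/benchmark. PeakTracker and its methods are hypothetical names, and it assumes the caller can pass the size to OnFree (plain free() does not provide it).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a google/benchmark API): tracks live bytes and
// their high-water mark from explicit alloc/free notifications.
class PeakTracker {
 public:
  // Record an allocation of `size` bytes and update the peak if needed.
  void OnAlloc(std::size_t size) {
    ++num_allocs_;
    live_bytes_ += static_cast<std::int64_t>(size);
    if (live_bytes_ > peak_bytes_) peak_bytes_ = live_bytes_;
  }

  // Record a deallocation. Assumption: the caller knows the freed size.
  void OnFree(std::size_t size) {
    live_bytes_ -= static_cast<std::int64_t>(size);
  }

  // Clear all statistics, e.g. from MemoryManager::Start().
  void Reset() {
    num_allocs_ = 0;
    live_bytes_ = 0;
    peak_bytes_ = 0;
  }

  std::int64_t num_allocs() const { return num_allocs_; }
  std::int64_t live_bytes() const { return live_bytes_; }
  std::int64_t peak_bytes() const { return peak_bytes_; }

 private:
  std::int64_t num_allocs_ = 0;
  std::int64_t live_bytes_ = 0;
  std::int64_t peak_bytes_ = 0;
};
```

With such a tracker, Stop() could copy num_allocs() and peak_bytes() into the result fields used in the example, while a benchmark that frees what it allocates would report a peak that does not grow with the iteration count.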

LebedevRI commented 2 years ago

https://github.com/google/benchmark/blob/7eb8c0fe45c2893f13863f2fa317c46db0336c4e/src/benchmark_runner.cc#L378-L380
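
For readers following the link: the referenced lines appear to limit how many iterations run while the memory manager is active. Assuming that limit is 16 (an inference from the reported number, not something stated in this thread) and 8-byte pointers, the arithmetic behind the constant 12800 can be sketched as below. The names kAssumedMemoryIterationCap and ExpectedTalliedBytes are made up for illustration.

```cpp
#include <cstdint>

// Assumption: the memory-measurement pass runs at most this many iterations.
constexpr std::int64_t kAssumedMemoryIterationCap = 16;

// Bytes the example's custom_malloc would tally when `requested_iterations`
// are requested and each iteration performs `allocs_per_iteration`
// allocations of `bytes_per_alloc` bytes.
constexpr std::int64_t ExpectedTalliedBytes(std::int64_t requested_iterations,
                                            std::int64_t allocs_per_iteration,
                                            std::int64_t bytes_per_alloc) {
  return (requested_iterations < kAssumedMemoryIterationCap
              ? requested_iterations
              : kAssumedMemoryIterationCap) *
         allocs_per_iteration * bytes_per_alloc;
}

// The example does 10 mallocs of 10 * sizeof(int*) = 80 bytes per iteration.
static_assert(ExpectedTalliedBytes(17, 10, 80) == 12800,
              "16 capped iterations * 10 allocations * 80 bytes");
```

Under these assumptions, any request of 16 or more iterations tallies the same 12800 bytes, which matches the behaviour described in the question.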

dmah42 commented 2 years ago

apologies: closed as the above is resolved but the broader issue of reporting in other reporters isn't.

varshneydevansh commented 1 year ago

Can I look into this? It seems somewhat similar to the issue we just closed. @dmah42

dmah42 commented 1 year ago

yes of course.

thatsafunnyname commented 1 year ago

I use custom counters instead of benchmark::MemoryManager. This enabled more detailed stats (using the m*map and sbrk hooks), and console tabular output without having to parse JSON:

./a.out --benchmark_counters_tabular=true
---------------------------------------------------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations       #new  avg_new_B  max_new_B  min_new_B  sum_new_B
---------------------------------------------------------------------------------------------------------------------
BM_demo/32/128/32          118 ns          117 ns      5931457          3         64        128         32        192
BM_demo/320/640/960        117 ns          117 ns      5985820          3        640        960        320      1.92k

This brief example is using tcmalloc. Adding it here in the hope it is useful to someone. It does not support running multithreaded benchmarks.

#include <algorithm> // std::max, std::min
#include <cstddef> // size_t
#include <cstdlib> // malloc, free
#include <gperftools/malloc_hook.h> // link tcmalloc
#include "benchmark/benchmark.h" // link benchmark

// Tallies use size_t, the type of the hook's size argument, so that
// std::max/std::min see matching types whatever IterationCount is.
size_t g_num_new      = 0;
size_t g_sum_size_new = 0;
size_t g_max_size_new = 0;
size_t g_min_size_new = static_cast<size_t>(-1); // sentinel: nothing allocated yet
// Called on every allocation while the hook is installed.
auto new_hook = [](const void*, size_t size){ ++g_num_new; g_sum_size_new += size;
                                              g_max_size_new = std::max(g_max_size_new, size);
                                              g_min_size_new = std::min(g_min_size_new, size); };
// Snapshot the running totals, reset min/max, then install the hook.
#define BEFORE_TEST \
  size_t num_new      = g_num_new;\
  size_t sum_size_new = g_sum_size_new;\
  g_max_size_new = 0;\
  g_min_size_new = static_cast<size_t>(-1);\
  MallocHook::AddNewHook( new_hook );

// Remove the hook and report per-iteration statistics as custom counters.
// avg and min are only reported if at least one allocation was seen,
// which also avoids dividing by zero.
#define AFTER_TEST \
  MallocHook::RemoveNewHook( new_hook );\
  auto iter = state.iterations();\
  state.counters["#new"]      = (g_num_new      - num_new)      / iter;\
  state.counters["sum_new_B"] = (g_sum_size_new - sum_size_new) / iter;\
  state.counters["max_new_B"] = g_max_size_new;\
  if( static_cast<size_t>(-1) != g_min_size_new ){\
    state.counters["avg_new_B"] = (g_sum_size_new - sum_size_new) / (g_num_new - num_new);\
    state.counters["min_new_B"] = g_min_size_new;\
  }

static void BM_demo(benchmark::State& state) {
  BEFORE_TEST
  for (auto _ : state) {
    void* ret1 = malloc(state.range(0));
    void* ret2 = malloc(state.range(1));
    void* ret3 = malloc(state.range(2));
    free(ret1);
    free(ret2);
    free(ret3);
  }
  AFTER_TEST
}
BENCHMARK(BM_demo)->Args({32,128,32});
BENCHMARK(BM_demo)->Args({320,640,960});
BENCHMARK_MAIN();
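
On the multithreading limitation mentioned above: the hook updates plain global integers, so concurrent allocations would race. A minimal sketch of one possible direction follows, using atomic tallies. AtomicAllocStats, Record, and RecordFromThreads are hypothetical names, not tcmalloc or google/benchmark APIs, and the sketch does not address attributing allocations to a particular benchmark thread.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical thread-safe replacement for the global tallies.
struct AtomicAllocStats {
  std::atomic<std::uint64_t> count{0};
  std::atomic<std::uint64_t> sum{0};
  std::atomic<std::uint64_t> max{0};

  // Safe to call concurrently, e.g. from an allocation hook.
  void Record(std::size_t size) {
    count.fetch_add(1, std::memory_order_relaxed);
    sum.fetch_add(size, std::memory_order_relaxed);
    // Raise `max` with a compare-and-swap loop; on failure `seen` is
    // refreshed with the current value and the condition is rechecked.
    std::uint64_t seen = max.load(std::memory_order_relaxed);
    while (size > seen &&
           !max.compare_exchange_weak(seen, size, std::memory_order_relaxed)) {
    }
  }
};

// Demo driver: `threads` workers each record `per_thread` allocations
// of `size` bytes, standing in for hooks firing on several threads.
inline void RecordFromThreads(AtomicAllocStats& stats, int threads,
                              int per_thread, std::size_t size) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&stats, per_thread, size] {
      for (int i = 0; i < per_thread; ++i) stats.Record(size);
    });
  }
  for (auto& w : workers) w.join();
}
```

A hook would call Record(size) in place of the plain increments. Relaxed ordering is enough here because the totals are only read after the worker threads have been joined.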

dmah42 commented 1 year ago

this is great, thank you so much. the memory management API is in place largely for backwards compatibility with an older version of the library but this is a really neat way to bring more complete stats into the output.

thanks for sharing!