#!/bin/bash
# build the program (callgrind needs no special flags; -g is optional and adds source-line information to the profile)
g++ -std=c++11 -g cpuload.cpp -o cpuload
# run the program with callgrind; by default this writes callgrind.out.<pid>, so we name the output file explicitly
valgrind --tool=callgrind --callgrind-out-file=profile.callgrind ./cpuload
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind
#!/bin/bash
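# build the program; for our demo program, we specify -DWITHGPERFTOOLS to enable the gperftools-specific #ifdefs
# (-lprofiler goes after the source file so that linkers using --as-needed still resolve the profiler symbols)
g++ -std=c++11 -DWITHGPERFTOOLS -g ../cpuload.cpp -o cpuload -lprofiler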
# run the program; generates the profiling data file (profile.log in our example)
./cpuload
# convert profile.log to callgrind-compatible format (some distributions install pprof as google-pprof)
pprof --callgrind ./cpuload profile.log > profile.callgrind
# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind
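The gperftools-specific #ifdefs mentioned in the build step are not shown here, so the following is only a minimal sketch of what they might look like. The functions burn_cpu and run_workload are hypothetical stand-ins for the real cpuload.cpp. ProfilerStart and ProfilerStop are the gperftools CPU profiler entry points, and the file name profile.log is taken from the script above.

```cpp
// Hypothetical stand-in for cpuload.cpp; the real demo program is not shown.
#include <cstdint>

#ifdef WITHGPERFTOOLS
#include <gperftools/profiler.h>  // ProfilerStart / ProfilerStop
#endif

// CPU-bound loop so the sampling profiler has something to record.
std::uint64_t burn_cpu(std::uint64_t iterations) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc += (i * i) % 7;
    }
    return acc;
}

// Runs the workload; profiling is active only when built with -DWITHGPERFTOOLS.
std::uint64_t run_workload(std::uint64_t iterations) {
#ifdef WITHGPERFTOOLS
    ProfilerStart("profile.log");  // same file name the script passes to pprof
#endif
    std::uint64_t result = burn_cpu(iterations);
#ifdef WITHGPERFTOOLS
    ProfilerStop();  // flush the collected samples to profile.log
#endif
    return result;
}
```

Calling run_workload from main with a large iteration count gives the profiler enough samples. Without -DWITHGPERFTOOLS the same source builds with no dependency on libprofiler, so the callgrind script can reuse it.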
Specifically measure: cache misses, using valgrind (its cachegrind tool)
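To have something concrete to measure, the sketch below contrasts two traversal orders over the same matrix. This is an illustrative workload and not part of cpuload.cpp. The function names are hypothetical. Built into a program and run under `valgrind --tool=cachegrind` (recent Valgrind releases may also need `--cache-sim=yes`), the column-wise version should report noticeably more data-cache misses once the matrix no longer fits in cache.

```cpp
#include <cstddef>
#include <vector>

// Sums an n x n matrix stored row-major, walking it row by row.
// Consecutive accesses touch adjacent addresses, which is cache friendly.
long long sum_row_major(const std::vector<int>& m, std::size_t n) {
    long long total = 0;
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) {
            total += m[r * n + c];
        }
    }
    return total;
}

// Same sum, walking column by column. Each access is n ints away from the
// previous one, so large matrices are expected to miss the cache more often.
long long sum_col_major(const std::vector<int>& m, std::size_t n) {
    long long total = 0;
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t r = 0; r < n; ++r) {
            total += m[r * n + c];
        }
    }
    return total;
}
```

Both functions return the same value, so any difference in the cachegrind report comes from the access pattern alone.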
Implement performance comparison with these memory allocators:
- hoard (only allocation/deallocation speed)
- supermalloc (only allocation/deallocation speed)
- ptmalloc2 (only allocation/deallocation speed); use latest libc
- Mesh (only allocation/deallocation speed)
- tcmalloc (only allocation/deallocation speed)
Useful links:
Don't forget to use clang stabilizer!
Add a PROFILING preprocessor definition and a matching build option
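For the allocator comparison listed above, a minimal timing sketch could look like the following. It measures only allocation/deallocation speed through plain malloc/free. The idea is that the allocator under test is swapped in from outside, for example via LD_PRELOAD with the path to that allocator's shared library. AllocTiming and time_malloc_free are hypothetical names, and a real comparison would need repeated runs and several block sizes.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Result of one timing run.
struct AllocTiming {
    std::size_t allocations;  // number of successful malloc calls
    double seconds;           // wall-clock time for all mallocs plus all frees
};

// Times `count` malloc calls of `size` bytes followed by the matching frees.
// `size` must be at least 1 because every block is written to once.
AllocTiming time_malloc_free(std::size_t count, std::size_t size) {
    std::vector<void*> blocks(count, nullptr);
    std::size_t ok = 0;

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(size);
        if (blocks[i] != nullptr) {
            // Touch the block so the allocation cannot be optimized away.
            static_cast<volatile char*>(blocks[i])[0] = 1;
            ++ok;
        }
    }
    for (void* p : blocks) {
        std::free(p);  // free(nullptr) is a no-op, so failed slots are safe
    }
    auto stop = std::chrono::steady_clock::now();

    return {ok, std::chrono::duration<double>(stop - start).count()};
}
```

Because the code calls only malloc and free, the same binary can be reused for every allocator in the list, which keeps the comparison limited to the allocator itself.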