Performance comparisons (mimalloc, malloc, jemalloc) + cpu/memory Profiling

Specifically, measure cache misses with valgrind (its cachegrind tool).

Implement performance comparison with these memory allocators:

[x] gcpp (only speed gain after compacting)
[x] mimalloc (only allocation/deallocation speed)
[x] rpmalloc (only allocation/deallocation speed)
[x] jemalloc (only allocation/deallocation speed)
[ ] ~~hoard (only allocation/deallocation speed)~~
[ ] ~~supermalloc (only allocation/deallocation speed)~~
[x] ptmalloc3 (only allocation/deallocation speed)
[ ] ~~ptmalloc2 (only allocation/deallocation speed) - use latest libc~~
[x] mem++ (speed gain after compacting + allocation/deallocation speed)
[ ] ~~Mesh (only allocation/deallocation speed)~~
[ ] ~~tcmalloc (only allocation/deallocation speed)~~
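A minimal sketch of how the "allocation/deallocation speed" measurement could be structured. The names `AllocTimings` and `benchmark_malloc` are hypothetical and not part of any of the libraries listed above. The sketch times plain `malloc`/`free`; allocators that ship a drop-in shared library (mimalloc and jemalloc do) can be swapped in without recompiling by setting `LD_PRELOAD`, while allocators with their own API would need the two calls replaced.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical result type: elapsed time of each phase in nanoseconds.
struct AllocTimings {
    long long alloc_ns;
    long long free_ns;
};

// Times `count` malloc() calls of `block_size` bytes each,
// then times the matching free() calls separately.
AllocTimings benchmark_malloc(std::size_t count, std::size_t block_size) {
    assert(count > 0 && block_size > 0);
    using Clock = std::chrono::steady_clock;

    // Pre-size the pointer table so vector growth is not part of the timing.
    std::vector<void*> blocks(count, nullptr);

    const auto t0 = Clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(block_size);
        if (blocks[i] != nullptr) {
            // Touch the block through a volatile pointer so the compiler
            // cannot optimize the allocation away.
            static_cast<volatile char*>(blocks[i])[0] = 1;
        }
    }
    const auto t1 = Clock::now();
    for (void* p : blocks) {
        std::free(p);  // free(nullptr) is a no-op, so failed allocations are safe
    }
    const auto t2 = Clock::now();

    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;
    return AllocTimings{
        static_cast<long long>(duration_cast<nanoseconds>(t1 - t0).count()),
        static_cast<long long>(duration_cast<nanoseconds>(t2 - t1).count())};
}
```

Calling `benchmark_malloc(1000000, 64)` from a small driver and printing both fields gives one data point per allocator. A real comparison would repeat the run several times and vary the block size, since a single fixed-size loop only exercises one allocation pattern.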

Useful links:

Don't forget to use clang stabilizer!

Add PROFILING definition and option to build
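One way the `PROFILING` definition could be used in code, as a sketch only. The `PROFILING` macro name comes from the note above; `ScopedTimer`, `PROFILE_SCOPE` and `sum_to` are hypothetical names chosen for illustration. When `PROFILING` is not defined, the instrumentation compiles to nothing.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

#ifdef PROFILING
// Hypothetical scoped timer: prints the elapsed time of the enclosing
// scope to stderr when it is destroyed.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {
        assert(label != nullptr);
    }
    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        std::fprintf(stderr, "[profile] %s: %lld us\n", label_,
                     static_cast<long long>(us));
    }

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};
// Two-step concatenation so __LINE__ is expanded before pasting.
#define PROFILE_CONCAT_INNER(a, b) a##b
#define PROFILE_CONCAT(a, b) PROFILE_CONCAT_INNER(a, b)
#define PROFILE_SCOPE(label) \
    ScopedTimer PROFILE_CONCAT(profile_timer_, __LINE__)(label)
#else
// Profiling disabled: the macro expands to a no-op expression.
#define PROFILE_SCOPE(label) ((void)0)
#endif

// Example workload instrumented with the macro.
unsigned long long sum_to(unsigned long long n) {
    PROFILE_SCOPE("sum_to");
    unsigned long long total = 0;
    for (unsigned long long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Passing `-DPROFILING` on the compiler command line enables the timers. A build-system option would only need to add that definition.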

#!/bin/bash

# build the program (no special flags are needed)
g++ -std=c++11 cpuload.cpp -o cpuload

# run the program with callgrind; write the output to profile.callgrind (the default name would be callgrind.out.<pid>)
valgrind --tool=callgrind --callgrind-out-file=profile.callgrind ./cpuload

# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind

#!/bin/bash

# build the program; for our demo program, we specify -DWITHGPERFTOOLS to enable the gperftools-specific #ifdefs
# (-lprofiler goes after the source file so the linker does not drop it)
g++ -std=c++11 -DWITHGPERFTOOLS -g ../cpuload.cpp -o cpuload -lprofiler

# run the program; generates the profiling data file (profile.log in our example)
./cpuload

# convert profile.log to callgrind compatible format
pprof --callgrind ./cpuload profile.log > profile.callgrind

# open profile.callgrind with kcachegrind
kcachegrind profile.callgrind
