I also found that we may need to add the following section to Cargo.toml:
[profile.bench]
debug = true
opt-level = 1
If opt-level is too high, rustc will inline most of the functions, which makes it hard to see which function is the bottleneck. Not sure if there's a better way to do this.
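One possible alternative to lowering opt-level for the whole bench profile is to mark only the functions you want to see in the profile with `#[inline(never)]`, so they keep their own stack frames while the rest of the code stays optimized. The sketch below assumes this approach; `encode` and `decode` are hypothetical stand-ins (a simple LEB128-style varint codec), not the project's actual functions, and the loop in `main` is a hand-rolled stand-in for a bench, not the existing benches.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical encoder: LEB128-style varint encoding of a slice of u64.
/// `#[inline(never)]` keeps this function as its own frame in perf/flamegraph
/// output even when the bench profile uses a higher opt-level.
#[inline(never)]
fn encode(values: &[u64], out: &mut Vec<u8>) {
    out.clear();
    for &v in values {
        let mut v = v;
        loop {
            // Low 7 bits go into this byte; the high bit marks "more bytes follow".
            let byte = (v & 0x7f) as u8;
            v >>= 7;
            if v == 0 {
                out.push(byte);
                break;
            }
            out.push(byte | 0x80);
        }
    }
}

/// Hypothetical decoder matching `encode`; also kept out of line for profiling.
#[inline(never)]
fn decode(bytes: &[u8], out: &mut Vec<u64>) {
    out.clear();
    let mut value = 0u64;
    let mut shift = 0u32;
    for &b in bytes {
        value |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            // High bit clear: this byte ends the current value.
            out.push(value);
            value = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
}

fn main() {
    let input: Vec<u64> = (0..10_000u64).map(|i| i * 2_654_435_761).collect();
    let mut encoded = Vec::new();
    let mut decoded = Vec::new();

    // Run enough round trips for a sampling profiler to collect useful stacks.
    let start = Instant::now();
    for _ in 0..100 {
        // black_box discourages the optimizer from removing the work entirely.
        encode(black_box(input.as_slice()), &mut encoded);
        decode(black_box(encoded.as_slice()), &mut decoded);
    }
    let elapsed = start.elapsed();

    assert_eq!(decoded, input);
    println!("100 round trips in {:?}", elapsed);
}
```

The trade-off is that `#[inline(never)]` changes the code being measured slightly (the call overhead is no longer optimized away), so it is probably best applied only to the few functions you are investigating.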
It may be useful to do some profiling on encoding and decoding. We can use the existing benches for this. Some useful scripts:
For CPU:
The stackcollapse-perf and flamegraph scripts are from FlameGraph. For cache performance (from this thread):