Open zamazan4ik opened 1 year ago
Kinda nifty, thanks!
@dormando What do you think about adding information about PGO to the Memcached documentation? That way, users and maintainers would be able to optimize Memcached for their own workloads.
Here are some examples from other projects:
Hi!
I tested Profile-Guided Optimization (PGO) on Memcached and want to share my results.
Test environment
`master` branch (commit `efee763c93249358ea5b3b42c7fd4e57e2599c30`)
Tested configurations
I have tested the following Memcached configurations (with the corresponding `CFLAGS` and `LDFLAGS`):
`CC=clang CFLAGS="-O3" ./configure`
`CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure`
As a PGO technique, I use the `-fprofile-instr-generate`/`-fprofile-instr-use` options from Clang: build an instrumented `memcached`, run `memtier_benchmark` against the instrumented `memcached`, collect the instrumentation data, then rebuild `memcached` with the collected data.
Benchmark
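Before the benchmark details, here is a minimal sketch of how the instrument/collect/rebuild cycle described above could be scripted. It is an illustration under assumptions, not the exact procedure used for these results: it assumes it runs from the Memcached source tree with `./configure` already present (for example, generated by `autogen.sh`), that `clang` and `llvm-profdata` are on `PATH`, that the raw profile is named `memcached.profraw`, and that `memcached` exits cleanly on `SIGTERM`. The `llvm-profdata merge` step is not mentioned in the post, but Clang needs the raw profile converted to the indexed format before `-fprofile-instr-use` can read it. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a Clang instrumentation-based PGO cycle for memcached.
# Assumptions (not from the post): run from the memcached source tree with
# ./configure present, clang and llvm-profdata on PATH, and the raw profile
# written to memcached.profraw.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
set -eu
DRY_RUN="${DRY_RUN:-1}"
SERVER_PID=""

# Print a command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Same benchmark command as in the post, used here as the training workload.
train() {
    run taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 \
        --distinct-client-seed -d 256 --key-maximum 1000000 \
        --hide-histogram --pipeline 30 -p 21789 -P memcache_text
}

pgo_cycle() {
    # 1. Build with instrumentation (the flag is also passed at link time).
    run env CC=clang CFLAGS="-O3 -fprofile-instr-generate" \
        LDFLAGS="-fprofile-instr-generate" ./configure
    run make clean
    run make

    # 2. Run the instrumented server under the training workload.
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "LLVM_PROFILE_FILE=memcached.profraw taskset -c 0 ./memcached -p 21789 -t 1 &"
    else
        LLVM_PROFILE_FILE=memcached.profraw taskset -c 0 ./memcached -p 21789 -t 1 &
        SERVER_PID=$!
        sleep 1   # crude wait for the listener to come up
    fi
    train
    # The raw profile is written at process exit, so stop the server cleanly
    # (assumes memcached shuts down normally on SIGTERM).
    if [ -n "$SERVER_PID" ]; then
        kill -TERM "$SERVER_PID"
        wait "$SERVER_PID" || true
    fi

    # 3. Convert the raw profile into the indexed format that Clang reads.
    run llvm-profdata merge -output=memcached.profdata memcached.profraw

    # 4. Rebuild using the collected profile.
    run env CC=clang CFLAGS="-O3 -fprofile-instr-use=memcached.profdata" ./configure
    run make clean
    run make
}

pgo_cycle
```

Running it with `DRY_RUN=0` executes the steps for real; the training workload is the same `memtier_benchmark` command that is used for the measurements.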
I use `memtier_benchmark` with `taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text` for both the instrumentation and benchmarking phases. `memcached` is started with the command `taskset -c 0 memcached -p 21789 -t 1`.
Results
Here are the benchmark results for the two Memcached builds. Both builds were benchmarked on the same machine, with the same Memcached settings, multiple times. The results are shown in `memtier_benchmark` format. I have rechecked: the results are consistent between runs.
-O3
```
ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      25641.23       ---         ---       0.42276      0.41500      0.82300        0.88700   7384.01
Gets     256409.43    233.66   256175.78       0.42171      0.41500      0.82300        0.88700   6547.85
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   282050.66    233.66   256175.78       0.42180      0.41500      0.82300        0.88700  13931.86

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      26243.54       ---         ---       0.41591      0.41500      0.81500        0.83900   7557.46
Gets     262432.51    239.30   262193.21       0.41474      0.41500      0.81500        0.83900   6701.70
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   288676.05    239.30   262193.21       0.41485      0.41500      0.81500        0.83900  14259.16

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      26421.20       ---         ---       0.41497      0.41500      0.81500        0.86300   7608.63
Gets     264209.12    240.98   263968.14       0.41378      0.40700      0.80700        0.86300   6747.09
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   290630.32    240.98   263968.14       0.41389      0.40700      0.80700        0.86300  14355.71

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      26630.51       ---         ---       0.41111      0.40700      0.80700        0.83900   7668.90
Gets     266302.20    242.84   266059.36       0.41034      0.40700      0.80700        0.83900   6800.52
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   292932.71    242.84   266059.36       0.41041      0.40700      0.80700        0.83900  14469.43
```
-O3 + PGO
```
ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      27202.89       ---         ---       0.40124      0.39900      0.79100        0.81500   7833.73
Gets     272025.88    248.30   271777.58       0.40043      0.39900      0.79100        0.81500   6946.76
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   299228.77    248.30   271777.58       0.40051      0.39900      0.79100        0.81500  14780.49

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      27445.08       ---         ---       0.39962      0.39900      0.78300        0.81500   7903.48
Gets     274447.81    250.38   274197.43       0.39888      0.39900      0.78300        0.81500   7008.57
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   301892.89    250.38   274197.43       0.39895      0.39900      0.78300        0.81500  14912.05

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      27204.11       ---         ---       0.40191      0.39900      0.78300        0.81500   7834.08
Gets     272038.14    247.41   271790.73       0.40070      0.39900      0.78300        0.81500   6946.82
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   299242.26    247.41   271790.73       0.40081      0.39900      0.78300        0.81500  14780.90

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      27439.44       ---         ---       0.40177      0.39900      0.78300        0.81500   7901.85
Gets     274391.37    251.01   274140.36       0.40058      0.39900      0.78300        0.81500   7007.32
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   301830.80    251.01   274140.36       0.40069      0.39900      0.78300        0.81500  14909.17

ALL STATS
=========================================================================================================
Type       Ops/sec  Hits/sec  Misses/sec  Avg. Latency  p50 Latency  p99 Latency  p99.9 Latency    KB/sec
---------------------------------------------------------------------------------------------------------
Sets      27415.53       ---         ---       0.40157      0.39900      0.79100        0.81500   7894.97
Gets     274152.28    250.20   273902.08       0.40053      0.39900      0.79100        0.81500   7001.05
Waits         0.00       ---         ---           ---          ---          ---            ---       ---
Totals   301567.81    250.20   273902.08       0.40063      0.39900      0.79100        0.81500  14896.01
```
I didn't test (or profile) other `memtier_benchmark` workloads, since I am not very familiar with the tool; the results may be better or worse elsewhere. BOLT (`llvm-bolt`) might help to achieve even more performance, but I didn't test it either.
You can find more PGO results for other projects (e.g. Redis) here.
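As a rough summary of the numbers above, the mean `Totals` throughput of the two builds can be compared with a few lines of shell. This is only an illustrative sketch: the values are copied by hand from the `Totals` rows of the runs listed in this post, and the variable and function names are made up for the example. With these inputs it works out to a gain of roughly 4% for the PGO build.

```shell
#!/bin/sh
# "Totals" Ops/sec values copied from the runs listed in this post.
set -eu
LC_ALL=C; export LC_ALL   # keep "." as the decimal separator in awk

base="282050.66 288676.05 290630.32 292932.71"
pgo="299228.77 301892.89 299242.26 301830.80 301567.81"

# Arithmetic mean of a space-separated list of numbers, one decimal place.
mean() {
    # $1 is deliberately unquoted so the list is split into one value per line.
    printf '%s\n' $1 | awk '{ s += $1 } END { printf "%.1f", s / NR }'
}

base_mean="$(mean "$base")"
pgo_mean="$(mean "$pgo")"

# Relative throughput gain of the PGO build over plain -O3, in percent.
gain="$(awk -v b="$base_mean" -v p="$pgo_mean" \
    'BEGIN { printf "%.1f", (p / b - 1) * 100 }')"

echo "-O3 mean:       $base_mean ops/sec"
echo "-O3 + PGO mean: $pgo_mean ops/sec"
echo "gain:           $gain %"
```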