memcached / memcached

memcached development tree
https://memcached.org
BSD 3-Clause "New" or "Revised" License

Profile-Guided Optimization (PGO) on Memcached #1054

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

Hi!

I tested Profile-Guided Optimization (PGO) on Memcached and want to share my results.

Test environment

Tested configurations

I have tested the following Memcached configurations (with corresponding CFLAGS and LDFLAGS):

For PGO I use Clang's -fprofile-instr-generate / -fprofile-instr-use options: build an instrumented memcached, run memtier_benchmark against it, collect the instrumentation data, and then rebuild memcached with the collected profile.

Benchmark

For both the Instrument and Benchmarking phases I run memtier_benchmark as `taskset -c 1-4 memtier_benchmark --ratio 0:1 -t 4 -c 30 -n 200000 --distinct-client-seed -d 256 --key-maximum 1000000 --hide-histogram --pipeline 30 -p 21789 -P memcache_text`. memcached is started with `taskset -c 0 memcached -p 21789 -t 1`.
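The instrument, collect, and rebuild cycle described above can be sketched as a small Python script that only prints the command lines in order and does not run anything. Several details are assumptions and not taken from the post: the profile file names (`memcached.profraw`, `memcached.profdata`), the use of an autoconf-style `./configure` with `CC=clang`, and the `llvm-profdata merge` step, which Clang's instrumentation-based PGO normally needs to turn raw profiles into a file that `-fprofile-instr-use` accepts.

```python
import shlex

# Hypothetical file names; the original post does not name its profile files.
PROFRAW = "memcached.profraw"
PROFDATA = "memcached.profdata"

# Server and benchmark commands as given in the post.
MEMCACHED = ["taskset", "-c", "0", "./memcached", "-p", "21789", "-t", "1"]
MEMTIER = [
    "taskset", "-c", "1-4", "memtier_benchmark",
    "--ratio", "0:1", "-t", "4", "-c", "30", "-n", "200000",
    "--distinct-client-seed", "-d", "256", "--key-maximum", "1000000",
    "--hide-histogram", "--pipeline", "30", "-p", "21789", "-P", "memcache_text",
]


def configure(cflags, ldflags=""):
    """Build an autoconf-style ./configure command that selects Clang (assumed setup)."""
    cmd = ["./configure", "CC=clang", f"CFLAGS={cflags}"]
    if ldflags:
        cmd.append(f"LDFLAGS={ldflags}")
    return cmd


def pgo_steps():
    """Return the PGO cycle as (phase, argv) pairs, in execution order."""
    return [
        # The generate flag is also passed to the linker so the profile runtime is linked in.
        ("instrumented build",
         configure("-O3 -fprofile-instr-generate", "-fprofile-instr-generate")),
        ("instrumented build", ["make"]),
        # Start the instrumented server, then drive it with the benchmark.
        ("collect profile", ["env", f"LLVM_PROFILE_FILE={PROFRAW}", *MEMCACHED]),
        ("collect profile", MEMTIER),
        # The raw profile is written when the server process exits, so stop it first.
        ("merge profile", ["llvm-profdata", "merge", f"-output={PROFDATA}", PROFRAW]),
        ("optimized build", ["make", "clean"]),
        ("optimized build", configure(f"-O3 -fprofile-instr-use={PROFDATA}")),
        ("optimized build", ["make"]),
    ]


if __name__ == "__main__":
    for phase, argv in pgo_steps():
        print(f"[{phase}] {shlex.join(argv)}")
```

Running the script prints eight commands grouped by phase. The server has to exit before the merge step, because the instrumented binary writes its raw profile at process exit.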

Results

Here are the benchmark results for the different Memcached configurations. All configurations were benchmarked on the same machine with the same Memcached settings, and each was run multiple times. The results are shown in memtier_benchmark output format. I rechecked them, and they are consistent between runs.

-O3

```
ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       25641.23          ---          ---         0.42276         0.41500         0.82300         0.88700      7384.01
Gets      256409.43       233.66    256175.78         0.42171         0.41500         0.82300         0.88700      6547.85
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    282050.66       233.66    256175.78         0.42180         0.41500         0.82300         0.88700     13931.86

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       26243.54          ---          ---         0.41591         0.41500         0.81500         0.83900      7557.46
Gets      262432.51       239.30    262193.21         0.41474         0.41500         0.81500         0.83900      6701.70
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    288676.05       239.30    262193.21         0.41485         0.41500         0.81500         0.83900     14259.16

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       26421.20          ---          ---         0.41497         0.41500         0.81500         0.86300      7608.63
Gets      264209.12       240.98    263968.14         0.41378         0.40700         0.80700         0.86300      6747.09
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    290630.32       240.98    263968.14         0.41389         0.40700         0.80700         0.86300     14355.71

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       26630.51          ---          ---         0.41111         0.40700         0.80700         0.83900      7668.90
Gets      266302.20       242.84    266059.36         0.41034         0.40700         0.80700         0.83900      6800.52
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    292932.71       242.84    266059.36         0.41041         0.40700         0.80700         0.83900     14469.43
```
-O3 + PGO

```
ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       27202.89          ---          ---         0.40124         0.39900         0.79100         0.81500      7833.73
Gets      272025.88       248.30    271777.58         0.40043         0.39900         0.79100         0.81500      6946.76
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    299228.77       248.30    271777.58         0.40051         0.39900         0.79100         0.81500     14780.49

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       27445.08          ---          ---         0.39962         0.39900         0.78300         0.81500      7903.48
Gets      274447.81       250.38    274197.43         0.39888         0.39900         0.78300         0.81500      7008.57
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    301892.89       250.38    274197.43         0.39895         0.39900         0.78300         0.81500     14912.05

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       27204.11          ---          ---         0.40191         0.39900         0.78300         0.81500      7834.08
Gets      272038.14       247.41    271790.73         0.40070         0.39900         0.78300         0.81500      6946.82
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    299242.26       247.41    271790.73         0.40081         0.39900         0.78300         0.81500     14780.90

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       27439.44          ---          ---         0.40177         0.39900         0.78300         0.81500      7901.85
Gets      274391.37       251.01    274140.36         0.40058         0.39900         0.78300         0.81500      7007.32
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    301830.80       251.01    274140.36         0.40069         0.39900         0.78300         0.81500     14909.17

ALL STATS
============================================================================================================================
Type        Ops/sec     Hits/sec   Misses/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec
----------------------------------------------------------------------------------------------------------------------------
Sets       27415.53          ---          ---         0.40157         0.39900         0.79100         0.81500      7894.97
Gets      274152.28       250.20    273902.08         0.40053         0.39900         0.79100         0.81500      7001.05
Waits          0.00          ---          ---             ---             ---             ---             ---          ---
Totals    301567.81       250.20    273902.08         0.40063         0.39900         0.79100         0.81500     14896.01
```
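To make the two result sets easier to compare, the Totals Ops/sec values from the runs above can be averaged per configuration. The sketch below is only a convenience summary. The helper name `relative_gain` is made up for illustration, and a plain mean over four and five runs is an assumption about how to aggregate, not something the benchmark tool reports.

```python
from statistics import mean

# Totals Ops/sec copied from the memtier_benchmark runs above.
O3_OPS = [282050.66, 288676.05, 290630.32, 292932.71]
PGO_OPS = [299228.77, 301892.89, 299242.26, 301830.80, 301567.81]


def relative_gain(baseline, candidate):
    """Percent change of the mean of `candidate` relative to the mean of `baseline`."""
    return (mean(candidate) / mean(baseline) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"-O3 mean:       {mean(O3_OPS):.2f} ops/sec")
    print(f"-O3 + PGO mean: {mean(PGO_OPS):.2f} ops/sec")
    print(f"PGO gain:       {relative_gain(O3_OPS, PGO_OPS):.2f} %")
```

On these figures the PGO build comes out roughly 4% ahead in mean throughput, and even its slowest run is faster than the fastest plain -O3 run.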

I didn't test (or profile) other memtier_benchmark workloads, since I am not very familiar with the tool, so the results may be better or worse elsewhere. BOLT (llvm-bolt) might squeeze out even more performance, but I haven't tested it either.

You can find more PGO results for other projects (e.g. Redis) here.

dormando commented 1 year ago

Kinda nifty, thanks!

zamazan4ik commented 1 year ago

@dormando What do you think about adding information about PGO to the Memcached documentation? That way users and maintainers would be able to optimize Memcached for their own workloads.

Here are some examples from other projects: