Fleetbench is a benchmarking suite for Google workloads. It's a portmanteau of "fleet" and "benchmark". It is meant for use by chip vendors, compiler researchers, and others interested in making performance optimizations beneficial to workloads similar to Google's. This repository contains the Fleetbench C++ code.
Details on Fleetbench can be found in our paper, "A Profiling-Based Benchmark Suite for Warehouse-Scale Computers".
NOTE: As this project is evolving, we recommend including the tag/release number when citing it to avoid any confusion.
Fleetbench is a benchmarking suite that consists of a curated set of microbenchmarks for hot functions across Google's fleet. The data set distributions it uses for executing the benchmarks are derived from data collected in production.
IMPORTANT: At `v1.0.0`, this benchmark represents a subset of the core libraries used across the fleet. Future releases will continue to increase this coverage. The goal is to expand coverage iteratively and keep distributions up to date, so always use the version at `HEAD`.
For more information, see:
Benchmark fidelity is an important consideration in building this suite. There are 3 levels of fidelity that we consider:
Fleetbench uses semantic versioning for its releases, where `PATCH` versions will be used for bug fixes, `MINOR` for updates to distributions and category coverage, and `MAJOR` for substantial changes to the benchmarking suite. All releases will be tagged, and the suite can be built and run at any tagged version.
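As a small illustration of the versioning policy, the sketch below classifies the difference between two release tags. The helper name `release_kind` is hypothetical, and the `vMAJOR.MINOR.PATCH` tag format is an assumption based on `v1.0.0`; any other tag used with it is illustrative only.

```shell
# Hypothetical helper: describe what kind of release separates two tags,
# following the MAJOR.MINOR.PATCH policy described above.
# Usage: release_kind OLD_TAG NEW_TAG
release_kind() {
  old=${1#v}                      # strip an optional leading "v"
  new=${2#v}
  old_major=${old%%.*}            # text before the first dot
  new_major=${new%%.*}
  old_rest=${old#*.}              # text after the first dot
  new_rest=${new#*.}
  old_minor=${old_rest%%.*}
  new_minor=${new_rest%%.*}
  if [ "$old_major" != "$new_major" ]; then
    echo "MAJOR: substantial changes to the benchmarking suite"
  elif [ "$old_minor" != "$new_minor" ]; then
    echo "MINOR: updates to distributions and category coverage"
  elif [ "$old" != "$new" ]; then
    echo "PATCH: bug fixes"
  else
    echo "same release"
  fi
}
```

Results are only directly comparable when both runs come from the same release, which is one reason to record the tag alongside any numbers you publish.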
If you're starting out, we recommend always using the latest version at HEAD.
As of Q2'24, Fleetbench provides coverage for several major hot functions.
Benchmark | Description |
---|---|
Proto | Instruction-focused. |
Swissmap | Data-focused. |
Libc | Data-focused. |
TCMalloc | Data-focused. |
Compression | Data-focused. Covers Snappy, ZSTD, Brotli, and Zlib. |
Hashing | Data-focused. Supports algorithms CRC32 and absl::Hash. |
STL-Cord | Instruction-focused. |
Bazel is the official build system for Fleetbench.
We support Bazel versions 6 and 7.
NOTE: Our setup uses Bazel 7.0.1 and LLVM 17.0.1.
As an example, to run the Swissmap benchmarks:
bazel run --config=opt fleetbench/swissmap:swissmap_benchmark
Important: Always run benchmarks with `--config=opt` to apply essential compiler optimizations.
Replace `$WORK_LOAD` and `$BUILD_TARGET` with an entry from the table below to build and run the corresponding benchmark. The reasons for each build flag are explained in the next few sections.
Benchmark | WORKLOAD | BUILD_TARGET | Binary run flags |
---|---|---|---|
Proto | proto | proto_benchmark | --benchmark_min_time=3s |
Swissmap | swissmap | swissmap_benchmark | |
Libc memory | libc | mem_benchmark | --benchmark_counters_tabular=true |
TCMalloc | tcmalloc | empirical_driver | --benchmark_min_time=10s (see --benchmark_filter below) |
Compression | compression | compression_benchmark | --benchmark_counters_tabular=true |
Hashing | hashing | hashing_benchmark | --benchmark_counters_tabular=true |
STL-Cord | stl | cord_benchmark | |
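The general invocation pattern can be sketched as a small shell helper. This is a minimal sketch that assumes the `fleetbench/<workload>:<build target>` label layout seen in the Swissmap example; the helper name `run_fleetbench` is hypothetical and not part of Fleetbench.

```shell
# Hypothetical helper: run one Fleetbench benchmark by workload and target.
# Usage: run_fleetbench WORK_LOAD BUILD_TARGET [binary run flags...]
run_fleetbench() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_fleetbench WORK_LOAD BUILD_TARGET [flags...]" >&2
    return 1
  fi
  work_load=$1
  build_target=$2
  shift 2
  # Everything after "--" is passed to the benchmark binary, not to bazel.
  bazel run --config=opt "fleetbench/${work_load}:${build_target}" -- "$@"
}
```

For example, `run_fleetbench proto proto_benchmark --benchmark_min_time=3s` runs the Proto benchmark with the run flag listed in the table.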
NOTE: By default, each benchmark only runs a minimal set of tests that we have selected as the most representative. To see the default list, use the `--benchmark_list_tests` flag when running the target. Add `--benchmark_filter=all` to see the exhaustive list.
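A minimal sketch of both listing modes, using only the flags named in the note. The helper name `list_fleetbench_tests` and its optional `all` argument are hypothetical conveniences.

```shell
# Hypothetical helper: list the benchmarks a target would run.
# Usage: list_fleetbench_tests BAZEL_TARGET [all]
list_fleetbench_tests() {
  target=$1
  if [ "${2:-}" = "all" ]; then
    # Exhaustive list: lift the default filter before listing.
    bazel run --config=opt "$target" -- --benchmark_filter=all --benchmark_list_tests
  else
    # Default list: the representative subset that runs when no filter is given.
    bazel run --config=opt "$target" -- --benchmark_list_tests
  fi
}
```

For example, `list_fleetbench_tests fleetbench/swissmap:swissmap_benchmark all` prints every Swissmap benchmark name without running any of them.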
You can also pass a regex to the `--benchmark_filter` flag to run only a subset of benchmarks (more info).
The TCMalloc Empirical Driver benchmark can take ~1hr to run all benchmarks, so
running a subset may be advised.
For example, to run only the Swissmap benchmarks for sets of `16` and `64` elements:
bazel run --config=opt fleetbench/swissmap:swissmap_benchmark -- \
--benchmark_filter=".*set_size:(16|64).*"
To extend the runtime of a benchmark, e.g. to collect more profile samples, use `--benchmark_min_time`.
bazel run --config=opt fleetbench/proto:proto_benchmark -- --benchmark_min_time=30s
Some benchmarks also provide counter reports after completion. Adding `--benchmark_counters_tabular=true` (doc) prints the counters as table columns, which makes the output easier to read.
TCMalloc is the underlying memory allocator in this benchmark suite. By default it operates in per-CPU mode.
Note: the Restartable Sequences (RSEQ) kernel feature is required for per-CPU mode. RSEQ has the limitation that a given thread can only register a single `rseq` structure with the kernel. Recent versions of glibc do this on initialization, preventing TCMalloc from using it.
Set the environment variable `GLIBC_TUNABLES=glibc.pthread.rseq=0` to prevent glibc from doing this registration. This will allow TCMalloc to operate in per-CPU mode.
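One way to apply the setting to a single command without exporting it into the whole shell session is sketched below. The helper name `without_glibc_rseq` is hypothetical. Note that it replaces any `GLIBC_TUNABLES` value you already have set, rather than merging with it.

```shell
# Hypothetical helper: run a command with glibc's rseq registration disabled,
# so that TCMalloc is free to register its own rseq structure.
# Usage: without_glibc_rseq COMMAND [ARGS...]
without_glibc_rseq() {
  # env sets the variable only for the command it launches.
  env GLIBC_TUNABLES=glibc.pthread.rseq=0 "$@"
}
```

For example, `without_glibc_rseq bazel-bin/fleetbench/swissmap/swissmap_benchmark` runs an already-built binary with the tunable set. Wrapping `bazel run` the same way should also work, on the assumption that `bazel run` passes the caller's environment through to the benchmark binary.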
For more consistency with Google's build configuration, we suggest using the Clang / LLVM tools. These instructions have been tested with LLVM 14.
These can be installed with the system's package manager, e.g. on Debian:
sudo apt-get install clang llvm lld
Otherwise, see https://releases.llvm.org to obtain these if not present on your system or to find the newest version.
Once installed, specify `--config=clang` on the bazel command line to use the clang compiler. We assume `clang` and `lld` are in the PATH.
Note: to make this setting the default, add `build --config=clang` to your .bazelrc.
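A minimal sketch for adding that line without duplicating it on repeated runs. The helper name `add_bazelrc_line` is hypothetical, and which bazelrc file to edit (the one in the repository root or a user-level one) is left to you.

```shell
# Hypothetical helper: append a line to a bazelrc file unless it is already there.
# Usage: add_bazelrc_line FILE LINE
add_bazelrc_line() {
  rc_file=$1
  line=$2
  touch "$rc_file"
  # -x matches whole lines, -F treats the line as a fixed string, -q is quiet.
  if ! grep -qxF -e "$line" "$rc_file"; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}
```

For example, `add_bazelrc_line .bazelrc 'build --config=clang'` run from the repository root. Keep in mind that a user-level bazelrc applies to every Bazel workspace, including ones that do not define a `clang` config.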
If running on an x86 machine that is Haswell or newer, we suggest adding `--config=haswell` for consistency with our compiler flags. Use `--config=westmere` for Westmere-era processors, and `--config=arm` for ARM ones.
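The choice between these three configs can be sketched as a heuristic. This is an assumption-heavy sketch, not Fleetbench's own logic: it treats the presence of the `avx2` CPU flag as a proxy for "Haswell or newer" and falls back to `--config=westmere` for every other x86-64 machine, which may be wrong for CPUs older than Westmere. The helper name `pick_arch_config` is hypothetical.

```shell
# Hypothetical helper: suggest an architecture config from the machine type
# and the CPU flag list. Heuristic only: avx2 is used as a stand-in for
# "Haswell or newer", and all other x86-64 CPUs are assumed to be Westmere-era.
# Usage: pick_arch_config MACHINE CPU_FLAGS
pick_arch_config() {
  machine=$1
  cpu_flags=$2
  case "$machine" in
    aarch64|arm64)
      echo "--config=arm" ;;
    x86_64)
      # Pad with spaces so that only the whole word "avx2" matches.
      case " $cpu_flags " in
        *" avx2 "*) echo "--config=haswell" ;;
        *)          echo "--config=westmere" ;;
      esac ;;
    *)
      echo "no suggested config for machine type: $machine" >&2
      return 1 ;;
  esac
}
```

On Linux, `pick_arch_config "$(uname -m)" "$(grep -m1 '^flags' /proc/cpuinfo)"` feeds it the local machine's values.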
It is expected that there will be some variance in the reported CPU times across benchmark executions. The benchmark itself runs the same code, so the causes of the variance are mainly in the environment. The following is a non-exhaustive list of techniques that help with reducing run-to-run latency variance:
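- Run each benchmark for longer, controlled with `--benchmark_min_time`.
- Run multiple repetitions of each benchmark, controlled with `--benchmark_repetitions`.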
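The two flags can be combined in one invocation, sketched below. The helper name `run_repeated` is hypothetical. The extra `--benchmark_report_aggregates_only=true` flag is not mentioned in this document; it is a Google Benchmark option that prints only the aggregate statistics (such as mean, median and standard deviation) instead of every repetition.

```shell
# Hypothetical helper: run a target several times and report only aggregates.
# Usage: run_repeated BAZEL_TARGET REPETITIONS MIN_TIME
run_repeated() {
  target=$1
  repetitions=$2
  min_time=$3
  bazel run --config=opt "$target" -- \
    --benchmark_min_time="$min_time" \
    --benchmark_repetitions="$repetitions" \
    --benchmark_report_aggregates_only=true
}
```

For example, `run_repeated fleetbench/swissmap:swissmap_benchmark 10 5s` runs each selected benchmark ten times for at least five seconds each. The reported standard deviation gives a quick read on how noisy the environment is.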
Potential areas of future work include:
Q: How do I compare the results of two different runs of a benchmark, e.g. contender vs baseline?
A: Fleetbench uses the Google benchmark framework. See its documentation on comparing results across benchmark runs: link.
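A minimal sketch of one such workflow: save each run as JSON, then hand both files to the `compare.py` script that ships in the Google Benchmark repository under `tools/`. The helper names and the `BENCHMARK_SRC` variable are hypothetical; `BENCHMARK_SRC` is assumed to point at a local checkout of that repository with the script's Python dependencies installed.

```shell
# Hypothetical helper: run a target and save its results as JSON.
# Usage: save_results LABEL BAZEL_TARGET [extra bazel options...]
save_results() {
  label=$1
  target=$2
  shift 2
  # An absolute output path is used because "bazel run" does not start the
  # binary in the directory the command was invoked from.
  bazel run --config=opt "$@" "$target" -- \
    --benchmark_out="$PWD/$label.json" \
    --benchmark_out_format=json
}

# Hypothetical helper: compare two saved result files.
# Usage: compare_results BASELINE_JSON CONTENDER_JSON
compare_results() {
  python3 "${BENCHMARK_SRC:?set BENCHMARK_SRC to a google/benchmark checkout}/tools/compare.py" \
    benchmarks "$1" "$2"
}
```

For example, `save_results baseline fleetbench/swissmap:swissmap_benchmark`, then `save_results contender fleetbench/swissmap:swissmap_benchmark --features=thin_lto`, then `compare_results baseline.json contender.json`.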
Q: How do I build the benchmark with FDO?
A: Note that Clang and the LLVM tools are required for FDO builds.
Take fleetbench/swissmap/swissmap_benchmark as an example.
# Instrument.
bazel build --config=clang --config=opt --fdo_instrument=.fdo fleetbench/swissmap:swissmap_benchmark
# Run the instrumented binary to generate a profile.
bazel-bin/fleetbench/swissmap/swissmap_benchmark --benchmark_filter=all
# There should be a file with a .profraw extension in $PWD/.fdo/.
# Build an optimized binary.
bazel build --config=clang --config=opt --fdo_optimize=.fdo/<filename>.profraw fleetbench/swissmap:swissmap_benchmark
# Run the FDO-optimized binary.
bazel-bin/fleetbench/swissmap/swissmap_benchmark --benchmark_filter=all
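The `<filename>` placeholder depends on what the instrumented run wrote. A minimal sketch for locating it, assuming the profile lands in `.fdo/` as described; the helper name `find_profraw` is hypothetical, and it simply returns the first match if several profiles are present.

```shell
# Hypothetical helper: print the path of the first .profraw file in a directory.
# Usage: find_profraw [DIR]   (DIR defaults to .fdo)
find_profraw() {
  dir=${1:-.fdo}
  for f in "$dir"/*.profraw; do
    # If nothing matched, the pattern is left unexpanded and -e fails.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  echo "no .profraw file found in $dir" >&2
  return 1
}
```

The optimized build step then becomes `bazel build --config=clang --config=opt --fdo_optimize="$(find_profraw)" fleetbench/swissmap:swissmap_benchmark`.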
Q: How do I build the benchmark with ThinLTO?
A: Note that Clang and the LLVM tools are required for ThinLTO builds. In particular, the `lld` linker must be in the PATH. Specify `--features=thin_lto` on the bazel command line, e.g.:
bazel run --config=clang --config=opt --features=thin_lto fleetbench/proto:proto_benchmark
Q: Does Fleetbench run on _ OS?
A: The supported platforms are the same as TCMalloc's; see link for more details.
Q: Can I run Fleetbench without TCMalloc?
A: Yes. Specify `--custom_malloc="@bazel_tools//tools/cpp:malloc"` on the bazel command line to override with the system allocator.
Q: Can I run with Address Sanitizer?
A: Yes. Note that you need to override TCMalloc as well for ASAN to work.
Example:
bazel build --custom_malloc="@bazel_tools//tools/cpp:malloc" -c opt fleetbench/proto:proto_benchmark --copt=-fsanitize=address --linkopt=-fsanitize=address
Q: Are the benchmarks fixed in nature?
A: No. We expect that the code under benchmark, the hardware, the compiler, and the compiler flags used may all change in concert in order to identify optimization opportunities.
Q: My question isn't addressed here. How do I contact the development team?
A: Please check the existing GitHub issues, and file a new one if your question isn't addressed there.
Fleetbench is licensed under the terms of the Apache license. See LICENSE for more information.
Disclaimer: This is not an officially supported Google product.