mbautin closed 2 years ago
A preliminary, naive single-node performance test of a yb-tserver binary built with Clang 12 link-time optimization (LTO) shows about a 30% throughput improvement on a write-only CassandraKeyValue workload.

LTO works by putting LLVM bitcode into .o files instead of native code; at link time, the entire program is loaded into memory and optimized as a whole. This allows, for example, better inlining and devirtualization (replacing virtual function calls with direct calls when the class is known at compile time). For a dynamically linked program (the way we build code today), these optimizations are not possible because in theory any function could be replaced by a different implementation, e.g. through LD_PRELOAD.

https://gist.githubusercontent.com/mbautin/6d2debaef1286aa045afde0c08853760/raw -- and linking the remaining shared libraries statically could be even better (right now a few libraries are still dynamically linked: https://gist.githubusercontent.com/mbautin/bc8769a9ae93f8d6d1f2244591a08376/raw ). Potentially we could even link statically with glibc (we'll have to rebuild it). On the flip side, this statically linked binary is 480 MB with debug info (but only 61 MB without it).

I was thinking of creating a "busybox style" binary (busybox is a single executable that provides lots of Unix utilities -- https://en.wikipedia.org/wiki/BusyBox ). We could create one binary that acts as yb-master, yb-tserver, or postgres, depending on argv[0]. The rest of the tools in our release tarball could still use dynamic linking the same way they do today.
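To make the "busybox style" idea concrete, here is a minimal sketch of dispatching on argv[0], assuming C++17. The names `Role`, `BaseName`, and `RoleFromArgv0` are hypothetical and chosen only for illustration; this is not how the YugabyteDB binaries are actually wired up.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical set of programs that one combined binary could act as.
enum class Role { kMaster, kTServer, kPostgres, kUnknown };

// Strip any directory components, so "/opt/yb/bin/yb-tserver" -> "yb-tserver".
constexpr std::string_view BaseName(std::string_view path) {
  const auto pos = path.rfind('/');
  return pos == std::string_view::npos ? path : path.substr(pos + 1);
}

// Decide which program to run based on the name the binary was invoked as.
Role RoleFromArgv0(const char* argv0) {
  assert(argv0 != nullptr);
  const std::string_view name = BaseName(argv0);
  if (name == "yb-master") return Role::kMaster;
  if (name == "yb-tserver") return Role::kTServer;
  if (name == "postgres") return Role::kPostgres;
  return Role::kUnknown;
}
```

In such a setup, `main` would call `RoleFromArgv0(argv[0])` and jump to the matching entry point, and the release tarball would ship symlinks (or hard links) named yb-master, yb-tserver, and postgres that all point at the same executable.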
Enabled for yb-tserver. Will create follow-up issues for doing more LTO for yb-master and postgres.
It would probably be very good for YugabyteDB performance to compile all YB + postgres code using Clang's LTO (link time optimization).
Preliminary results produced with Clang 12, ThinLTO, x86_64, using the Linuxbrew glibc and other libraries. The first and third experiments are on the LTO build; the middle one is the non-LTO release build.
The build is done with -fwhole-program, and YB + postgres code are currently included in LTO (could also include third-party libraries). Only yb-tserver is compiled with LTO.
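As a small illustration of the kind of code that whole-program optimization can improve, the sketch below shows a virtual call whose target is knowable at compile time. The types `Encoder` and `XorEncoder` are hypothetical and not taken from the YugabyteDB code base. Whether a given compiler actually devirtualizes and inlines this call depends on the optimization level and on whether the caller and callee are visible together, which is what LTO provides across translation units.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface, for illustration only.
struct Encoder {
  virtual ~Encoder() = default;
  virtual std::uint64_t Encode(std::uint64_t v) const = 0;
};

// 'final' guarantees that no subclass can override Encode.
struct XorEncoder final : Encoder {
  std::uint64_t Encode(std::uint64_t v) const override { return v ^ 0xffULL; }
};

// Calls Encode through the base interface. Without knowledge of the dynamic
// type, each call is an indirect call through the vtable.
std::uint64_t EncodeAll(const Encoder& e, std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t i = 0; i < n; ++i) sum += e.Encode(i);
  return sum;
}

// Here the dynamic type is known to be XorEncoder. If the optimizer can see
// EncodeAll and XorEncoder::Encode together, it may replace the virtual call
// with a direct call and inline it.
std::uint64_t EncodeAllXor(std::uint64_t n) {
  XorEncoder e;
  return EncodeAll(e, n);
}

// Small usage check: (0^255) + (1^255) + (2^255) + (3^255) == 1014.
inline void CheckEncodeAllXor() { assert(EncodeAllXor(4) == 1014); }
```

If `EncodeAll` and `XorEncoder` lived in separate .o files compiled to native code, the compiler would see only one side of the call at a time; with bitcode in the .o files, the link-time optimizer sees both.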