Recently I checked Profile-Guided Optimization (PGO) improvements on many projects - all current results are available here. According to multiple tests, PGO helps improve performance in many cases (including libraries like pydantic-core). Optimizing the TensorFlow Text library could be beneficial, since it could reduce the CPU time spent on routines like text preprocessing.
I can suggest the following action points:
Perform PGO benchmarks on TensorFlow Text. If they show improvements, add a note to the documentation about the possible performance gains from building TensorFlow Text with PGO.
Provide an easier way (e.g. a build option) to build TensorFlow Text with PGO. This would help end users and maintainers, since they could optimize the library for their own workloads if they decide to rebuild it for their own needs.
Optimize the pre-built binaries (if it's possible to prepare or collect a good-enough training workload).
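As a rough illustration of what such a build option could look like, here is a sketch of the classic three-step PGO flow expressed as Bazel configs with Clang's profile flags. The config names (`pgo-gen`, `pgo-use`) and the profile file name are hypothetical; TensorFlow Text's actual Bazel setup may need different wiring.

```
# Hypothetical .bazelrc fragment; -fprofile-generate / -fprofile-use
# are standard Clang flags, the config names are made up.

# Step 1: build with instrumentation, then run a training workload
# against the instrumented build to produce *.profraw files.
build:pgo-gen --copt=-fprofile-generate --linkopt=-fprofile-generate

# Step 2: merge the raw profiles (outside Bazel):
#   llvm-profdata merge -output=tftext.profdata *.profraw

# Step 3: rebuild using the merged profile.
build:pgo-use --copt=-fprofile-use=tftext.profdata
```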
Since the native part of TensorFlow Text is a C++ library, I think the pydantic-core experience can be reused here; Clang also supports PGO for shared libraries. In this case, it should be possible to prepare some text-preprocessing routines, run them to collect PGO profiles, and then use those profiles as PGO training data.
Testing post-link optimization tools like LLVM BOLT could be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
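For reference, the BOLT flow on a shared library would look roughly like the command sketch below. The library name here is hypothetical, and collecting branch data requires hardware performance counters, so details will vary by platform.

```
# Sketch of post-link optimization with LLVM BOLT (library name is illustrative).
# 1. Record a profile while running a representative workload:
perf record -e cycles:u -j any,u -- ./run_text_preprocessing_workload
# 2. Convert the perf data into BOLT's format:
perf2bolt -p perf.data -o perf.fdata libtensorflow_text.so
# 3. Produce an optimized binary from the profile:
llvm-bolt libtensorflow_text.so -o libtensorflow_text.bolt.so -data=perf.fdata -reorder-blocks=ext-tsp
```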
Here are some examples of how PGO is integrated into other projects:
configure script

Many of the examples above are applications, but that shouldn't make a difference - PGO works well with libraries too.