Recently I checked Profile-Guided Optimization (PGO) improvements on many projects - all current results are available here. According to multiple tests, PGO helps improve performance in many cases (including libraries like pydantic-core). Optimizing the TensorFlow Text library could be beneficial, since it could reduce the CPU time spent on routines like text preprocessing.
I can suggest the following action points:
Perform PGO benchmarks on TensorFlow Text. If they show improvements, add a note to the documentation about the possible performance gains from building TensorFlow Text with PGO.
Provide an easier way (e.g. a build option) to build TensorFlow Text with PGO. This would help end users and maintainers, since they could optimize the library for their own workloads if they decide to rebuild it for their own needs.
Optimize the pre-built binaries (if it's possible to prepare or collect a good-enough training workload).
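As a rough illustration of what such a build option could look like, here is a sketch of the classic three-step PGO flow expressed as Bazel configs with Clang's profile flags. The config names (`pgo-gen`, `pgo-use`) and the profile file name are hypothetical; TensorFlow Text's actual Bazel setup may need different wiring.

```
# Hypothetical .bazelrc fragment; -fprofile-generate / -fprofile-use
# are standard Clang flags, the config names are made up.

# Step 1: build with instrumentation, then run a training workload
# against the instrumented build to produce *.profraw files.
build:pgo-gen --copt=-fprofile-generate --linkopt=-fprofile-generate

# Step 2: merge the raw profiles (outside Bazel):
#   llvm-profdata merge -output=tftext.profdata *.profraw

# Step 3: rebuild using the merged profile.
build:pgo-use --copt=-fprofile-use=tftext.profdata
```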
Since the native part of TensorFlow Text is a C++ library, I think the pydantic-core experience can be reused here; Clang also supports PGO for shared libraries. In this case, it should be possible to prepare some text-preprocessing routines, run them to collect PGO profiles, and then use those profiles as PGO training data.
Testing post-link optimization tools like LLVM BOLT could be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
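For reference, the BOLT flow on a shared library would look roughly like the command sketch below. The library name here is hypothetical, and collecting branch data requires hardware performance counters, so details will vary by platform.

```
# Sketch of post-link optimization with LLVM BOLT (library name is illustrative).
# 1. Record a profile while running a representative workload:
perf record -e cycles:u -j any,u -- ./run_text_preprocessing_workload
# 2. Convert the perf data into BOLT's format:
perf2bolt -p perf.data -o perf.fdata libtensorflow_text.so
# 3. Produce an optimized binary from the profile:
llvm-bolt libtensorflow_text.so -o libtensorflow_text.bolt.so -data=perf.fdata -reorder-blocks=ext-tsp
```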
Here are some examples of how PGO is integrated into other projects:
configure script

Many of the examples above are applications, but that shouldn't make a difference - PGO works well with libraries too.