Open zamazan4ik opened 1 year ago
Note that there is documentation here on how to use the CMake caches. It isn't as detailed as the PGO page, but it shows how to build clang/LLVM with BOLT, and the CMake invocation using the caches handles most details automatically.
Following those steps should get you a BOLT-optimized clang, or even a PGO+ThinLTO+BOLT-optimized clang, without much work.
@boomanaiden154 Thank you for the link! I think one thing is missing from the page you mentioned: the performance benefits of BOLT on Clang. Right now it isn't clear why I should try to optimize Clang with BOLT. I think it would be great to add some performance numbers directly to this guide (as is already done in the PGO instructions for Clang).
If you don't have numbers from your own measurements, I guess they could be taken from these slides. That would give users/maintainers more motivation to use BOLT in the Clang build process.
That's a good point. I'd probably prefer to do fresh measurements, since the numbers are liable to change and I'd want them for the exact configuration under consideration. Things like the specific tasks used for performance training, instrumentation vs. sampling for profile collection, and the specific versions of the code can make fairly big differences.
It shouldn't take too much effort to generate some performance numbers, just a little bit of time to go through all the benchmarking.
A while ago I collected some results with a static llvm/clang build here: https://github.com/ptr1337/llvm-bolt-scripts/blob/master/results.md
They could vary a bit, since they were measured against LLVM 15 and with instrumentation. The performance benefit with LBR should be bigger.
Also a small note: BOLTing clang in a shared build (as Arch Linux and other distributions ship it) does not make much sense, since in shared builds it is "libLLVM.so" that needs to be optimized. I also talked with aapuov a bit, and he said that optimizing shared builds makes "not much sense" since they don't focus on performance.
More results on BOLTing Clang from the Android project - link.
Here are some new results on a Zen 4 7950X3D:
Clang 16 ThinLTO + PGO:
real 3m1.740s
user 87m25.048s
sys 5m31.901s
Clang 16 ThinLTO + PGO + BOLT:
real 3m13.146s
user 92m23.847s
sys 5m37.822s
So roughly a 7.5% improvement. The timings are for building the clang and llvm projects.
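For anyone who wants to check a percentage like this against raw `time` output, the durations can be converted to seconds and compared directly. Below is a minimal sketch in Python; the helper names `parse_time`, `relative_change` and `compare` are made up for illustration, and the figures and labels are copied exactly as posted above. It reports the signed change of the second run relative to the first for each of real, user and sys.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '3m1.740s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_change(baseline: float, candidate: float) -> float:
    """Signed change of candidate relative to baseline, in percent.

    A negative value means the candidate run was faster.
    """
    return (candidate - baseline) / baseline * 100.0


def compare(runs: dict, baseline: str, candidate: str) -> dict:
    """Return the per-metric relative change between two labelled runs."""
    return {
        metric: relative_change(
            parse_time(runs[baseline][metric]),
            parse_time(runs[candidate][metric]),
        )
        for metric in runs[baseline]
    }


# Figures copied verbatim from the two runs posted above.
runs = {
    "ThinLTO + PGO": {
        "real": "3m1.740s", "user": "87m25.048s", "sys": "5m31.901s",
    },
    "ThinLTO + PGO + BOLT": {
        "real": "3m13.146s", "user": "92m23.847s", "sys": "5m37.822s",
    },
}

for metric, percent in compare(runs, "ThinLTO + PGO", "ThinLTO + PGO + BOLT").items():
    print(f"{metric}: {percent:+.2f}%")
```

Swapping the `baseline` and `candidate` arguments gives the change measured in the other direction, which is worth doing before quoting a single number, since the percentage depends on which run is used as the reference.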
There is a great article in the official LLVM documentation: "How to build Clang and LLVM with Profile-Guided Optimization".
I suggest adding an additional article (or extending the existing one) with information on how to build Clang and LLVM with LLVM BOLT. Clang already supports building with BOLT via its CMake scripts.
We need to add the following information to the documentation:
Having this information in the official documentation would make this additional way of improving Clang performance with LLVM-native optimization tooling more visible.