llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Document building Clang with BOLT #65010

Open zamazan4ik opened 1 year ago

zamazan4ik commented 1 year ago

There is a great article in the official LLVM documentation "How to build Clang and LLVM with Profile-Guided Optimization".

I suggest adding another article (or extending the existing one) that explains how to build Clang and LLVM with LLVM BOLT. Clang's CMake scripts already support building with BOLT.

We need to add the following information to the documentation:

Having this information in the official documentation would raise the visibility of this additional way to improve Clang performance with LLVM-native optimization tooling.

boomanaiden154 commented 1 year ago

Note that there is documentation here on how to use the CMake caches. It isn't as detailed as the PGO page, but it shows how to build clang/LLVM with BOLT, and the CMake invocation using the caches handles most details automatically.

Following those steps should get you a BOLT-optimized clang without too much work, or even a PGO+ThinLTO+BOLT-optimized clang.
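
The CMake-cache route described above could be scripted roughly as follows. This is only a sketch: it assumes the cache file `clang/cmake/caches/BOLT.cmake` and the `clang-bolt` build target described in LLVM's advanced build documentation, and the helper name `bolt_build_commands` and both directory arguments are hypothetical. It only assembles the two commands and does not run them; the PGO+ThinLTO+BOLT variant uses a different cache and extra options that are not shown here.

```python
import shlex
from pathlib import PurePosixPath


def bolt_build_commands(source_root, build_dir):
    """Return the configure and build commands as argv lists (nothing is executed).

    Assumption: the llvm-project checkout at `source_root` ships
    clang/cmake/caches/BOLT.cmake and exposes a `clang-bolt` target.
    """
    src = PurePosixPath(source_root)
    cache = src / "clang" / "cmake" / "caches" / "BOLT.cmake"
    configure = [
        "cmake", "-G", "Ninja",
        "-S", str(src / "llvm"),   # top-level LLVM CMake project
        "-B", str(build_dir),      # out-of-tree build directory
        "-C", str(cache),          # pre-load the BOLT cache script
    ]
    build = ["ninja", "-C", str(build_dir), "clang-bolt"]
    return [configure, build]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for argv in bolt_build_commands("llvm-project", "build-bolt"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without an LLVM checkout.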

zamazan4ik commented 1 year ago

@boomanaiden154 Thank you for the link! I think one thing is missing from the page you mentioned: the performance benefits of BOLT on Clang. Right now it isn't clear why I should try to optimize Clang with BOLT. I think it would be great to add some performance numbers directly to this guide (as is already done in the PGO instructions for Clang).

If you don't have numbers from your own measurements, I guess they could be taken from these slides. That would give users and maintainers more motivation to use BOLT during the Clang build process.

boomanaiden154 commented 1 year ago

That's a good point. I'd probably prefer to do fresh measurements, since the numbers are liable to change. I'd also want to get them on the exact configuration under consideration, since things like the specific tasks used for performance training, instrumentation vs. sampling for profile collection, and the specific versions of the code can make fairly big differences.

It shouldn't take too much effort to generate some performance numbers, just a little bit of time to go through all the benchmarking.
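
A minimal sketch of how such fresh measurements might be collected: run the same workload several times and report the median wall-clock time, so that one noisy run does not dominate. The helper names `time_command` and `summarize` are hypothetical, and the workload shown is only a placeholder; a real comparison would run the same compile job once per clang build being compared.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run `argv` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to median, minimum, and maximum."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder workload: starts a Python interpreter that does nothing.
    # Replace with the compile command under test for each clang variant.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(summarize(samples))
```

Keeping the raw durations alongside the summary makes it easier to spot outliers before quoting a single number.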

ptr1337 commented 1 year ago

A while ago I collected some results with a static llvm/clang build here: https://github.com/ptr1337/llvm-bolt-scripts/blob/master/results.md

They could vary a bit, since they were measured against LLVM 15 and with instrumentation. The performance benefit with LBR should be bigger.

Also a little note: BOLTing clang in a shared build (as Arch Linux and other distributions ship it) does not make much sense, since in shared builds it is "libLLVM.so" that needs to be optimized. I also talked with aapuov a bit, and he said that optimizing shared builds makes "not much sense", since they don't focus on performance.

zamazan4ik commented 1 year ago

More results on BOLTing Clang, from the Android project: link.

ptr1337 commented 1 year ago

Here are some new results on a Zen 4 7950X3D:

Clang 16 ThinLTO + PGO:

real    3m1.740s
user    87m25.048s
sys    5m31.901s

Clang 16 ThinLTO + PGO + BOLT:

real    3m13.146s
user    92m23.847s
sys    5m37.822s

So ca. 7.5% improvement. The times are for building the clang and llvm projects.
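
For anyone who wants to recompute the percentage from the `time` output quoted above, here is a small sketch. It converts the `real` values to seconds and reports the relative change between the two runs. The helper names `parse_duration` and `relative_change` are hypothetical. Note that, as posted, the "+ BOLT" run shows the larger times, so the result depends on which run is taken as the baseline; the sketch reports both directions and makes no assumption about whether the labels were swapped.

```python
import re


def parse_duration(text):
    """Convert a `time`-style duration such as '3m1.740s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def relative_change(baseline, candidate):
    """Percent change of `candidate` relative to `baseline` (negative = smaller)."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # `real` values copied verbatim from the two runs quoted above.
    pgo = parse_duration("3m1.740s")
    pgo_bolt = parse_duration("3m13.146s")
    print(f"PGO+BOLT relative to PGO: {relative_change(pgo, pgo_bolt):+.1f}%")
    print(f"PGO relative to PGO+BOLT: {relative_change(pgo_bolt, pgo):+.1f}%")
```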