rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Add Context-Sensitive IR PGO (CSIR PGO) #118562

Open zamazan4ik opened 9 months ago

zamazan4ik commented 9 months ago

Clang supports an additional PGO mode: Context-Sensitive PGO. It works the same way as the IR PGO that Rustc uses now; the main difference is when instrumentation is done. With the usual IR PGO, instrumentation is done before the inlining phase; with CSIR PGO, it is done after the inlining phase. This could be important since the inlining decisions can introduce some noise into the profiling information.
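To make the difference concrete, here is a toy simulation in Rust. It is only a sketch under stated assumptions, not how LLVM implements instrumentation: `record`, `simulate`, and the `caller_a`/`caller_b` labels are hypothetical names, and "inlining" is modeled simply by giving each call site its own set of counters.

```rust
use std::collections::HashMap;

/// Branch counters: (times the `x > 0` arm was taken, times it was not).
type Counts = (u64, u64);

/// Hypothetical stand-in for an instrumented helper with a single branch.
/// `key` names the counter set that the instrumentation would update.
fn record(profile: &mut HashMap<&'static str, Counts>, key: &'static str, x: i32) {
    let entry = profile.entry(key).or_insert((0, 0));
    if x > 0 {
        entry.0 += 1
    } else {
        entry.1 += 1
    }
}

/// Runs the same workload under both instrumentation points (toy model).
fn simulate() -> (HashMap<&'static str, Counts>, HashMap<&'static str, Counts>) {
    // Pre-inlining instrumentation: one counter set per function.
    let mut pre_inline = HashMap::new();
    // Post-inlining instrumentation: one counter set per inlined copy.
    let mut post_inline = HashMap::new();

    // Assumption: caller A only passes positive values...
    for _ in 0..100 {
        record(&mut pre_inline, "helper", 1);
        record(&mut post_inline, "helper inlined into caller_a", 1);
    }
    // ...and caller B only passes negative values.
    for _ in 0..100 {
        record(&mut pre_inline, "helper", -1);
        record(&mut post_inline, "helper inlined into caller_b", -1);
    }
    (pre_inline, post_inline)
}

fn main() {
    let (pre, post) = simulate();
    println!("pre-inlining:  {:?}", pre);
    println!("post-inlining: {:?}", post);
}
```

In this model the pre-inlining profile sees the branch as a 100/100 coin flip, while the post-inlining profile shows that the branch is fully predictable within each caller (100/0 and 0/100). That per-context information is what the context-sensitive mode is meant to capture.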

This comment gives a small insight into the actual effects on performance: https://github.com/llvm/llvm-project/issues/56274#issuecomment-1407117774. According to it, we can treat CSIR PGO as a lightweight replacement for LLVM BOLT optimization. For some people this could be important, since LLVM BOLT does not work on all platforms.

Right now it's not clear whether we could implement CSIR PGO by just passing the right LLVM flag, or whether some frontend changes are required too. I guess something similar to the CSIR PGO implementation in Clang is required here. In any case, CSIR PGO information should be added to the Rustc PGO documentation.

CSIR PGO was also mentioned in the initial PGO issue for Rustc: https://github.com/rust-lang/rust/issues/59913#issue-432504990

zamazan4ik commented 9 months ago

Helpful insights can be found here: https://reviews.llvm.org/D54176

Kobzol commented 9 months ago

I have already tried CSIR PGO several times for rustc's LLVM (https://github.com/rust-lang/rust/pull/97153, https://github.com/rust-lang/rust/pull/111806), but couldn't get any performance benefits out of it. When I discussed it with BOLT maintainers on Discord, they basically told me that CSIR's optimizations are subsumed by BOLT. And since we already use BOLT both for rustc and LLVM, I don't think that CSIR will help much.

zamazan4ik commented 9 months ago

I have seen this discussion. However, I disagree that CSIR PGO makes no sense just because we have LLVM BOLT. BOLT has many limitations: lack of support for multiple important platforms like Windows, macOS and *BSD, resource consumption during the optimization phase, and multiple bugs. Because of all this, there are a lot of applications that simply cannot use BOLT in their optimization pipelines.

Even if it makes less sense for Rustc itself to use CSIR PGO in its optimization pipeline (though CSIR PGO should be useful for Windows and macOS builds at least), CSIR PGO support could be useful for other Rust applications.

Regarding the actual benchmark improvements from PGO: I have already requested some numbers from Google people. Hopefully, they will be able to share them. If I get enough time, I can probably perform my own benchmarks too.

Kobzol commented 9 months ago

Yeah, I don't dispute that. I just wanted to mention that for x64 Linux rustc/LLVM it's probably not worth investing effort into. It could be useful for general Rust programs or for rustc on other platforms (but there it's quite problematic, since CI time is typically quite limited and we don't even have benchmarking for non-Linux systems currently).

Jules-Bertholet commented 9 months ago

@rustbot label T-compiler C-optimization C-enhancement A-LLVM