rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Use section/symbol ordering files for compiling rustc (e.g. BOLT) #50655

Open michaelwoerister opened 6 years ago

michaelwoerister commented 6 years ago

The order in which code is laid out in a binary influences how fast the binary executes because (as I understand it) it affects instruction cache locality and how efficiently the code is paged in from disk. Many linkers support specifying this order (e.g. LLD via --symbol-ordering-file and MSVC via -ORDER). The hard part, though, is finding an order that will actually improve things. The chromium project has a tool for this, and somewhere else I've read that valgrind could be used for this too. The expected speedups are a few percent.
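To make the idea of an ordering file concrete, here is a minimal sketch of how one could be produced from profile data. It assumes a hypothetical map from symbol name to sample count (however the profile was gathered) and emits one symbol per line, hottest first, which is the kind of plain list that LLD's --symbol-ordering-file consumes. The function name `symbol_order` and the sample symbol names are made up for illustration; real tools use far more sophisticated heuristics than sorting by hit count.

```rust
use std::collections::HashMap;

/// Build the contents of a symbol ordering file: one symbol name per line,
/// hottest first. `samples` is a hypothetical map from symbol name to the
/// number of profiler hits. Symbols with zero hits are left out so the
/// linker keeps its default placement for them.
fn symbol_order(samples: &HashMap<String, u64>) -> String {
    let mut hot: Vec<(&str, u64)> = samples
        .iter()
        .filter(|entry| *entry.1 > 0)
        .map(|(name, count)| (name.as_str(), *count))
        .collect();

    // Descending by hit count; ties are broken by name so the output is
    // deterministic even though HashMap iteration order is not.
    hot.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));

    let mut out = String::new();
    for (name, _) in hot {
        out.push_str(name);
        out.push('\n');
    }
    out
}

fn main() {
    // Illustrative data only; these are not real rustc symbols or counts.
    let mut samples = HashMap::new();
    samples.insert("typeck_body".to_string(), 300);
    samples.insert("parse_item".to_string(), 120);
    samples.insert("print_help".to_string(), 0);

    print!("{}", symbol_order(&samples));
}
```

Sorting purely by hotness ignores call-graph structure, so this is only a starting point for experiments, not a replacement for a dedicated layout tool.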

Prerequisites:

The first point shouldn't be too hard. The rest, however, would be a big infrastructure investment. I hope that we'll get PGO support for our CI at some point. This symbol ordering business could then be part of that.

cc @glandium @rust-lang/wg-compiler-performance @rust-lang/infra

ishitatsuyuki commented 6 years ago

For your reference, Git uses its integration tests as the workload for gathering PGO profile data.

bjorn3 commented 6 years ago

The link to cygprofile is missing a slash at the end (it should be https://cs.chromium.org/chromium/src/tools/cygprofile/). Without it I get an error.

michaelwoerister commented 6 years ago

@ishitatsuyuki Interesting!

est31 commented 6 years ago

As an alternative to the google tool, there is BOLT by facebook (github link).

michaelwoerister commented 6 years ago

Great find, @est31!

Mark-Simulacrum commented 6 years ago

(This was originally typed in response to https://github.com/rust-lang/rust/issues/55137 which has been closed as a duplicate of this issue)

I think the blocker historically for BOLT/PGO/LTO has been finding CI time, especially in the case of BOLT and PGO for gathering profile data. I think if the answer to "Can BOLT be run on a different binary from the one we've gathered data for? (e.g., the stage1/bin compiler is profiled while building the stage2/bin compiler, and then the stage2/bin compiler is optimized)" is yes -- and there's still benefit from this -- then my next question is "how long does BOLT take?"

If someone would be willing to do the research to answer these questions then I think integrating this into CI would become more feasible. One good thing is that we likely don't need to worry about implementing this for all platforms at once, since AFAICT BOLT is "just" an optimization.

bstrie commented 6 years ago

@Mark-Simulacrum I don't think this necessarily needs to involve CI at all. I envision these tools as useful for the artifacts that we distribute to users, rather than as an aid to rustc developers. Seems like it could just be the final step on the build servers while we're doing releases.

Mark-Simulacrum commented 6 years ago

Well, our CI is Rust's build server, so in that regard that's why time especially is important.

ishitatsuyuki commented 5 years ago

I tried BOLT with my own build, and it performed 3% better on average. This was a rough benchmark since I'm using my laptop though, so it might be just noise. (I'm probably not going to run this again until I get a workstation.)

BOLT has some caveats:

As for gathering data, maybe running them on rustc-perf is another option? We can make use of its perf support.

zamazan4ik commented 3 years ago

Sorry for necroposting, but there is another alternative to BOLT: https://github.com/google/llvm-propeller. Maybe it will work better for rustc than BOLT. I haven't tried it (yet).

zamazan4ik commented 3 years ago

But from my point of view BOLT is the much more interesting option, since it is going to become part of LLVM. The BOLT team is working on that now.

bstrie commented 3 years ago

BOLT is on the verge of being upstreamed into LLVM: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153551.html

bstrie commented 2 years ago

BOLT has landed in LLVM: https://github.com/llvm/llvm-project/commit/4c106cfdf7cf7eec861ad3983a3dd9a9e8f3a8ae

ink-splatters commented 2 years ago

In light of the previous comment: is there any ongoing work on BOLT support for rustc? Is it already available in nightly builds? (I anticipate "no" here, so I'd appreciate it if someone could mention another relatively straightforward way to test it.)

Thanks!

Kobzol commented 2 years ago

I have been trying to make it work for the past several months. Currently it doesn't seem to work however, since LLVM instrumented with BOLT segfaults.

zamazan4ik commented 2 years ago

@Kobzol do you have a related issue to this crash? E.g. any from these: https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+bolt

I am also interested in BOLTing the rustc :)

Kobzol commented 2 years ago

You can check https://github.com/rust-lang/rust/pull/94381 for more details, there are some related LLVM issues linked.

chadbrewbaker commented 2 years ago

I saw this made it into the rust CI via a shell script? Could it be made a little more friendly as a Cargo enhancement - or at least docs with a simple Rust crate?

Kobzol commented 2 years ago

I created https://github.com/Kobzol/cargo-pgo for this.

jyn514 commented 1 year ago

It looks like https://github.com/rust-lang/rust/pull/94381 has been merged - @kobzol can we close this issue? :)

Kobzol commented 1 year ago

Well, BOLT is currently used only for optimizing LLVM on x64 Linux. It's not used to optimize rustc yet (there's an open PR, but the perf. gains have been a bit lackluster).