Open michaelwoerister opened 6 years ago
For your reference, Git uses their integration tests as a source of PGO.
Missing slash at the end in the link to cygprofile (should be https://cs.chromium.org/chromium/src/tools/cygprofile/ ); without it I get an error.
@ishitatsuyuki Interesting!
As an alternative to the google tool, there is BOLT by facebook (github link).
Great find, @est31!
(This was originally typed in response to https://github.com/rust-lang/rust/issues/55137 which has been closed as a duplicate of this issue)
I think the blocker historically for BOLT/PGO/LTO has been finding CI time, especially in the case of BOLT and PGO for gathering profile data. If the answer to "Can BOLT be run on a different binary from the one we've gathered data for? (e.g., the stage1/bin compiler is profiled while building the stage2/bin compiler, and then the stage2/bin compiler is optimized)" is yes, and there's still benefit from this, then my next question is "how long does BOLT take?"
If someone would be willing to do the research to answer these questions, then I think integrating this into CI would become more feasible. One good thing is that we likely don't need to worry about implementing this for all platforms at once, since AFAICT BOLT is "just" an optimization.
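To put a number on "how long does BOLT take?", each step could be timed with a small wrapper. This is a rough sketch only: the `perf`, `perf2bolt` and `llvm-bolt` flags are the ones I recall from the BOLT README and should be checked against the installed version, and the default binary path is a hypothetical placeholder. It profiles and optimizes the same binary, so it does not answer the stage1/stage2 question by itself.

```python
import subprocess
import sys
import time


def perf_record_cmd(binary, args, perf_data="perf.data"):
    # Sample user-space cycles with branch records (assumed flags, per the BOLT README).
    return ["perf", "record", "-e", "cycles:u", "-j", "any,u",
            "-o", perf_data, "--", binary, *args]


def perf2bolt_cmd(binary, perf_data="perf.data", fdata="perf.fdata"):
    # Convert the perf samples into BOLT's profile format.
    return ["perf2bolt", "-p", perf_data, "-o", fdata, binary]


def llvm_bolt_cmd(binary, output, fdata="perf.fdata"):
    # Rewrite the binary using the collected profile (assumed flags).
    return ["llvm-bolt", binary, "-o", output, f"-data={fdata}",
            "-reorder-blocks=ext-tsp", "-reorder-functions=hfsort",
            "-split-functions"]


def timed_run(cmd):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical default path; pass the real compiler binary as the first argument.
    binary = sys.argv[1] if len(sys.argv) > 1 else "build/stage2/bin/rustc"
    # Print the steps rather than running them, since the tools may not be installed.
    for cmd in (perf_record_cmd(binary, ["--version"]),
                perf2bolt_cmd(binary),
                llvm_bolt_cmd(binary, binary + ".bolt")):
        print(" ".join(cmd))
```

Passing each command to `timed_run` on a CI-like machine would give the per-step cost that the CI-time question depends on.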
@Mark-Simulacrum I don't think this necessarily needs to involve CI at all. I envision these tools as useful for the artifacts that we distribute to users, rather than as an aid to rustc developers. Seems like it could just be the final step on the build servers while we're doing releases.
Well, our CI is Rust's build server, which is why time in particular is important.
I tried BOLT with my own build, and it performed 3% better on average. This was a rough benchmark since I'm using my laptop though, so it might be just noise. (I'm probably not going to run this again until I get a workstation.)
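Whether a difference of about 3% is distinguishable from noise can be checked crudely by comparing the gap between the means with the spread of repeated runs. The sketch below is an illustration with made-up helper names, not the method rustc-perf uses; it treats the difference as meaningful only if it exceeds a chosen multiple of the combined standard error.

```python
import statistics


def relative_change(baseline, candidate):
    """Relative change of the candidate mean versus the baseline mean."""
    base_mean = statistics.mean(baseline)
    return (statistics.mean(candidate) - base_mean) / base_mean


def exceeds_noise(baseline, candidate, sigmas=2.0):
    """Crude check: is the mean difference larger than `sigmas` standard errors?"""
    diff = abs(statistics.mean(candidate) - statistics.mean(baseline))
    # Combined standard error of the two sample means (needs >= 2 runs each).
    std_err = (statistics.variance(baseline) / len(baseline)
               + statistics.variance(candidate) / len(candidate)) ** 0.5
    return diff > sigmas * std_err


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds, five runs each.
    plain = [10.0, 10.2, 9.8, 10.1, 9.9]
    bolted = [9.7, 9.9, 9.5, 9.8, 9.6]
    print(f"relative change: {relative_change(plain, bolted):+.1%}")
    print("exceeds noise:", exceeds_noise(plain, bolted))
```

On a laptop with thermal throttling the run-to-run variance tends to be large, so more runs per configuration would be needed before a small gain clears this threshold.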
BOLT has some caveats:
As for gathering data, maybe running them on rustc-perf is another option? We can make use of its `perf` support.
Sorry for necroposting, but there is another alternative to BOLT: https://github.com/google/llvm-propeller . It may turn out to be better for rustc than BOLT; I haven't tried it (yet).
From my point of view, though, BOLT is the much more interesting option, since it is going to become part of LLVM. The BOLT team is working on that now.
BOLT is on the verge of being upstreamed into LLVM: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153551.html
BOLT has landed in LLVM: https://github.com/llvm/llvm-project/commit/4c106cfdf7cf7eec861ad3983a3dd9a9e8f3a8ae
In light of the previous comment: is there any ongoing work on BOLT support for rustc? Is it already available in nightly builds? (I anticipate "no" here, so I'd appreciate it if someone could mention another relatively straightforward way to test it.)
Thanks!
I have been trying to make it work for the past several months. Currently it doesn't seem to work however, since LLVM instrumented with BOLT segfaults.
@Kobzol do you have an issue related to this crash? E.g. any of these: https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+bolt
I am also interested in BOLTing the rustc :)
You can check https://github.com/rust-lang/rust/pull/94381 for more details, there are some related LLVM issues linked.
I saw this made it into the rust CI via a shell script? Could it be made a little more friendly as a Cargo enhancement - or at least docs with a simple Rust crate?
I created https://github.com/Kobzol/cargo-pgo for this.
It looks like https://github.com/rust-lang/rust/pull/94381 has been merged - @kobzol can we close this issue? :)
Well, BOLT is currently used only for optimizing LLVM on x64 Linux. It's not used to optimize rustc yet (there's an open PR, but the perf. gains have been a bit lackluster).
The order in which code is located in binaries has an influence on how fast the binary executes because (as I understand it) it affects instruction cache locality and how efficiently the code is paged in from disk. Many linkers support specifying this order (e.g. LLD via `--symbol-ordering-file` and MSVC via `-ORDER`). The hard part, though, is to find an order that will actually improve things. The Chromium project has a tool for this, and somewhere else I've read that valgrind could be used for this too. The expected speedups are a few percent.

Prerequisites:
`rustc`
(if using the Chromium tool) similar to what GCC's `-finstrument-functions` does.

The first point shouldn't be too hard. The rest, however, would be a big infrastructure investment. I hope that we'll get PGO support for our CI at some point. This symbol ordering business could then be part of that.
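As an illustration of the "find an order" step, here is a minimal sketch that turns a stream of sampled symbol names into the one-symbol-per-line text that LLD's `--symbol-ordering-file` accepts. The hottest-first heuristic and the function names are assumptions made for the example; this is not how the Chromium tool derives its order, and a real profile would use mangled symbol names.

```python
from collections import Counter


def symbol_order(samples, min_hits=1):
    """Return symbol names sorted hottest-first from an iterable of samples."""
    counts = Counter(samples)
    hot = [(sym, n) for sym, n in counts.items() if n >= min_hits]
    # Descending hit count; ties broken by name so the output is deterministic.
    hot.sort(key=lambda item: (-item[1], item[0]))
    return [sym for sym, _ in hot]


def render_ordering_file(symbols):
    """Render the order as text with one symbol per line."""
    return "".join(sym + "\n" for sym in symbols)


if __name__ == "__main__":
    # Hypothetical samples: each entry is the symbol a profiler sample landed in.
    samples = ["parse", "typeck", "parse", "codegen", "typeck", "parse"]
    print(render_ordering_file(symbol_order(samples)), end="")
```

The rendered text would be written to a file and handed to the linker; measuring whether the resulting layout actually helps is the part that needs the benchmark infrastructure discussed above.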
cc @glandium @rust-lang/wg-compiler-performance @rust-lang/infra