rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

opt-dist: separate dist-build from profiling #127259

Open weihanglo opened 3 months ago

weihanglo commented 3 months ago

Problem

The opt-dist tool was created for building a PGO/BOLT-optimized compiler for the official Rust toolchain distribution. It wasn't designed for downstream packagers to build their own optimized compilers.

Thanks to some recent efforts, the source tarball (the rustc-src tarball) will contain the vendored sources that downstream packagers need to reproduce the optimized build the project performs in its distribution process.

However, the opt-dist pipeline is notoriously time-consuming. There are currently five stages (four, actually? I cannot find the 4th) in the pipeline. In my environment, running through the entire pipeline takes a very long time. For example,

-----------------------------------------------------------------
Stage 1 (Rustc PGO):                            2544.37s (25.42%)
  Build PGO instrumented rustc and LLVM:        1603.55s (16.02%)
  Gather profiles:                               715.07s ( 7.14%)
  Build PGO optimized rustc:                     225.74s ( 2.26%)
Stage 2 (LLVM PGO):                             1655.85s (16.54%)
  Build PGO instrumented LLVM:                  1139.50s (11.39%)
  Gather profiles:                               514.86s ( 5.14%)
Stage 3 (BOLT):                                 4704.19s (47.00%)
  Build PGO optimized LLVM:                     1498.12s (14.97%)
  Gather profiles:                               700.52s ( 7.00%)
  Gather profiles:                              1217.13s (12.16%)
Stage 5 (final build):                           856.55s ( 8.56%)
Run tests:                                       247.73s ( 2.48%)

Total duration:                                        2h 46m 48s
----------------------------------------------------------------- 

If a build fails midway and the environment has no good cache layer, or uses a randomized root directory for each build, a new build starts over from stage 1. This also slows down the development feedback loop when working on the opt-dist tool itself.

Proposed solution

It would be nice if we could separate the stages and start from any stage, provided the necessary input for that stage is prepared. That makes it easier to recover from failures, since there is no longer any need to start over from the beginning of the pipeline.
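To make the "start from any stage" idea concrete, here is a minimal sketch of how a stage could report which inputs it is still missing. The `Stage` and `Artifact` types and the input mapping are hypothetical; they are my reading of the pipeline, not types that exist in opt-dist today.

```rust
/// Outputs that one stage hands to later stages (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Artifact {
    RustcPgoProfile,
    LlvmPgoProfile,
    BoltProfile,
}

/// Pipeline stages in execution order (hypothetical, not opt-dist's own types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    RustcPgo,
    LlvmPgo,
    Bolt,
    FinalDist,
}

impl Stage {
    /// Assumed inputs per stage; the real dependencies may differ.
    fn required_inputs(self) -> &'static [Artifact] {
        match self {
            Stage::RustcPgo => &[],
            Stage::LlvmPgo => &[Artifact::RustcPgoProfile],
            Stage::Bolt => &[Artifact::RustcPgoProfile, Artifact::LlvmPgoProfile],
            Stage::FinalDist => &[
                Artifact::RustcPgoProfile,
                Artifact::LlvmPgoProfile,
                Artifact::BoltProfile,
            ],
        }
    }
}

/// Returns the inputs that `stage` needs but that are not in `available`.
/// An empty result means the stage can start right away.
fn missing_inputs(stage: Stage, available: &[Artifact]) -> Vec<Artifact> {
    stage
        .required_inputs()
        .iter()
        .copied()
        .filter(|artifact| !available.contains(artifact))
        .collect()
}
```

With something like this, a failed run could check which artifacts survived and resume at the first stage whose `missing_inputs` list is empty, instead of going back to stage 1.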

To start small, we could first separate the final dist build from the other profiling stages of opt-dist. opt-dist accepts trailing arguments and passes them to bootstrap, so wrong arguments won't be detected until the final bootstrap dist build starts.

Possible implementation

To separate dist-build and profiling stages, we will need to find a way to preserve profile data. We could have different directories for different profile data, such as:

./
├── bolt-profiles/
├── llvm-pgo-profiles/
└── rustc-pgo-profiles/

Each directory contains corresponding profiles. The names of those profiles can follow some scheme like rustc-pgo-<version>-<some-metadata-rustc-use>.
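As a rough sketch of that layout, the helper below builds the path for one profile. `ProfileKind`, `profile_path`, and the `version` and `metadata` parameters are hypothetical names for illustration; what the metadata should actually contain is still an open question.

```rust
use std::path::{Path, PathBuf};

/// The three kinds of profile data from the directory layout above.
#[derive(Debug, Clone, Copy)]
enum ProfileKind {
    RustcPgo,
    LlvmPgo,
    Bolt,
}

impl ProfileKind {
    /// Directory that holds profiles of this kind.
    fn dir(self) -> &'static str {
        match self {
            ProfileKind::RustcPgo => "rustc-pgo-profiles",
            ProfileKind::LlvmPgo => "llvm-pgo-profiles",
            ProfileKind::Bolt => "bolt-profiles",
        }
    }

    /// File name prefix, following the `rustc-pgo-<version>-<metadata>` scheme.
    fn prefix(self) -> &'static str {
        match self {
            ProfileKind::RustcPgo => "rustc-pgo",
            ProfileKind::LlvmPgo => "llvm-pgo",
            ProfileKind::Bolt => "bolt",
        }
    }
}

/// Builds `<root>/<kind dir>/<prefix>-<version>-<metadata>`.
/// `metadata` is an opaque string here (an assumption, e.g. a hash).
fn profile_path(root: &Path, kind: ProfileKind, version: &str, metadata: &str) -> PathBuf {
    root.join(kind.dir())
        .join(format!("{}-{}-{}", kind.prefix(), version, metadata))
}
```

Keeping the version and metadata in the file name would let a later stage refuse a profile that was gathered for a different compiler build.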

For the CLI, we could add new subcommands:

./opt-dist local profile --llvm-dir <path> # by default run all stages
./opt-dist local dist -- <build-args>

# In the future we could also separate each stage
./opt-dist local rustc-pgo --llvm-dir <path>
./opt-dist local llvm-pgo --rustc-pgo-profile <path>
./opt-dist local bolt --llvm-pgo-profile <path> --rustc-pgo-profile <path>
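
For illustration, here is a dependency-free sketch of how the first two subcommands could be parsed. It uses plain slice patterns rather than the tool's real argument parser, and the `Command` enum, the `parse` function, and the choice to make `--llvm-dir` optional are all assumptions on my side.

```rust
/// Parsed form of the two proposed subcommands (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Command {
    /// `local profile [--llvm-dir <path>]`
    Profile { llvm_dir: Option<String> },
    /// `local dist [-- <build-args>]`
    Dist { build_args: Vec<String> },
}

/// Parses the arguments that follow the binary name.
fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        ["local", "profile", rest @ ..] => {
            let llvm_dir = match rest {
                [] => None,
                ["--llvm-dir", path] => Some(path.to_string()),
                other => return Err(format!("unexpected arguments: {:?}", other)),
            };
            Ok(Command::Profile { llvm_dir })
        }
        ["local", "dist", rest @ ..] => {
            let build_args = match rest {
                [] => Vec::new(),
                // Everything after `--` is forwarded to bootstrap untouched.
                ["--", forwarded @ ..] => forwarded.iter().map(|s| s.to_string()).collect(),
                other => return Err(format!("unexpected arguments: {:?}", other)),
            };
            Ok(Command::Dist { build_args })
        }
        other => Err(format!("unknown command: {:?}", other)),
    }
}
```

Because `dist` is its own subcommand, a wrong bootstrap argument would only cost the final dist build rather than a full profiling run.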
weihanglo commented 3 months ago

cc @Kobzol

lqd commented 3 months ago

> Separate each stage and be able to start from any stage

We need to remember here that the profiles are not small and very numerous. This could quickly eat a lot of available disk space on some builders. Some of them already don't use separate profile files for space reasons. This could likely need to be opt-in, as the incremental opt-dist use-case is less suited for CI than local work.

Kobzol commented 3 months ago

Hi, this is a good idea, although, as is so often the case, performance considerations get in the way of clean software design :) Ideally, we'd have each stage as a separate function and just expose the stages so that users can mix and match them as they wish. However, what we have currently is that all stages are super tightly integrated, with very careful ordering and bootstrap argument selection that was hand-crafted so that the build takes as little time as possible (to be clear: as little time as possible if you want to run all the steps in one go).

I suppose that we could introduce separate commands for the individual stages, but I think that we'll need to duplicate some of the stage building code in main.rs. I'm not sure if we can make the current super tightly integrated execute_pipeline composable. But it might be worth a try :)

Kobzol commented 3 months ago

> We need to remember here that the profiles are not small and very numerous. This could quickly eat a lot of available disk space on some builders. Some of them already don't use separate profile files for space reasons. This could likely need to be opt-in, as the incremental opt-dist use-case is less suited for CI than local work.

After postprocessing, each profile is actually just a single file of roughly 100 MiB, so I don't think this should be a concern. (Also, I don't want to separate the stages on CI :smile:)