Open weihanglo opened 3 months ago
cc @Kobzol
Separate each stage and be able to start from any stage
We need to remember here that the profiles are not small and are very numerous. This could quickly eat a lot of the available disk space on some builders. Some of them already don't use separate profile files for space reasons. This would likely need to be opt-in, as the incremental opt-dist use case is less suited to CI than to local work.
Hi, this is a good idea, although as so very often, performance considerations are in the way of clean software design :) Ideally, we'd have each stage as a separate function and just expose the stages so that they can be mixed & matched by users as they wish. However, what we have currently is that all stages are super tightly integrated, with very careful ordering and bootstrap argument selection that was hand-crafted so that the build takes as little time as possible (to be clear: as little time as possible if you want to run all the steps in one go).
I suppose that we could introduce separate commands for the individual stages, but I think that we'll need to duplicate some of the stage building code in main.rs. I'm not sure if we can make the current super tightly integrated execute_pipeline composable. But it might be worth a try :)
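As a rough illustration of the "each stage as a separate function" idea, here is a minimal sketch in which every stage takes its inputs as explicit arguments and returns a typed output. All type names, function names, and file names below are hypothetical and do not exist in opt-dist; the dependency of the LLVM stage on the rustc profile is also only an assumption made for the example, and the actual build work is elided.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical marker type for the output of a rustc profiling stage.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RustcProfile(pub PathBuf);

/// Hypothetical marker type for the output of an LLVM profiling stage.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LlvmProfile(pub PathBuf);

/// Everything the final dist build needs, wherever it came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DistInputs {
    pub rustc: RustcProfile,
    pub llvm: LlvmProfile,
}

/// Stage with no inputs. The instrumented build and benchmark run are
/// elided; only the location of the resulting profile is modeled.
pub fn rustc_profile_stage(work_dir: &Path) -> RustcProfile {
    RustcProfile(work_dir.join("rustc.profdata"))
}

/// Stage that (by assumption) needs a rustc profile. The dependency is
/// visible in the signature instead of being implied by call order.
pub fn llvm_profile_stage(work_dir: &Path, _rustc: &RustcProfile) -> LlvmProfile {
    LlvmProfile(work_dir.join("llvm.profdata"))
}

/// Final stage. It does not care whether the profiles were produced in this
/// run or loaded from an earlier one; it returns the files it would consume.
pub fn dist_stage(inputs: &DistInputs) -> Vec<PathBuf> {
    vec![inputs.rustc.0.clone(), inputs.llvm.0.clone()]
}
```

With signatures like these, a caller could run all stages in sequence or construct `DistInputs` from profiles left over from an earlier run and call only `dist_stage`. Whether the hand-tuned ordering in `execute_pipeline` can be preserved under such a split is exactly the open question.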
> We need to remember here that the profiles are not small and are very numerous. This could quickly eat a lot of the available disk space on some builders. Some of them already don't use separate profile files for space reasons. This would likely need to be opt-in, as the incremental opt-dist use case is less suited to CI than to local work.
After postprocessing, each profile is actually just a single file of roughly 100 MiB, so I don't think this should be a concern. (Also, I don't want to separate the stages on CI :smile:)
Problem
The opt-dist tool was created for building a PGO/BOLT-optimized compiler for the official Rust toolchain distribution. It wasn't designed for downstream packagers to build their own optimized compilers. With some recent efforts, the source tarball (rustc-src tarball) will contain the vendored sources necessary for downstream packagers to reproduce the optimized build that the project does in its distribution process.
However, the opt-dist pipeline build is notoriously time-consuming. There are currently five stages (4 actually? I cannot find the 4th) in the pipeline. In my environment it takes a long time to run through the entire pipeline. For example,
If a build fails in the middle, and the environment has no good cache layer or uses a randomized root directory for each build, a new build starts over from stage 1. That also slows down the development feedback loop when working on the opt-dist tool itself.
Proposed solution
It would be cool if we could separate the stages and start from any stage, provided the necessary inputs for that stage are already prepared. That would make it easier to recover from failures, since a failed build would no longer need to start over from the beginning of the pipeline.
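To make the "start from any stage if its inputs are prepared" idea concrete, here is a minimal sketch of the check a resumable pipeline might perform before running. The stage names and their dependencies are placeholders chosen for illustration, not opt-dist's actual stage list, and `plan_from` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Illustrative stage names only; not opt-dist's real stages.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Stage {
    RustcPgo,
    LlvmPgo,
    LlvmBolt,
    Dist,
}

/// Pipeline order assumed for this sketch.
pub const ORDER: [Stage; 4] = [Stage::RustcPgo, Stage::LlvmPgo, Stage::LlvmBolt, Stage::Dist];

impl Stage {
    /// Stages whose outputs this stage consumes (assumed dependencies).
    pub fn inputs(self) -> &'static [Stage] {
        match self {
            Stage::RustcPgo => &[],
            Stage::LlvmPgo => &[Stage::RustcPgo],
            Stage::LlvmBolt => &[Stage::RustcPgo, Stage::LlvmPgo],
            Stage::Dist => &[Stage::RustcPgo, Stage::LlvmPgo, Stage::LlvmBolt],
        }
    }
}

/// Plans a run that begins at `start`, given the stages whose outputs are
/// already available (for example, profiles preserved from an earlier run).
/// Returns the stages to execute in order, or the first missing input.
pub fn plan_from(start: Stage, available: &HashSet<Stage>) -> Result<Vec<Stage>, Stage> {
    let mut have = available.clone();
    let mut plan = Vec::new();
    for stage in ORDER.iter().copied().skip_while(|s| *s != start) {
        if let Some(missing) = stage.inputs().iter().find(|dep| !have.contains(*dep)) {
            return Err(*missing);
        }
        plan.push(stage);
        // Once a stage has run, its output is available to later stages.
        have.insert(stage);
    }
    Ok(plan)
}
```

Starting at `Stage::Dist` with all three profiling outputs available yields a plan containing only the dist build, while starting there with nothing available reports the first missing input instead of failing late in the run.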
To start small, we could first separate the final dist build from the other profiling stages of opt-dist. opt-dist accepts trailing arguments and passes them to bootstrap, so wrong arguments aren't detected until the final bootstrap dist build starts.
Possible implementation
To separate dist-build and profiling stages, we will need to find a way to preserve profile data. We could have different directories for different profile data, such as:
Each directory contains the corresponding profiles. The names of those profiles can follow some scheme like rustc-pgo-<version>-<some-metadata-rustc-use>.
For the CLI, we could have new subcommands: