rust-lang / compiler-team

A home for compiler team planning documents, meeting minutes, and other such things.
https://rust-lang.github.io/compiler-team/
Apache License 2.0

Enable Parallel Rustc Front End In Nightly Builds #681

Closed: SparrowLii closed this issue 1 year ago

SparrowLii commented 1 year ago

Proposal

It has been almost a year since the parallel rustc working group was rebooted. We have made a lot of progress this year, and I think it is time to start public testing and promote the feature by enabling the parallel front end in nightly builds.

Let me introduce the current progress and status of this feature, as well as our shipping strategy.

Current status

Multiple Threads Performance

On 8 cores with 8 threads, the parallel front end reduces clean build time (cargo build with the -Z threads=8 option) by about 30% on average. (These crates are from the compiler-benchmarks of rustc-perf.)

crate                  8cores-master-stage1  8cores-8threads-stage1  change
serde                  4.86                  2.57                    -47.12%
stm32f4                18.89                 10.97                   -41.93%
diesel                 17.13                 10.3                    -39.87%
syn                    3.81                  2.38                    -37.53%
hyper                  11.5                  7.41                    -35.57%
regex                  4.15                  2.77                    -33.25%
clap                   4.91                  3.3                     -32.79%
unicode-normalization  1.9                   1.3                     -31.58%
serde_derive           7.11                  4.94                    -30.52%
cranelift-codegen      13.57                 9.47                    -30.21%
core                   21.89                 15.37                   -29.79%
html5ever              10.12                 7.4                     -26.88%
webrender              40.60                 29.85                   -26.48%
image                  16.94                 12.99                   -23.32%
cargo                  53.57                 44.13                   -17.62%
bitmaps                1.03                  0.85                    -17.48%
libc                   1.14                  0.95                    -16.67%
exa                    9.31                  8.12                    -12.78%
ripgrep                15.88                 14.09                   -11.27%
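The "about 30% on average" figure can be checked directly against the table. Below is a minimal Rust sketch, assuming the average is the arithmetic mean of the per-crate changes (the proposal does not say which mean was used); the function names are illustrative only.

```rust
/// Relative change in build time, in percent (negative means faster).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

/// (crate, 8cores-master-stage1, 8cores-8threads-stage1), copied from the table.
fn table() -> Vec<(&'static str, f64, f64)> {
    vec![
        ("serde", 4.86, 2.57),
        ("stm32f4", 18.89, 10.97),
        ("diesel", 17.13, 10.3),
        ("syn", 3.81, 2.38),
        ("hyper", 11.5, 7.41),
        ("regex", 4.15, 2.77),
        ("clap", 4.91, 3.3),
        ("unicode-normalization", 1.9, 1.3),
        ("serde_derive", 7.11, 4.94),
        ("cranelift-codegen", 13.57, 9.47),
        ("core", 21.89, 15.37),
        ("html5ever", 10.12, 7.4),
        ("webrender", 40.60, 29.85),
        ("image", 16.94, 12.99),
        ("cargo", 53.57, 44.13),
        ("bitmaps", 1.03, 0.85),
        ("libc", 1.14, 0.95),
        ("exa", 9.31, 8.12),
        ("ripgrep", 15.88, 14.09),
    ]
}

/// Arithmetic mean of the per-crate percentage changes
/// (an assumption about how "on average" was computed).
fn mean_change(rows: &[(&str, f64, f64)]) -> f64 {
    let total: f64 = rows
        .iter()
        .map(|&(_, before, after)| percent_change(before, after))
        .sum();
    total / rows.len() as f64
}

fn main() {
    let rows = table();
    for &(name, before, after) in &rows {
        println!("{name:<22} {:+.2}%", percent_change(before, after));
    }
    println!("mean change: {:.2}%", mean_change(&rows));
}
```

The arithmetic mean comes out at roughly -28.6%, consistent with "about 30%". Note that the per-crate spread is wide (-47% to -11%), so the mean hides a lot of variation.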

Here are the test results from rustc-perf. With 8 threads on a 6-core machine, wall time in the full and incr-full scenarios was reduced by 23.29%. There is a 1.87% regression in the incr-unchanged and incr-patched scenarios, but this should have little impact on users.

Single Thread Performance

Over the past few years, the single-threaded regression has always been the main blocker for the parallel front end, and it has been our focus over the past year. According to rustc-perf, the single-threaded regression of the parallel front end has now been reduced to 1.8% on average, which is generally acceptable.

Crater Test

The previous crater run showed a total of 7 regressions. The cause of 5 of them is still unknown, and none of them has been reproduced in local testing. Enabling the parallel front end in nightly builds may help us find more clues.

UI Tests

All UI test failures caused by enabling the parallel front end have been fixed. Note that this only holds when a single thread is used, which is the default. Diagnostics under multi-threading may differ from those under single-threading, because errors and fatal errors may be emitted in a different order.
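To illustrate why the order can differ: in the toy Rust sketch below, each "check" runs on its own thread and pushes a diagnostic into a shared buffer, so the arrival order depends on thread scheduling. Sorting by a stable key (here a hypothetical line number) restores a deterministic order. This only illustrates the effect; it is not how rustc's diagnostic emitter works, and `Diag` and `check_in_parallel` are hypothetical names.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical, heavily simplified diagnostic: a line number and a message.
/// Deriving `Ord` sorts by `line` first, then by `msg`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Diag {
    line: u32,
    msg: String,
}

/// Runs one "check" per item on its own thread. Each thread pushes its diagnostic
/// into a shared buffer, so the order of the returned Vec depends on scheduling.
fn check_in_parallel(items: &[(u32, &str)]) -> Vec<Diag> {
    let buffer = Mutex::new(Vec::new());
    thread::scope(|s| {
        for &(line, msg) in items {
            let buffer = &buffer;
            s.spawn(move || {
                buffer.lock().unwrap().push(Diag { line, msg: msg.to_string() });
            });
        }
    }); // all scoped threads are joined here
    buffer.into_inner().unwrap()
}

fn main() {
    let items: [(u32, &str); 3] = [
        (30, "mismatched types"),
        (10, "unused variable"),
        (20, "unresolved import"),
    ];
    let mut diags = check_in_parallel(&items);
    // Arrival order may differ between runs; sorting by position makes the report deterministic.
    diags.sort();
    for d in &diags {
        println!("line {}: {}", d.line, d.msg);
    }
}
```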

Existing Issues

With multiple threads, compilation still fails occasionally, as in these issues. This is why the parallel front end still needs to stay behind the -Z option. We will gradually address these issues to stabilize this feature.

Future Improvements

@Zoxc has a PR that can improve performance by about 10% on top of the current state, but some issues related to rayon still need to be resolved.

On the other hand, simply increasing the number of threads and cores (I tried a 48-core machine with 32 threads) did not improve the performance of the parallel front end. This is likely related to the complex dependencies between queries, which cause frequent data contention that limits performance. In the next year, I will focus on improving the current incremental compilation system, and in the process identify the key bottlenecks of the parallel front end and how to address them.
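One standard way to reason about this kind of saturation (a general model swapped in for illustration, not something the proposal uses) is Gunther's Universal Scalability Law: speedup(N) = N / (1 + α(N - 1) + βN(N - 1)), where α models contention (work that is effectively serialized, such as threads waiting on a shared lock) and β models coherency cost (threads exchanging shared data). With β = 0 it reduces to Amdahl's law with α as the serial fraction; with β > 0 the curve peaks and then falls, so adding threads can make things slower. The coefficients below are hypothetical, not measurements of rustc.

```rust
/// Universal Scalability Law (Gunther): modeled speedup on `n` threads relative to 1 thread.
/// `alpha`: contention, the share of work that is effectively serialized (e.g. waiting on locks).
/// `beta`: coherency cost, the pairwise overhead of threads exchanging shared data.
fn usl_speedup(n: u32, alpha: f64, beta: f64) -> f64 {
    let n = f64::from(n);
    n / (1.0 + alpha * (n - 1.0) + beta * n * (n - 1.0))
}

fn main() {
    // Hypothetical coefficients chosen for illustration; NOT fitted to rustc measurements.
    let (alpha, beta) = (0.3, 0.01);
    for n in [1, 2, 4, 8, 16, 32] {
        println!("{n:>2} threads: {:.2}x", usl_speedup(n, alpha, beta));
    }
}
```

With these made-up values the peak is near sqrt((1 - α)/β) ≈ 8 threads, and 32 threads is modeled as slower than 8. Fitting α and β to real rustc-perf measurements would be needed before drawing any conclusion about rustc itself.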

Strategy

For the community and rustc developers, enabling the parallel front end is simple: set the rustc_parallel option to true in the bootstrap config and rebuild rustc; otherwise it stays off. For general Rust developers, however, it is difficult to switch between serial and parallel rustc.

So here is the plan:

  1. Enable the parallel front end in nightly builds, keeping the default number of threads at 1. Users can then opt in to the parallel front end with the -Z threads=n option (the recommended value is 8).

  2. Configure beta/stable builds to use the serial front end via bootstrap.

  3. Switch the alt builders from parallel rustc to serial, so that we have artifacts without the parallel front end to test against the artifacts with it.

  4. Observe and collect user feedback, including:

    (1) When the number of threads is 1, does the parallel front end cause bugs or regressions?

    (2) What is the experience like with the -Z threads=n option?

  5. Fix the known issues and deadlocks with multiple threads, and actively look for and fix new ones.

  6. Once we are confident that the parallel front end is stable, enable and stabilize it on beta/stable builds.

Mentors or Reviewers

@oli-obk @cjgillot @nnethercote @bjorn3 @Kobzol

Process

The main points of the Major Change Process are as follows:

You can read more about Major Change Proposals on forge.

Comments

This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.

rustbot commented 1 year ago


cc @rust-lang/compiler @rust-lang/compiler-contributors

oli-obk commented 1 year ago

@rustbot second

FilipAndersson245 commented 1 year ago

So hyped for this to land

ghost commented 1 year ago

May I ask whether these runtime improvements come for free? Or is it similar to the SIMD performance improvements, where we get 2 times better runtime by paying 2 times the electricity cost?

bjorn3 commented 1 year ago

Multi-threading will use more CPU cores, and therefore more power during compilation. The total energy used for the entire compilation may not be all that much higher, though. If it lets the CPU go into a deep sleep mode sooner ("race to sleep", the core principle behind turbo boost), then taking the energy usage of a whole work day into account, it could theoretically result in a somewhat lower total energy usage, depending on the CPU and how often you are compiling things.
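The "race to sleep" argument comes down to energy = power × time. Below is a minimal Rust sketch with made-up numbers (the `PowerModel` struct and all wattages are hypothetical, not measurements of any CPU): the chip draws a fixed baseline while awake plus a per-core amount, and a much lower draw in deep sleep. If parallelism shortens the busy period enough, the baseline is paid for less time and the total over the same window can go down.

```rust
/// Hypothetical CPU power model (all numbers are illustrative assumptions).
struct PowerModel {
    base_watts: f64,     // fixed draw whenever the CPU is awake
    per_core_watts: f64, // extra draw per busy core
    sleep_watts: f64,    // draw in deep sleep
}

impl PowerModel {
    /// Energy in joules over a fixed window: compile for `busy_secs` on `cores` cores,
    /// then sleep for the rest of the window.
    fn energy_over_window(&self, cores: u32, busy_secs: f64, window_secs: f64) -> f64 {
        assert!(busy_secs <= window_secs, "busy period must fit in the window");
        let busy_power = self.base_watts + self.per_core_watts * f64::from(cores);
        busy_power * busy_secs + self.sleep_watts * (window_secs - busy_secs)
    }
}

fn main() {
    let cpu = PowerModel { base_watts: 20.0, per_core_watts: 10.0, sleep_watts: 2.0 };
    let window = 10.0;
    // Serial: 30 W for the whole 10 s window = 300 J.
    let serial = cpu.energy_over_window(1, 10.0, window);
    // Parallel: 60 W for 4 s, then 2 W for 6 s = 240 J + 12 J = 252 J.
    let parallel = cpu.energy_over_window(4, 4.0, window);
    println!("serial: {serial} J, parallel: {parallel} J");
}
```

With the same model, a weaker speedup flips the result: 4 cores for 8 s costs 60 W × 8 s + 2 W × 2 s = 484 J, more than the serial 300 J, which is why the outcome depends on the CPU and the workload.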

In any case https://rust-lang.zulipchat.com/#narrow/stream/233931-xxx/topic/Enable.20Parallel.20Rustc.20Front.20End.20In.20Nightl.E2.80.A6.20compiler-team.23681 is a better place for discussion about the proposal itself.

apiraino commented 1 year ago

@rustbot label -final-comment-period +major-change-accepted