Improve results on the rustc-perf benchmark suite

bjorn3 commented 4 years ago

I only ran the debug benchmarks, as check should be identical and release will definitely be faster because cg_clif performs far fewer optimizations.

Except for some stress tests, the clean and baseline incremental results are quite positive (~10-60% improvement, often ~40%). For clean incremental the results are much worse (easily ~200%), as compiled object files are not stored in the incremental cache (#760). For patched incremental the results are very mixed: sometimes the difference is just a little smaller than for clean incremental, while in other cases cg_clif is up to ~70% faster than cg_llvm.
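To make the percentages above concrete, here is a minimal sketch of the arithmetic, assuming they are relative changes in wall time with cg_llvm as the baseline (the function name and the sample timings are hypothetical, not taken from rustc-perf):

```rust
/// Relative change of `candidate_secs` versus `baseline_secs`, in percent.
/// Negative values mean the candidate (here: cg_clif) was faster,
/// positive values mean it was slower.
/// Assumption: this mirrors how the results were read, not rustc-perf's exact code.
fn percent_change(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (candidate_secs - baseline_secs) / baseline_secs * 100.0
}
```

With hypothetical timings, `percent_change(10.0, 6.0)` is a 40% improvement (about -40), while `percent_change(10.0, 30.0)` is a 200% regression, i.e. three times the baseline wall time.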

packed-simd failed due to a verifier error. Edit(2020-03-11): Opened #919. ~~hyper-2 failed due to unsized locals not being implemented (used for impl FnOnce for Box<FnOnce>).~~ Edit(2020-03-11): Fixed in #916. style-servo failed due to running out of disk space.

Patch for rustc-perf

```diff
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..4f577183 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,19 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=~/Documents/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot ~/Documents/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```

Results

![image](https://user-images.githubusercontent.com/17426603/73138656-9db54a80-4065-11ea-8e1f-efc5f06f9def.png)

bjorn3 commented 4 years ago

Results after #918:

There are still regressions compared to cg_llvm, but most of the incremental compilation times have improved compared to cg_llvm.

Results

![image](https://user-images.githubusercontent.com/17426603/76462316-c315d200-63e1-11ea-8f0a-9e71595cb7d9.png)

bjorn3 commented 4 years ago

New patch for the collector:

diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..1caecc8c 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,20 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Edit: Flipped the default for incremental caching of object files in 198037119520d8cccafdc1fd511164c63d741aed, so the old patch is correct again.
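The two behavioural changes in the patch above (skipping everything except debug builds, and opting in to the object-file cache through `CG_CLIF_INCR_CACHE`) can be sketched in isolation. This is only an illustration: `is_benchmarked` and `enable_incr_cache` are hypothetical helper names, and the `BuildKind` enum is a stand-in for the collector's own type.

```rust
use std::process::Command;

/// Stand-in for the collector's build kinds named in the patch.
#[derive(Clone, Copy, PartialEq, Eq)]
enum BuildKind {
    Check,
    Debug,
    Opt,
}

/// Only debug builds are measured; the patch gets the same effect with an
/// early `return Ok(())` in the `Check` and `Opt` arms.
fn is_benchmarked(kind: BuildKind) -> bool {
    matches!(kind, BuildKind::Debug)
}

/// Hypothetical helper: sets the environment variable that the patch uses to
/// turn on caching of object files in the incremental cache.
fn enable_incr_cache(cmd: &mut Command) {
    cmd.env("CG_CLIF_INCR_CACHE", "1");
}
```

Since the default was flipped later, the environment variable is only relevant for cg_clif revisions before that commit.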

bjorn3 commented 4 years ago

A lot of the red entries are caused by the linker taking much more time (up to 90%!).

bjorn3 commented 4 years ago

5d516f9e118d6527947ca5deb3d76bbc4fa0f8a1 is a 20%-50% improvement on the coercions-debug benchmark. Overall it is a ~2% improvement.

bjorn3 commented 4 years ago

Current results with lld:

Results

![image](https://user-images.githubusercontent.com/17426603/76689102-fb601f00-6632-11ea-880d-2b6f1832280d.png)

Patch for rustc-perf

```diff
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..9787da13 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,21 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Clink-args=-fuse-ld=lld -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            //cmd.env("RUSTFLAGS", "-Cpanic=abort -Clink-args=-fuse-ld=lld");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```
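For anyone reproducing this on another machine, the hard-coded `RUSTFLAGS` string could be assembled from local paths instead. The following is a minimal sketch, not part of rustc-perf: `cg_clif_rustflags` is a hypothetical helper, and only the flags themselves come from the patch. Absolute paths are assumed, because `Command::env` passes the string on verbatim and no shell is involved to expand `~`.

```rust
use std::path::Path;

/// Hypothetical helper that builds the RUSTFLAGS value used in the patch.
/// `backend` is the path to librustc_codegen_cranelift.so and `sysroot` the
/// cg_clif sysroot; both should be absolute and must not contain spaces,
/// since RUSTFLAGS is split on whitespace.
fn cg_clif_rustflags(backend: &Path, sysroot: &Path, use_lld: bool) -> String {
    let mut flags = vec!["-Cpanic=abort".to_string()];
    if use_lld {
        // Ask the linker driver to use lld, as in the patch.
        flags.push("-Clink-args=-fuse-ld=lld".to_string());
    }
    flags.push(format!("-Zcodegen-backend={}", backend.display()));
    flags.push(format!("--sysroot {}", sysroot.display()));
    flags.join(" ")
}
```

Passing `use_lld = false` gives the flags from the earlier patches; the commented-out line in the diff suggests that a plain cg_llvm run with lld only needs `-Cpanic=abort -Clink-args=-fuse-ld=lld`.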

vultix commented 4 years ago

Although there are still regressions, they are almost entirely found in the tiny stress-test benchmarks. Most real-world benchmarks are seeing fantastic improvements!

Wonderful work, @bjorn3!

bjorn3 commented 4 years ago

There are a few places where a non-stress-test benchmark regresses by a few percent in one of the incremental benchmarks. Other than that, many stress-test benchmarks regress because of slower linking. Improving this will benefit all other executable benchmarks too. For example, the helloworld-debug regression can be completely explained by longer linking times. In fact, the codegen part is faster for cg_clif.
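The reasoning about helloworld-debug can be written down as a small calculation. This is a sketch under a simplifying assumption, namely that wall time splits cleanly into a codegen phase and a link phase; the `Timing` type and the function name are hypothetical, and no measured numbers from this thread are used.

```rust
/// Wall time of one benchmark run, split into two phases.
/// Assumption: everything that is not linking is counted as codegen.
#[derive(Clone, Copy)]
struct Timing {
    codegen_secs: f64,
    link_secs: f64,
}

impl Timing {
    fn total(&self) -> f64 {
        self.codegen_secs + self.link_secs
    }
}

/// Fraction of the total slowdown of `clif` relative to `llvm` that is due to
/// linking. Returns `None` when there is no slowdown at all. A value above 1.0
/// means linking more than explains the regression, so codegen got faster.
fn link_share_of_regression(llvm: Timing, clif: Timing) -> Option<f64> {
    let total_delta = clif.total() - llvm.total();
    if total_delta <= 0.0 {
        return None;
    }
    Some((clif.link_secs - llvm.link_secs) / total_delta)
}
```

For instance, with made-up timings of 0.10 s codegen plus 0.10 s linking for cg_llvm and 0.08 s codegen plus 0.16 s linking for cg_clif, the share is above 1.0: the run is slower overall even though codegen itself is faster.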

bjorn3 commented 4 years ago

Reran the benchmarks with Firefox and VS Code closed. Now only regression-31157-debug patched incremental is a significant regression.

vultix commented 4 years ago

With such huge improvements, how much work would you say is left for MVP?

bjorn3 commented 4 years ago

There are still missing features as mentioned in https://hackmd.io/@bjorn3/HJL5ryFS8. I don't know how long it will take to implement most of them. Some are hard, while others are less hard.

NotAFile commented 2 years ago

Are there any recent rustc-perf runs? I'm especially curious about the JIT mode.

bjorn3 commented 2 years ago

Not recently. Don't expect the JIT mode to be faster than AOT compilation. The JIT mode currently doesn't support incremental compilation, which makes it slower.

jasonwilliams commented 2 years ago

Here is the latest, using commit df7f02072b64712e5322ea70675135cb1e20bf80.

(Screenshot of the rustc-perf compare page at localhost:2346: start=LLVM, end=CG_CLIF, stat=wall-time)

CG_CLIF

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ec71984f 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,21 @@ impl<'a> CargoProcess<'a> {
                 };

             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.env(
+                "RUSTFLAGS",
+                "-Zcodegen-backend=/home/jasew/workspace/rustc_codegen_cranelift/build/lib/librustc_codegen_cranelift.so",
+            );
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

LLVM

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ca34d0a3 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,17 @@ impl<'a> CargoProcess<'a> {
                 };

             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.arg("-j1");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Notes:

I needed perf; https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13 worked fine.
rustc-perf: https://github.com/rust-lang/rustc-perf

Processor: AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz. Installed RAM: 32.0 GB.

bjorn3 commented 2 years ago

cc https://github.com/bjorn3/rustc_codegen_cranelift/pull/1271
