Enabling optimizations for cg_clif gives a non-trivial runtime performance improvement at a very small compile-time cost. This is probably due to a combination of MIR inlining and Cranelift's e-graph-based optimizations.
On AArch64 I'm getting a ~30% improvement over cg_clif with optimizations disabled:
```
Benchmark 1: ./raytracer_cg_llvm
Time (mean ± σ): 7.049 s ± 0.022 s [User: 7.041 s, System: 0.008 s]
Range (min … max): 7.021 s … 7.097 s 10 runs
Benchmark 2: ./raytracer_cg_clif
Time (mean ± σ): 4.923 s ± 0.006 s [User: 4.917 s, System: 0.006 s]
Range (min … max): 4.913 s … 4.931 s 10 runs
Benchmark 3: ./raytracer_cg_clif_opt
Time (mean ± σ): 3.780 s ± 0.011 s [User: 3.775 s, System: 0.006 s]
Range (min … max): 3.770 s … 3.810 s 10 runs
Summary
'./raytracer_cg_clif_opt' ran
1.30 ± 0.00 times faster than './raytracer_cg_clif'
1.86 ± 0.01 times faster than './raytracer_cg_llvm'
```
This comes at a small compile-time cost that almost entirely vanishes thanks to the large amount of parallelism available on the AArch64 machine:
```
Benchmark 1: RUSTC=rustc cargo build [...]
Time (mean ± σ): 11.398 s ± 0.127 s [User: 37.814 s, System: 5.434 s]
Range (min … max): 11.150 s … 11.579 s 10 runs
Benchmark 2: RUSTC=rustc /home/gh-bjorn3/cg_clif/./dist/cargo-clif build [...]
Time (mean ± σ): 9.758 s ± 0.124 s [User: 24.436 s, System: 5.305 s]
Range (min … max): 9.588 s … 10.033 s 10 runs
Benchmark 3: RUSTC=rustc /home/gh-bjorn3/cg_clif/./dist/cargo-clif build --release [...]
Time (mean ± σ): 9.741 s ± 0.212 s [User: 26.540 s, System: 5.244 s]
Range (min … max): 9.564 s … 10.314 s 10 runs
Summary
'RUSTC=rustc /home/gh-bjorn3/cg_clif/./dist/cargo-clif build --release [...]' ran
1.00 ± 0.03 times faster than 'RUSTC=rustc /home/gh-bjorn3/cg_clif/./dist/cargo-clif build [...]'
1.17 ± 0.03 times faster than 'RUSTC=rustc cargo build [...]'
```
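The `N ± σ times faster` lines in the AArch64 runtime summary above can be re-derived from the printed means and standard deviations. The sketch below assumes hyperfine computes the uncertainty of each ratio by first-order propagation of the two relative standard deviations; that is an assumption on my part rather than something stated in the benchmark output, but it reproduces the printed `1.30 ± 0.00` and `1.86 ± 0.01`.

```rust
/// Mean and standard deviation of one hyperfine benchmark, in seconds.
#[derive(Clone, Copy)]
struct Measurement {
    mean: f64,
    stddev: f64,
}

/// How many times faster `fast` is than `slow`, plus the uncertainty of that
/// ratio. ASSUMPTION: first-order error propagation of the relative stddevs.
fn speedup(slow: Measurement, fast: Measurement) -> (f64, f64) {
    let ratio = slow.mean / fast.mean;
    let rel_slow = slow.stddev / slow.mean;
    let rel_fast = fast.stddev / fast.mean;
    let stddev = ratio * (rel_slow.powi(2) + rel_fast.powi(2)).sqrt();
    (ratio, stddev)
}

fn main() {
    // AArch64 runtime numbers copied from the hyperfine output above.
    let cg_llvm = Measurement { mean: 7.049, stddev: 0.022 };
    let cg_clif = Measurement { mean: 4.923, stddev: 0.006 };
    let cg_clif_opt = Measurement { mean: 3.780, stddev: 0.011 };

    for (name, slow) in [("cg_clif", cg_clif), ("cg_llvm", cg_llvm)] {
        let (ratio, stddev) = speedup(slow, cg_clif_opt);
        // Prints "1.30 ± 0.00" for cg_clif and "1.86 ± 0.01" for cg_llvm.
        println!("cg_clif_opt vs {name}: {ratio:.2} ± {stddev:.2}");
    }
}
```

The same function applied to the x86_64 runtime numbers below gives the ~20% figure for that machine.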
On GitHub Actions (x86_64), which has much less available parallelism, I get a somewhat smaller but still really nice perf improvement of ~20%:
```
Benchmark 1: ./raytracer_cg_llvm
Time (mean ± σ): 4.294 s ± 0.053 s [User: 4.287 s, System: 0.005 s]
Range (min … max): 4.241 s … 4.382 s 10 runs
Benchmark 2: ./raytracer_cg_clif
Time (mean ± σ): 4.095 s ± 0.058 s [User: 4.089 s, System: 0.005 s]
Range (min … max): 4.019 s … 4.199 s 10 runs
Benchmark 3: ./raytracer_cg_clif_opt
Time (mean ± σ): 3.385 s ± 0.083 s [User: 3.380 s, System: 0.003 s]
Range (min … max): 3.323 s … 3.606 s 10 runs
Summary
'./raytracer_cg_clif_opt' ran
1.21 ± 0.03 times faster than './raytracer_cg_clif'
1.27 ± 0.03 times faster than './raytracer_cg_llvm'
```
This comes at the cost of ~6% slower compilation:
```
Benchmark 1: RUSTC=rustc cargo build [...]
Time (mean ± σ): 14.992 s ± 0.143 s [User: 23.646 s, System: 3.589 s]
Range (min … max): 14.815 s … 15.209 s 10 runs
Benchmark 2: RUSTC=rustc /home/runner/work/rustc_codegen_cranelift/rustc_codegen_cranelift/./dist/cargo-clif build [...]
Time (mean ± σ): 10.955 s ± 0.054 s [User: 16.081 s, System: 3.422 s]
Range (min … max): 10.889 s … 11.044 s 10 runs
Benchmark 3: RUSTC=rustc /home/runner/work/rustc_codegen_cranelift/rustc_codegen_cranelift/./dist/cargo-clif build --release [...]
Time (mean ± σ): 11.632 s ± 0.117 s [User: 17.578 s, System: 3.311 s]
Range (min … max): 11.468 s … 11.883 s 10 runs
Summary
'RUSTC=rustc /home/runner/work/rustc_codegen_cranelift/rustc_codegen_cranelift/./dist/cargo-clif build [...]' ran
1.06 ± 0.01 times faster than 'RUSTC=rustc /home/runner/work/rustc_codegen_cranelift/rustc_codegen_cranelift/./dist/cargo-clif build --release [...]'
1.37 ± 0.01 times faster than 'RUSTC=rustc cargo build [...]'
```
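The difference in compile-time cost between the two machines can be illustrated from the mean wall-clock and `User` times printed for the `cargo-clif build` and `cargo-clif build --release` runs. The sketch below treats user CPU time divided by wall-clock time as a rough proxy for how many cores the build kept busy; that proxy (which ignores system time) is my own simplification, not something the benchmark measured directly.

```rust
/// Mean wall-clock and user CPU time of a build, in seconds.
struct Build {
    wall: f64,
    user: f64,
}

/// Relative increase from `before` to `after` (0.062 means 6.2% more).
fn overhead(before: f64, after: f64) -> f64 {
    after / before - 1.0
}

/// User CPU time divided by wall-clock time: a rough proxy for how many cores
/// the build kept busy on average (system time is ignored).
fn parallelism(b: &Build) -> f64 {
    b.user / b.wall
}

fn main() {
    // (machine, cargo-clif build, cargo-clif build --release), copied from the
    // hyperfine outputs above.
    let machines = [
        ("AArch64", Build { wall: 9.758, user: 24.436 }, Build { wall: 9.741, user: 26.540 }),
        ("x86_64 (GitHub Actions)", Build { wall: 10.955, user: 16.081 }, Build { wall: 11.632, user: 17.578 }),
    ];

    for (name, debug, release) in &machines {
        // AArch64: user +8.6%, wall -0.2%, parallelism 2.72
        // x86_64:  user +9.3%, wall +6.2%, parallelism 1.51
        println!(
            "{name}: user time {:+.1}%, wall time {:+.1}%, parallelism {:.2}",
            overhead(debug.user, release.user) * 100.0,
            overhead(debug.wall, release.wall) * 100.0,
            parallelism(release),
        );
    }
}
```

On both machines optimizations add roughly 9% more CPU work. With about 2.7 cores busy on AArch64 that extra work is absorbed and the wall-clock time stays within noise, while with about 1.5 cores busy on the GitHub Actions runner most of it shows up as the ~6% slowdown.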
Be aware, however, that this is a single benchmark (ebobby/simple-raytracer), which is not all that representative of real-world performance. Make sure to benchmark your own workload. I would love to hear what the results are for you.