Open Tnze opened 3 years ago
And this takes 12 mins:
fn main() {
let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5];
println!("{:?}", s);
}
untitled % cargo build --release
Compiling untitled v0.1.0 (/Users/Tnze/CLionProjects/untitled)
Finished release [optimized] target(s) in 12m 20s
So I ran a few numbers and it seems that LLVM is the culprit here. I'm doing a benchmark from 1.40.0 to 1.54 and print the time it passes in each stage and so far these are the numbers from 1.40.0:
[...]
time: 0.003; rss: 142MB -> 143MB ( +1MB) LLVM_module_optimize_module_passes(a.cc5d2364-cgu.1)
time: 0.008; rss: 163MB -> 141MB ( -22MB) LLVM_module_optimize_module_passes(a.cc5d2364-cgu.4)
time: 0.040; rss: 162MB -> 143MB ( -20MB) LLVM_module_optimize_module_passes(a.cc5d2364-cgu.2)
time: 0.043; rss: 177MB -> 143MB ( -34MB) LLVM_module_optimize_module_passes(a.cc5d2364-cgu.3)
time: 287.131; rss: 182MB -> 221MB ( +39MB) LLVM_module_optimize_module_passes(a.cc5d2364-cgu.5)
time: 0.003; rss: 223MB -> 210MB ( -13MB) LLVM_lto_optimize(a.cc5d2364-cgu.6)
time: 0.011; rss: 227MB -> 184MB ( -42MB) LLVM_lto_optimize(a.cc5d2364-cgu.0)
time: 0.015; rss: 223MB -> 187MB ( -36MB) LLVM_lto_optimize(a.cc5d2364-cgu.1)
time: 0.019; rss: 223MB -> 185MB ( -37MB) LLVM_lto_optimize(a.cc5d2364-cgu.4)
time: 0.027; rss: 223MB -> 179MB ( -44MB) LLVM_lto_optimize(a.cc5d2364-cgu.2)
time: 0.030; rss: 225MB -> 182MB ( -43MB) LLVM_lto_optimize(a.cc5d2364-cgu.3)
time: 174.925; rss: 182MB -> 199MB ( +17MB) LLVM_lto_optimize(a.cc5d2364-cgu.5)
[...]
471.27user 1.13system 7:52.22elapsed 100%CPU (0avgtext+0avgdata 975232maxresident)k
So 98% of the time is spent in LLVM_module_optimize_module_passes
and LLVM_lto_optimize
.
The numbers only get worse, but I'll post them as soon as it gets ready.
I think I'm doing a earlier run to see if there was a significant bump in the time it takes to compile the program.
Would be nice if somebody can tell if this is already known and a dup of something.
@rustbot modify labels: -I-slow I-compiletime A-llvm T-compiler
Somewhere between 1.23.0 and 1.24.0
(RUSTC_BOOTSTRAP
is needed for the -Z
option to enable)
RUSTC_BOOTSTRAP=1 rustc +1.23.0 -Z time-passes -O a.rs
-> 0.3sRUSTC_BOOTSTRAP=1 rustc +1.24.0 -Z time-passes -O a.rs
-> 233sFunny enough, they both run on the same LLVM Version:
marcel@ /t/tmp.ParxvQDiIQ> rustc +1.23.0 -vV
rustc 1.23.0 (766bd11c8 2018-01-01)
binary: rustc
commit-hash: 766bd11c8a3c019ca53febdcd77b2215379dd67d
commit-date: 2018-01-01
host: x86_64-unknown-linux-gnu
release: 1.23.0
LLVM version: 4.0
marcel@ /t/tmp.ParxvQDiIQ> rustc +1.24.0 -vV
rustc 1.24.0 (4d90ac38c 2018-02-12)
binary: rustc
commit-hash: 4d90ac38c0b61bb69470b61ea2cccea0df48d9e5
commit-date: 2018-02-12
host: x86_64-unknown-linux-gnu
release: 1.24.0
LLVM version: 4.0
After full unrolling, we basically first initialize the innermost array, then copy that to the next level, then copy that to the next level. Now SROA comes along, and breaks up the memcpys into individual stores, which allows them to be forwarded. We're now left with 20000 or so stores to initialize the array. Then InstCombine comes along and tries to drop the alloca by visiting all its uses. Probably many times if some other fold happens in between.
I'm new to rust. I don't understand what's InstCombine or SROA.
Is there anything I can do? Whether this is a bug?
If it is, could we do better or is it a optimization algorithm limitation and cannot be easily fixed?
I can suggest you a workaround:
fn main() {
const N: Option<usize>: None;
let s = [[[[[N; 4]; 9]; 9]; 12]; 5];
println!("{:?}", s);
}
this compiles in a few milliseconds.
I'm new to rust. I don't understand what's InstCombine or SROA.
Is there anything I can do? Whether this is a bug?
If it is, could we do better or is it a optimization algorithm limitation and cannot be easily fixed?
Whether this is a bug?
I think this is pretty clearly a bug somewhere, and unwanted behaviour. If it was not considered as such, someone would probably close this issue/ticket.
If it is, could we do better or is it a optimization algorithm limitation and cannot be easily fixed?
As hellow554 mentioned above, the time is being spent in LLVM, which is C++ code and not directly part of this repository. The rust compiler typically uses LLVM as the backend to generate and optimize it's output (Usually machine code). I would imagine fixing it is probably a matter of changing the heuristics or parameters/configuration LLVM is invoked with, but it is probably something to be careful with so as to not make existing code slower.
I think what was used to generate the table with the time spent in each place was the unstable -Ztime-passes
flag. I don't know if it has any documentation, but you can read a bit about it in this blog post https://blog.mozilla.org/nnethercote/2016/10/14/how-to-speed-up-the-rust-compiler/
Then nikic gave a bit of insight into where LLVM is spending that time.
I'm new to rust. I don't understand what's InstCombine or SROA.
InstCombine is one of LLVM's optimization passes. If you want, you can read a bit about it here https://llvm.org/docs/Passes.html#instcombine-combine-redundant-instructions
SROA stands for "Scalar Replacement Of Aggregates" and is a common optimization, and also an LLVM optimization pass. It is described here https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates
I feel this specific code snippet can/should be integrated with inline_const
feature after it's stablized, keeping major work on the rustc side instead of on llvm side.
cc #76001
I can suggest you a workaround:
fn main() { const N: Option<usize>: None; let s = [[[[[N; 4]; 9]; 9]; 12]; 5]; println!("{:?}", s); }
this compiles in a few milliseconds.
I'm sorry but this trick doesn't work anymore.
My current compiler version is:
rustc 1.71.0-nightly (7908a1d65 2023-04-17)
binary: rustc
commit-hash: 7908a1d65496b88626e4b7c193c81d777005d6f3
commit-date: 2023-04-17
host: x86_64-pc-windows-msvc
release: 1.71.0-nightly
LLVM version: 16.0.2
@Tnze for me on the latest nightly (2023-05-07) this compiles in 0.41s and the original code seems to have been fixed as well. It compiles in 0.36s.
@nikic should we add E-needs-test
for this and close this issue afterwards or do you want to do something else?
@Tnze for me on the latest nightly (2023-05-07) this compiles in 0.41s and the original code seems to have been fixed as well. It compiles in 0.36s.
@hellow554
I upgrade my nightly toolchain. It doesn't seem to be fixed. Did you forget --release
?
No, but I did check
instead of build
.
It's not fixed and you're right, I'm sorry.
Let me try to bisect this
********************************************************************************
Regression in nightly-2022-02-26
********************************************************************************
fetching https://static.rust-lang.org/dist/2022-02-25/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-02-25: 40 B / 40 B [===========================] 100.00 % 808.75 KB/s converted 2022-02-25 to 4b043faba34ccc053a4d0110634c323f6c03765e
fetching https://static.rust-lang.org/dist/2022-02-26/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-02-26: 40 B / 40 B [===========================] 100.00 % 775.05 KB/s converted 2022-02-26 to d3ad51b48f83329fac0cd8a9f1253f3146613c1c
looking for regression commit between 2022-02-25 and 2022-02-26
fetching (via remote github) commits from max(4b043faba34ccc053a4d0110634c323f6c03765e, 2022-02-23) to d3ad51b48f83329fac0cd8a9f1253f3146613c1c
ending github query because we found starting sha: 4b043faba34ccc053a4d0110634c323f6c03765e
get_commits_between returning commits, len: 11
commit[0] 2022-02-24: Auto merge of #94131 - Mark-Simulacrum:fmt-string, r=oli-obk
commit[1] 2022-02-24: Auto merge of #94333 - Dylan-DPC:rollup-7yxtywp, r=Dylan-DPC
commit[2] 2022-02-25: Auto merge of #93368 - eddyb:diagbld-guarantee, r=estebank
commit[3] 2022-02-25: Auto merge of #93878 - Aaron1011:newtype-macro, r=cjgillot
commit[4] 2022-02-25: Auto merge of #94130 - erikdesjardins:partially, r=nikic
commit[5] 2022-02-25: Auto merge of #94350 - matthiaskrgr:rollup-eesfiyr, r=matthiaskrgr
commit[6] 2022-02-25: Auto merge of #93644 - michaelwoerister:simpler-debuginfo-typemap, r=wesleywiser
commit[7] 2022-02-25: Auto merge of #94357 - matthiaskrgr:rollup-xrjaof3, r=matthiaskrgr
commit[8] 2022-02-25: Auto merge of #94279 - tmiasko:write-print, r=Mark-Simulacrum
commit[9] 2022-02-25: Auto merge of #94290 - Mark-Simulacrum:bump-bootstrap, r=pietroalbini
commit[10] 2022-02-25: Auto merge of #94369 - matthiaskrgr:rollup-qtripm2, r=matthiaskrgr
ERROR: no CI builds available between 4b043faba34ccc053a4d0110634c323f6c03765e and d3ad51b48f83329fac0cd8a9f1253f3146613c1c within last 167 days
4b043faba34ccc053a4d0110634c323f6c03765e...d3ad51b48f83329fac0cd8a9f1253f3146613c1c
@Tnze Can you please verify that https://github.com/llvm/llvm-project/commit/e13e808283f7fd9e873ae922dd1ef61aeaa0eb4a fixes this issue?
I'd say we are half way there, on x86_64 Linux:
fn main() {
let s = [[[[Option::<usize>::None; 10]; 10]; 10]; 10];
println!("{:?}", s);
}
Current master: 26.82s With backport: 25.07s
fn main() {
let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5];
println!("{:?}", s);
}
Current master: 3m 17s With backport: 7.57s
I'd say we are half way there, on x86_64 Linux:
fn main() { let s = [[[[Option::<usize>::None; 10]; 10]; 10]; 10]; println!("{:?}", s); }
Current master: 26.82s With backport: 25.07s
fn main() { let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5]; println!("{:?}", s); }
Current master: 3m 17s With backport: 7.57s
I am quite sure that this is working as intended. The first test case splits the allocas into just enough slices that it doesn't overflow the default limit of 1024 but gets quite close to it, so the compile time explosion still occurs. Whereas in the second one, it generates too many so SROA bails early and the issue doesn't occur.
If you try to lower the limit with the --sroa-max-alloca-slices
option to something like 256 or 512, this issue will probably go away.
@nikic Please confirm if I am correct here.
Code
Compile it with
--release
please.Meta
rustc --version --verbose
:Error output
It takes 3 mins to compile.