rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Multidimensional array compilation takes too much time #88580

Open Tnze opened 3 years ago

Tnze commented 3 years ago

Code

Compile it with --release please.

fn main() {
    let s = [[[[Option::<usize>::None; 10]; 10]; 10]; 10];
    println!("{:?}", s);
}

Meta

rustc --version --verbose:

rustc 1.54.0 (a178d0322 2021-07-26)
binary: rustc
commit-hash: a178d0322ce20e33eac124758e837cbd80a6f633
commit-date: 2021-07-26
host: x86_64-apple-darwin
release: 1.54.0
LLVM version: 12.0.1

Error output

It takes 3 mins to compile.

   Compiling untitled v0.1.0 (/Users/Tnze/CLionProjects/untitled)
    Finished release [optimized] target(s) in 3m 15s
Tnze commented 3 years ago

And this takes 12 mins:

fn main() {
    let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5];
    println!("{:?}", s);
}
untitled % cargo build --release
   Compiling untitled v0.1.0 (/Users/Tnze/CLionProjects/untitled)
    Finished release [optimized] target(s) in 12m 20s
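
For scale, neither array is large. The following is a minimal sketch, not part of the original report, that computes the element count and byte size of both array types without constructing a value (so it should not run into the slow path described here). The aliases `Elem`, `Small`, and `Large` and the helper `element_count` are hypothetical names introduced for illustration.

```rust
use std::mem::size_of;

// Hypothetical aliases for the two array types from the reports above.
type Elem = Option<usize>;
type Small = [[[[Elem; 10]; 10]; 10]; 10];
type Large = [[[[[Elem; 4]; 9]; 9]; 12]; 5];

/// Number of `Elem` values stored in the array type `T`.
/// Arrays have no padding between elements, so the division is exact.
fn element_count<T>() -> usize {
    size_of::<T>() / size_of::<Elem>()
}

fn main() {
    println!(
        "3-minute case:  {} elements, {} bytes",
        element_count::<Small>(),
        size_of::<Small>()
    );
    println!(
        "12-minute case: {} elements, {} bytes",
        element_count::<Large>(),
        size_of::<Large>()
    );
}
```

The second array holds fewer than twice as many elements as the first, yet the reported build time is roughly four times longer, which suggests the cost does not grow linearly with element count.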
hellow554 commented 3 years ago

I ran a few measurements, and LLVM seems to be the culprit here. I'm benchmarking versions 1.40.0 through 1.54 and printing the time spent in each stage. So far, these are the numbers from 1.40.0:

[...]
time:   0.003; rss:  142MB ->  143MB (   +1MB)  LLVM_module_optimize_module_passes(a.cc5d2364-cgu.1)
time:   0.008; rss:  163MB ->  141MB (  -22MB)  LLVM_module_optimize_module_passes(a.cc5d2364-cgu.4)
time:   0.040; rss:  162MB ->  143MB (  -20MB)  LLVM_module_optimize_module_passes(a.cc5d2364-cgu.2)
time:   0.043; rss:  177MB ->  143MB (  -34MB)  LLVM_module_optimize_module_passes(a.cc5d2364-cgu.3)
time: 287.131; rss:  182MB ->  221MB (  +39MB)  LLVM_module_optimize_module_passes(a.cc5d2364-cgu.5)
time:   0.003; rss:  223MB ->  210MB (  -13MB)  LLVM_lto_optimize(a.cc5d2364-cgu.6)
time:   0.011; rss:  227MB ->  184MB (  -42MB)  LLVM_lto_optimize(a.cc5d2364-cgu.0)
time:   0.015; rss:  223MB ->  187MB (  -36MB)  LLVM_lto_optimize(a.cc5d2364-cgu.1)
time:   0.019; rss:  223MB ->  185MB (  -37MB)  LLVM_lto_optimize(a.cc5d2364-cgu.4)
time:   0.027; rss:  223MB ->  179MB (  -44MB)  LLVM_lto_optimize(a.cc5d2364-cgu.2)
time:   0.030; rss:  225MB ->  182MB (  -43MB)  LLVM_lto_optimize(a.cc5d2364-cgu.3)
time: 174.925; rss:  182MB ->  199MB (  +17MB)  LLVM_lto_optimize(a.cc5d2364-cgu.5)
[...]
471.27user 1.13system 7:52.22elapsed 100%CPU (0avgtext+0avgdata 975232maxresident)k

So about 98% of the time (roughly 462 of 472 seconds) is spent in LLVM_module_optimize_module_passes and LLVM_lto_optimize, all of it in a single codegen unit (cgu.5). The numbers only get worse, but I'll post them as soon as they are ready.
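
The 98% figure can be checked from the log above. This is a small sketch of that arithmetic, assuming the two cgu.5 entries are the only slow stages and using the elapsed wall-clock time (7:52.22) as the total. The function name `share` is hypothetical.

```rust
/// Fraction of `total_seconds` accounted for by the given stage timings.
fn share(stage_seconds: &[f64], total_seconds: f64) -> f64 {
    stage_seconds.iter().sum::<f64>() / total_seconds
}

fn main() {
    // Timings for cgu.5, copied from the -Z time-passes output above.
    let slow_stages = [287.131, 174.925];
    // 7:52.22 elapsed, converted to seconds.
    let elapsed = 7.0 * 60.0 + 52.22;

    println!(
        "LLVM share of elapsed time: {:.1}%",
        100.0 * share(&slow_stages, elapsed)
    );
}
```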

I think I'll also run earlier versions to see whether there was a significant jump in compile time at some point.

It would be nice if somebody could tell whether this is already known and a duplicate of an existing issue.

hellow554 commented 3 years ago

@rustbot modify labels: -I-slow I-compiletime A-llvm T-compiler

hellow554 commented 3 years ago
Modified Code

```
fn main() {
    let s: [[[[[Option<usize>; 4]; 9]; 9]; 12]; 5] = [[[[[None; 4]; 9]; 9]; 12]; 5];
    println!("{:?}", s);
}
```

The slowdown first appears somewhere between 1.23.0 and 1.24.0.

(RUSTC_BOOTSTRAP=1 is needed to enable the -Z option on a stable toolchain.)
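
A version comparison like this could be scripted. The sketch below is an illustration rather than the tooling actually used for these measurements: it assumes `rustc` on the `PATH` is the rustup proxy (so that `+1.23.0` selects a toolchain), that both toolchains are installed, and that the test program is saved as `a.rs`. The helpers `rustc_args` and `time_rustc` are hypothetical names.

```rust
use std::io;
use std::process::Command;
use std::time::{Duration, Instant};

/// Builds the argument list for one run (hypothetical helper), mirroring
/// the flags of the invocation quoted in this comment.
fn rustc_args(toolchain: &str, source: &str) -> Vec<String> {
    vec![
        format!("+{}", toolchain), // rustup toolchain selector (assumes rustup)
        "-Z".to_string(),
        "time-passes".to_string(),
        "-O".to_string(),
        source.to_string(),
    ]
}

/// Runs rustc once and returns the wall-clock time of the whole process.
fn time_rustc(toolchain: &str, source: &str) -> io::Result<Duration> {
    let start = Instant::now();
    let status = Command::new("rustc")
        .env("RUSTC_BOOTSTRAP", "1") // allows -Z flags on a stable toolchain
        .args(rustc_args(toolchain, source))
        .status()?;
    let elapsed = start.elapsed();
    if !status.success() {
        eprintln!("rustc +{} exited with {}", toolchain, status);
    }
    Ok(elapsed)
}

fn main() {
    // The two versions that bracket the slowdown according to this comment.
    for toolchain in ["1.23.0", "1.24.0"].iter() {
        match time_rustc(toolchain, "a.rs") {
            Ok(elapsed) => println!("{}: {:.1?}", toolchain, elapsed),
            Err(err) => eprintln!("could not start rustc for {}: {}", toolchain, err),
        }
    }
}
```

Measuring the whole process keeps the sketch simple. The per-pass breakdown still comes from the `-Z time-passes` output that rustc prints itself.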

RUSTC_BOOTSTRAP=1 rustc +1.23.0 -Z time-passes -O a.rs -> 0.3s
```
===-------------------------------------------------------------------------===
                      Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0001 seconds (0.0001 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0000 (  0.0%)   0.0000 ( 49.4%)   0.0000 ( 44.9%)   0.0000 ( 47.0%)  Evict
   0.0000 (  0.0%)   0.0000 ( 32.1%)   0.0000 ( 29.2%)   0.0000 ( 28.3%)  Local Splitting
   0.0000 (100.0%)   0.0000 ( 18.5%)   0.0000 ( 25.8%)   0.0000 ( 24.7%)  Seed Live Regs
   0.0000 (100.0%)   0.0001 (100.0%)   0.0001 (100.0%)   0.0001 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0027 seconds (0.0026 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0002 ( 41.4%)   0.0006 ( 28.1%)   0.0008 ( 30.2%)   0.0008 ( 29.6%)  Instruction Selection
   0.0001 ( 19.5%)   0.0004 ( 19.5%)   0.0005 ( 19.5%)   0.0005 ( 20.0%)  DAG Combining 1
   0.0001 ( 13.4%)   0.0003 ( 13.2%)   0.0004 ( 13.3%)   0.0004 ( 13.4%)  Instruction Scheduling
   0.0000 ( 10.0%)   0.0002 ( 10.7%)   0.0003 ( 10.6%)   0.0003 ( 10.6%)  Instruction Creation
   0.0000 (  4.9%)   0.0002 ( 10.8%)   0.0003 (  9.9%)   0.0003 ( 10.1%)  DAG Combining 2
   0.0000 (  4.4%)   0.0002 (  7.3%)   0.0002 (  6.8%)   0.0002 (  6.9%)  DAG Legalization
   0.0000 (  4.1%)   0.0001 (  4.9%)   0.0001 (  4.8%)   0.0001 (  4.4%)  Type Legalization
   0.0000 (  0.7%)   0.0001 (  2.8%)   0.0001 (  2.5%)   0.0001 (  2.7%)  Vector Legalization
   0.0000 (  0.0%)   0.0000 (  1.8%)   0.0000 (  1.5%)   0.0000 (  1.5%)  DAG Combining after legalize types
   0.0000 (  1.7%)   0.0000 (  0.8%)   0.0000 (  1.0%)   0.0000 (  0.9%)  Instruction Scheduling Cleanup
   0.0004 (100.0%)   0.0023 (100.0%)   0.0027 (100.0%)   0.0026 (100.0%)  Total

===-------------------------------------------------------------------------===
                      DWARF Emission
===-------------------------------------------------------------------------=== Total Execution Time: 0.0001 seconds (0.0001 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0000 ( 48.1%) 0.0001 ( 71.3%) 0.0001 ( 62.8%) 0.0001 ( 61.2%) DWARF Exception Writer 0.0000 ( 44.4%) 0.0000 ( 28.7%) 0.0001 ( 34.5%) 0.0000 ( 34.5%) Debug Info Emission 0.0000 ( 7.4%) 0.0000 ( 0.0%) 0.0000 ( 2.7%) 0.0000 ( 4.3%) DWARF Debug Writer 0.0001 (100.0%) 0.0001 (100.0%) 0.0001 (100.0%) 0.0001 (100.0%) Total ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 0.0583 seconds (0.0575 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 0.0028 ( 7.0%) 0.0009 ( 5.1%) 0.0037 ( 6.4%) 0.0037 ( 6.4%) Function Integration/Inlining 0.0001 ( 0.3%) 0.0034 ( 18.4%) 0.0035 ( 6.0%) 0.0035 ( 6.1%) X86 DAG->DAG Instruction Selection 0.0029 ( 7.2%) 0.0002 ( 1.4%) 0.0031 ( 5.4%) 0.0031 ( 5.5%) Dead Store Elimination 0.0026 ( 6.4%) 0.0004 ( 2.0%) 0.0029 ( 5.0%) 0.0029 ( 5.1%) Induction Variable Simplification 0.0015 ( 3.6%) 0.0004 ( 2.0%) 0.0018 ( 3.1%) 0.0018 ( 3.1%) Combine redundant instructions 0.0014 ( 3.5%) 0.0001 ( 0.4%) 0.0015 ( 2.5%) 0.0015 ( 2.6%) Machine Natural Loop Construction 0.0011 ( 2.8%) 0.0003 ( 1.9%) 0.0015 ( 2.5%) 0.0015 ( 2.6%) Combine redundant instructions 0.0010 ( 2.5%) 0.0004 ( 1.9%) 0.0014 ( 2.3%) 0.0014 ( 2.3%) SROA 0.0010 ( 2.4%) 0.0003 ( 1.6%) 0.0012 ( 2.1%) 0.0012 ( 2.2%) Combine redundant instructions 0.0008 ( 2.1%) 0.0003 ( 1.5%) 0.0011 ( 1.9%) 0.0011 ( 1.9%) X86 DAG->DAG Instruction Selection 0.0008 ( 2.0%) 0.0003 ( 1.4%) 0.0011 ( 1.8%) 0.0011 ( 1.9%) Combine redundant instructions 0.0008 ( 2.0%) 0.0002 ( 1.3%) 0.0010 ( 1.8%) 0.0010 ( 1.8%) Module Verifier 0.0007 ( 1.8%) 0.0003 ( 1.6%) 0.0010 ( 1.7%) 0.0010 ( 1.7%) SROA 0.0008 ( 
1.9%) 0.0002 ( 1.3%) 0.0010 ( 1.7%) 0.0010 ( 1.7%) Deduce function attributes 0.0007 ( 1.7%) 0.0003 ( 1.4%) 0.0009 ( 1.6%) 0.0009 ( 1.6%) Combine redundant instructions 0.0007 ( 1.7%) 0.0002 ( 1.2%) 0.0009 ( 1.6%) 0.0009 ( 1.5%) Global Value Numbering 0.0007 ( 1.7%) 0.0002 ( 1.1%) 0.0009 ( 1.5%) 0.0009 ( 1.5%) Assumption Cache Tracker 0.0006 ( 1.5%) 0.0002 ( 1.1%) 0.0008 ( 1.4%) 0.0008 ( 1.4%) Early CSE 0.0007 ( 1.7%) 0.0001 ( 0.6%) 0.0008 ( 1.4%) 0.0008 ( 1.4%) Unroll loops 0.0005 ( 1.3%) 0.0002 ( 0.9%) 0.0007 ( 1.2%) 0.0007 ( 1.2%) Early CSE 0.0005 ( 1.3%) 0.0002 ( 0.9%) 0.0007 ( 1.2%) 0.0007 ( 1.2%) Global Value Numbering 0.0000 ( 0.1%) 0.0006 ( 3.2%) 0.0006 ( 1.1%) 0.0006 ( 1.1%) Machine Instruction Scheduler 0.0004 ( 0.9%) 0.0001 ( 0.6%) 0.0005 ( 0.8%) 0.0005 ( 0.8%) Value Propagation 0.0000 ( 0.1%) 0.0004 ( 2.3%) 0.0004 ( 0.8%) 0.0004 ( 0.8%) X86 Assembly Printer 0.0003 ( 0.8%) 0.0001 ( 0.6%) 0.0004 ( 0.7%) 0.0004 ( 0.7%) Remove unused exception handling info 0.0003 ( 0.8%) 0.0001 ( 0.5%) 0.0004 ( 0.7%) 0.0004 ( 0.7%) Expand Atomic instructions 0.0003 ( 0.7%) 0.0001 ( 0.6%) 0.0004 ( 0.6%) 0.0004 ( 0.7%) Create Garbage Collector Module Metadata 0.0003 ( 0.7%) 0.0001 ( 0.5%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Value Propagation 0.0003 ( 0.8%) 0.0001 ( 0.4%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Loop Invariant Code Motion 0.0000 ( 0.0%) 0.0004 ( 2.0%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Live Variable Analysis 0.0003 ( 0.6%) 0.0001 ( 0.5%) 0.0003 ( 0.6%) 0.0004 ( 0.6%) Aggressive Dead Code Elimination 0.0000 ( 0.1%) 0.0003 ( 1.8%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Greedy Register Allocator 0.0000 ( 0.1%) 0.0003 ( 1.8%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Block Frequency Analysis 0.0003 ( 0.6%) 0.0001 ( 0.5%) 0.0004 ( 0.6%) 0.0004 ( 0.6%) Simplify the CFG 0.0003 ( 0.7%) 0.0001 ( 0.4%) 0.0003 ( 0.6%) 0.0003 ( 0.6%) Jump Threading 0.0003 ( 0.6%) 0.0001 ( 0.4%) 0.0003 ( 0.6%) 0.0003 ( 0.6%) Tail Call Elimination 0.0002 ( 0.6%) 0.0001 ( 0.4%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Simplify the CFG 0.0002 ( 
0.6%) 0.0001 ( 0.3%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Exception handling preparation 0.0002 ( 0.6%) 0.0001 ( 0.4%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Machine Instruction Scheduler 0.0002 ( 0.5%) 0.0001 ( 0.4%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Sparse Conditional Constant Propagation 0.0002 ( 0.6%) 0.0001 ( 0.4%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Simplify the CFG 0.0003 ( 0.7%) 0.0000 ( 0.0%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Profile summary info 0.0002 ( 0.5%) 0.0001 ( 0.3%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Interprocedural Sparse Conditional Constant Propagation 0.0002 ( 0.5%) 0.0001 ( 0.5%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Target Pass Configuration 0.0002 ( 0.5%) 0.0001 ( 0.4%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Jump Threading 0.0001 ( 0.3%) 0.0001 ( 0.7%) 0.0003 ( 0.4%) 0.0003 ( 0.5%) Dominator Tree Construction 0.0002 ( 0.4%) 0.0001 ( 0.6%) 0.0003 ( 0.5%) 0.0003 ( 0.5%) Scalar Evolution Analysis 0.0002 ( 0.5%) 0.0001 ( 0.4%) 0.0003 ( 0.4%) 0.0003 ( 0.4%) Scalar Evolution Analysis 0.0002 ( 0.4%) 0.0001 ( 0.3%) 0.0002 ( 0.4%) 0.0003 ( 0.4%) Scalar Evolution Analysis 0.0002 ( 0.5%) 0.0001 ( 0.4%) 0.0003 ( 0.4%) 0.0003 ( 0.4%) Simplify the CFG 0.0002 ( 0.5%) 0.0000 ( 0.2%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Loop Invariant Code Motion 0.0002 ( 0.4%) 0.0001 ( 0.3%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Bit-Tracking Dead Code Elimination 0.0002 ( 0.6%) 0.0000 ( 0.0%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Combine redundant instructions 0.0002 ( 0.5%) 0.0000 ( 0.2%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Rotate Loops 0.0002 ( 0.5%) 0.0001 ( 0.3%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Dead Argument Elimination 0.0001 ( 0.4%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.4%) Safe Stack instrumentation pass 0.0002 ( 0.4%) 0.0001 ( 0.3%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Natural Loop Information 0.0002 ( 0.4%) 0.0001 ( 0.3%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) X86 Assembly Printer 0.0001 ( 0.3%) 0.0001 ( 0.4%) 0.0002 ( 0.3%) 0.0002 ( 0.4%) Natural Loop Information 0.0001 ( 0.3%) 0.0001 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.4%) Post-Dominator Tree Construction 
0.0002 ( 0.5%) 0.0000 ( 0.0%) 0.0002 ( 0.4%) 0.0002 ( 0.4%) Combine redundant instructions 0.0001 ( 0.4%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.4%) Dominator Tree Construction 0.0001 ( 0.2%) 0.0001 ( 0.6%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Control Flow Optimizer 0.0001 ( 0.4%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0009 ( 2.3%) 0.0003 ( 1.6%) 0.0012 ( 2.1%) 0.0002 ( 0.3%) Scalar Evolution Analysis 0.0001 ( 0.3%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0002 ( 1.0%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Prologue/Epilogue Insertion & Frame Finalization 0.0001 ( 0.4%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Canonicalize natural loops 0.0002 ( 0.5%) 0.0000 ( 0.0%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Combine redundant instructions 0.0002 ( 0.4%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.4%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.3%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Module Verifier 0.0001 ( 0.3%) 0.0001 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Canonicalize natural loops 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Module Verifier 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Canonicalize natural loops 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.2%) 0.0001 ( 0.5%) 0.0002 ( 0.3%) 
0.0002 ( 0.3%) Demanded bits analysis 0.0000 ( 0.0%) 0.0002 ( 0.8%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Machine Common Subexpression Elimination 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.3%) 0.0002 ( 0.3%) 0.0002 ( 0.3%) Dominator Tree Construction 0.0001 ( 0.4%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) 0.0001 ( 0.3%) Demanded bits analysis 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0002 ( 0.3%) 0.0001 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0001 ( 0.3%) 0.0001 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0001 ( 0.3%) 0.0001 ( 0.3%) Function Alias Analysis Results 0.0001 ( 0.3%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Lazy Value Information Analysis 0.0001 ( 0.3%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) PGOIndirectCallPromotion 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Memory Dependence Analysis 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0001 ( 0.6%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Machine Copy Propagation Pass 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Conditionally eliminate dead library calls 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Loop-Closed SSA Form Pass 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Lazy Value Information Analysis 0.0000 ( 0.0%) 0.0001 ( 0.6%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Two-Address instruction pass 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Remove unreachable blocks from the CFG 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Memory Dependence Analysis 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Loop-Closed SSA Form Pass 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Live Variable Analysis 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 
0.2%) Unswitch loops 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) MergedLoadStoreMotion 0.0000 ( 0.0%) 0.0001 ( 0.6%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Merge disjoint stack slots 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Loop-Closed SSA Form Pass 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0001 ( 0.5%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Live Interval Analysis 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Insert XRay ops 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Memory Dependence Analysis 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.2%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Greedy Register Allocator 0.0001 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.2%) Lazy Branch Probability Analysis 0.0001 ( 0.2%) 0.0000 ( 0.0%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Globals Alias Analysis 0.0001 ( 0.1%) 0.0000 ( 0.2%) 0.0001 ( 0.1%) 0.0001 ( 0.2%) Lazy Branch Probability Analysis 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.2%) Basic Alias Analysis (stateless AA impl) 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.2%) 0.0001 ( 0.2%) Insert stack protectors 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Lower 'expect' Intrinsics 0.0001 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Memory Dependence Analysis 0.0001 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Optimization Remark Emitter 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) 
Optimization Remark Emitter 0.0001 ( 0.2%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Recognize loop idioms 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) LCSSA Verifier 0.0000 ( 0.0%) 0.0001 ( 0.4%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Peephole Optimizations 0.0001 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) LCSSA Verifier 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) LCSSA Verifier 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Machine Block Frequency Analysis 0.0001 ( 0.2%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Remove redundant instructions 0.0000 ( 0.0%) 0.0001 ( 0.3%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Remove dead machine instructions 0.0000 ( 0.0%) 0.0001 ( 0.3%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Machine Natural Loop Construction 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Live Interval Analysis 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Loop Vectorization 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Natural Loop Information 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Contiguously Lay Out Funclets 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Natural Loop Information 0.0000 ( 0.0%) 0.0000 ( 0.3%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Virtual Register Rewriter 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0001 ( 0.1%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Machine Block Frequency Analysis 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Virtual Register Rewriter 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) CallGraph Construction 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) X86 LEA Optimize 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Branch Probability Analysis 
0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Machine Block Frequency Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Natural Loop Information 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Scalar Evolution Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Global Variable Optimizer 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Remove dead machine instructions 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Optimize machine instruction PHIs 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Delete dead loops 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Implement the 'patchable-function' attribute 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Slot index numbering 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Dominator Tree Construction 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Prologue/Epilogue Insertion & Frame Finalization 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Dead Global Elimination 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Simplify the CFG 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Post-RA pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) A No-Op Barrier Pass 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Slot index numbering 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 
0.0%) 0.0000 ( 0.1%) MachineDominator Tree Construction 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Canonicalize natural loops 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Constant Hoisting 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Remove dead machine instructions 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0001 ( 0.1%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Atom pad short functions 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine InstCombiner 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Constant Hoisting 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Two-Address instruction pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) CodeGen Prepare 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Peephole Optimizations 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Machine Loop Invariant Code Motion 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Partially inline calls to library functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Execution dependency fix 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Remove unreachable machine basic blocks 
0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live Register Matrix 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Optimize Call Frame 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Shadow Stack GC Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachinePostDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Eliminate PHI nodes for register allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Simple Register Coalescing 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scalar Evolution Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Block Frequency Analysis 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Float to int 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Branch Probability Basic Block Placement 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Function Alias Analysis Results 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Unnamed pass: implement Pass::getPassName() 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Loop Invariant Code Motion 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Exception handling preparation 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine code sinking 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Function Alias Analysis Results 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Expand Atomic instructions 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Debug Variable Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachinePostDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Local Dynamic TLS Access Clean-up 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Loop Invariant Code Motion 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Spill Code Placement Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) CallGraph Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Virtual Register Map 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Global Variable Optimizer 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Byte/Word Instruction Fixup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Block Frequency Analysis 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live Register Matrix 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Remove unreachable machine basic blocks 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dead Global Elimination 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 LEA Fixup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Common Subexpression Elimination 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Interleaved Access Pass 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Natural Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine InstCombiner 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Natural Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachinePostDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) CallGraph Construction 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Fixup SetCC 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Early If-Conversion 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Loop Invariant 
Code Motion 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop Distribution 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live Stack Slot Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop Load Elimination 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live DEBUG_VALUE analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachinePostDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Spill Code Placement Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Remove dead machine instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 FP Stackifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Canonicalize natural loops 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Trace Metrics 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop Access Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Slot index numbering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Interleaved Access Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Analyze Machine Code For Garbage Collection 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Optimization Remark Emitter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Slot index numbering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Eliminate PHI 
nodes for register allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Lazy Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Post RA top-down list latency scheduler 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Copy Propagation Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rename Disconnected Subregister Components 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Natural Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop-Closed SSA Form Pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) MachineDominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Loop Access Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) LCSSA Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) LCSSA Verifier 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Local Stack Slot Allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) LCSSA Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 WinAlloca Expander 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 LEA Optimize 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Lazy Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Partially inline calls to library functions 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Compressing EVEX instrs to VEX encoding when possible 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Alignment from assumptions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Natural 
Loop Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Debug Variable Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Expand ISel Pseudo-instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Lower Garbage Collection Instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Optimization Remark Emitter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Dominator Tree Construction 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) LCSSA Verifier 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 vzeroupper inserter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine code sinking 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Detect Dead Lanes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Optimize Call Frame 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Inserts calls to mcount-like functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Virtual Register Map 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Shrink Wrapping analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Globals Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) LCSSA Verifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 vzeroupper inserter 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Fixup SetCC 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live Stack Slot Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Stack Slot Coloring 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge disjoint stack slots 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) Tail Duplication 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Lower Garbage Collection Instructions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Bundle Machine CFG Edges 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rename Disconnected Subregister Components 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Process Implicit Definitions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 FP Stackifier 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Branch Probability Basic Block Placement 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Post-RA pseudo instruction expansion pass 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Analyze Machine Code For Garbage Collection 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Early If-Conversion 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) StackMap Liveness Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 WinAlloca Expander 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Inserts calls to mcount-like functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Live DEBUG_VALUE analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 Atom pad short functions 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 LEA Fixup 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Compressing EVEX instrs to VEX encoding when possible 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Local Stack Slot Allocation 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Detect Dead Lanes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 PIC Global Base Reg Initialization 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Deduce function attributes in RPO 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Infer set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 
0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge Duplicate Global Constants 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Strip Unused Function Prototypes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Eliminate Available Externally Globals 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Scoped NoAlias Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Type-Based Alias Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata 0.0400 (100.0%) 0.0184 (100.0%) 0.0583 (100.0%) 0.0575 (100.0%) Total ```
RUSTC_BOOTSTRAP=1 rustc +1.24.0 -Z time-passes -O a.rs -> 233s
```
time: 0.000; rss: 48MB parsing
time: 0.000; rss: 53MB recursion limit
time: 0.000; rss: 53MB crate injection
time: 0.000; rss: 53MB plugin loading
time: 0.000; rss: 53MB plugin registration
time: 0.012; rss: 70MB expansion
time: 0.000; rss: 70MB maybe building test harness
time: 0.000; rss: 70MB maybe creating a macro crate
time: 0.000; rss: 70MB creating allocators
time: 0.000; rss: 70MB checking for inline asm in case the target doesn't support it
time: 0.000; rss: 74MB AST validation
time: 0.002; rss: 78MB name resolution
time: 0.000; rss: 78MB complete gated feature checking
time: 0.000; rss: 78MB lowering ast -> hir
time: 0.000; rss: 78MB early lint checks
time: 0.000; rss: 78MB indexing hir
time: 0.000; rss: 78MB attribute checking
time: 0.000; rss: 78MB load query result cache
time: 0.000; rss: 78MB looking for entry point
time: 0.000; rss: 78MB looking for plugin registrar
time: 0.000; rss: 78MB loop checking
time: 0.000; rss: 78MB static item recursion checking
time: 0.000; rss: 78MB stability checking
time: 0.000; rss: 82MB type collecting
time: 0.000; rss: 82MB outlives testing
time: 0.000; rss: 82MB impl wf inference
time: 0.000; rss: 82MB coherence checking
time: 0.000; rss: 82MB variance testing
time: 0.000; rss: 82MB wf checking
time: 0.000; rss: 82MB item-types checking
time: 0.013; rss: 105MB item-bodies checking
time: 0.000; rss: 105MB const checking
time: 0.000; rss: 105MB privacy checking
time: 0.000; rss: 105MB intrinsic checking
time: 0.000; rss: 105MB match checking
time: 0.000; rss: 105MB liveness checking
time: 0.001; rss: 105MB borrow checking
time: 0.000; rss: 105MB MIR borrow checking
time: 0.000; rss: 105MB MIR effect checking
time: 0.000; rss: 105MB death checking
time: 0.000; rss: 105MB unused lib feature checking
time: 0.000; rss: 105MB lint checking
time: 0.000; rss: 105MB resolving dependency formats
time: 0.000; rss: 105MB write metadata
time: 0.007; rss: 105MB translation item collection
time: 0.000; rss: 105MB codegen unit partitioning
time: 0.001; rss: 124MB write allocator module
time: 0.002; rss: 135MB llvm function passes [a3]
time: 0.000; rss: 141MB llvm function passes [a1]
time: 0.002; rss: 143MB llvm module passes [a1]
time: 0.013; rss: 145MB translate to LLVM IR
time: 0.000; rss: 143MB assert dep graph
time: 0.000; rss: 143MB serialize dep graph
time: 0.001; rss: 143MB llvm function passes [a2]
time: 0.067; rss: 143MB translation
time: 0.004; rss: 143MB llvm function passes [a0]
time: 0.001; rss: 148MB llvm function passes [a4]
time: 0.001; rss: 147MB llvm module passes [a2]
time: 0.000; rss: 145MB llvm function passes [a5]
time: 0.001; rss: 145MB llvm module passes [a5]
time: 0.014; rss: 145MB llvm module passes [a3]
time: 0.009; rss: 147MB llvm module passes [a4]
time: 0.012; rss: 145MB llvm module passes [a0]
time: 0.001; rss: 145MB LTO passes
time: 0.001; rss: 145MB LTO passes
time: 0.001; rss: 145MB codegen passes [a5-317d481089b8c8fe83113de504472633.rs]
time: 0.002; rss: 145MB LTO passes
time: 0.002; rss: 145MB codegen passes [a2-317d481089b8c8fe83113de504472633.rs]
time: 0.002; rss: 148MB codegen passes [a1-317d481089b8c8fe83113de504472633.rs]
time: 0.007; rss: 148MB LTO passes
time: 0.011; rss: 148MB LTO passes
time: 0.005; rss: 148MB codegen passes [a0-317d481089b8c8fe83113de504472633.rs]
time: 0.006; rss: 150MB codegen passes [a3-317d481089b8c8fe83113de504472633.rs]
time: 240.300; rss: 190MB LTO passes
time: 6.298; rss: 186MB codegen passes [a4-317d481089b8c8fe83113de504472633.rs]
time: 246.644; rss: 160MB LLVM passes
time: 0.000; rss: 156MB serialize work products
time: 0.246; rss: 156MB running linker
time: 0.246; rss: 156MB linking
```

Funnily enough, both versions run on the same LLVM version:

marcel@ /t/tmp.ParxvQDiIQ> rustc +1.23.0  -vV
rustc 1.23.0 (766bd11c8 2018-01-01)
binary: rustc
commit-hash: 766bd11c8a3c019ca53febdcd77b2215379dd67d
commit-date: 2018-01-01
host: x86_64-unknown-linux-gnu
release: 1.23.0
LLVM version: 4.0

marcel@ /t/tmp.ParxvQDiIQ> rustc +1.24.0  -vV
rustc 1.24.0 (4d90ac38c 2018-02-12)
binary: rustc
commit-hash: 4d90ac38c0b61bb69470b61ea2cccea0df48d9e5
commit-date: 2018-02-12
host: x86_64-unknown-linux-gnu
release: 1.24.0
LLVM version: 4.0

nikic commented 3 years ago

After full unrolling, we basically first initialize the innermost array, then copy that to the next level, then copy that to the next level. Now SROA comes along, and breaks up the memcpys into individual stores, which allows them to be forwarded. We're now left with 20000 or so stores to initialize the array. Then InstCombine comes along and tries to drop the alloca by visiting all its uses. Probably many times if some other fold happens in between.
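As a rough sanity check on the "20000 or so" figure, here is a small sketch. It assumes one scalar store per machine word of the array, which is an assumption made for illustration and not something confirmed in this thread. The helper names (`element_count`, `words_per_element`, `estimated_stores`) are hypothetical.

```rust
use std::mem::size_of;

/// Shape of the first test case in this issue.
const DIMS: [usize; 4] = [10, 10, 10, 10];

/// Total number of innermost elements in the nested array.
fn element_count(dims: &[usize]) -> usize {
    dims.iter().product()
}

/// Machine words per `Option<usize>`. `usize` has no spare bit pattern
/// to encode `None`, so a separate tag word is needed.
fn words_per_element() -> usize {
    size_of::<Option<usize>>() / size_of::<usize>()
}

/// Assumption for illustration: one scalar store per word.
fn estimated_stores(dims: &[usize]) -> usize {
    element_count(dims) * words_per_element()
}

fn main() {
    println!(
        "{} elements x {} words = {} stores",
        element_count(&DIMS),
        words_per_element(),
        estimated_stores(&DIMS)
    );
}
```

Under that assumption the first test case gives 10000 elements at 2 words each, which lines up with the estimate above.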

Tnze commented 3 years ago

I'm new to Rust. I don't understand what InstCombine or SROA is.

Is there anything I can do? Is this a bug?

If it is, could we do better, or is it an optimization algorithm limitation that cannot be easily fixed?

hellow554 commented 3 years ago

I can suggest a workaround:

fn main() {
    const N: Option<usize> = None;
    let s = [[[[[N; 4]; 9]; 9]; 12]; 5];
    println!("{:?}", s);
}

This compiles in a few milliseconds.

nico-abram commented 2 years ago

> I'm new to Rust. I don't understand what InstCombine or SROA is.
>
> Is there anything I can do? Is this a bug?
>
> If it is, could we do better, or is it an optimization algorithm limitation that cannot be easily fixed?

> Is this a bug?

I think this is pretty clearly a bug somewhere, and unwanted behaviour. If it were not considered as such, someone would probably have closed this issue.

> If it is, could we do better, or is it an optimization algorithm limitation that cannot be easily fixed?

As hellow554 mentioned above, the time is being spent in LLVM, which is C++ code and not directly part of this repository. The Rust compiler typically uses LLVM as the backend to generate and optimize its output (usually machine code). I would imagine fixing it is probably a matter of changing the heuristics or the parameters/configuration LLVM is invoked with, but that is something to be careful with so as not to make existing code slower.

I think what was used to generate the table with the time spent in each place was the unstable -Ztime-passes flag. I don't know if it has any documentation, but you can read a bit about it in this blog post https://blog.mozilla.org/nnethercote/2016/10/14/how-to-speed-up-the-rust-compiler/

Then nikic gave a bit of insight into where LLVM is spending that time.

> I'm new to Rust. I don't understand what InstCombine or SROA is.

InstCombine is one of LLVM's optimization passes. If you want, you can read a bit about it here https://llvm.org/docs/Passes.html#instcombine-combine-redundant-instructions

SROA stands for "Scalar Replacement Of Aggregates" and is a common optimization, and also an LLVM optimization pass. It is described here https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates
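To give a rough feel for what SROA does, here is a conceptual sketch written in Rust rather than LLVM IR. The names `Pair`, `sum_aggregate`, and `sum_scalars` are made up for illustration, and the second function only mimics by hand the kind of rewrite the pass performs on stack allocations. It is not actual compiler output.

```rust
// Hypothetical aggregate used only for this illustration.
struct Pair {
    a: u64,
    b: u64,
}

// Before: the two values live together in one aggregate stack slot.
fn sum_aggregate(x: u64, y: u64) -> u64 {
    let p = Pair { a: x, b: y };
    p.a + p.b
}

// After (conceptually): each field has become an independent scalar,
// which later passes can keep in registers or fold away.
fn sum_scalars(x: u64, y: u64) -> u64 {
    let p_a = x;
    let p_b = y;
    p_a + p_b
}

fn main() {
    // Both forms compute the same result.
    assert_eq!(sum_aggregate(2, 3), sum_scalars(2, 3));
    println!("both forms compute {}", sum_aggregate(2, 3));
}
```

For a two-field struct this is cheap. As nikic describes above, the same kind of splitting applied to the nested arrays in this issue leaves a very large number of individual stores behind.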

crlf0710 commented 2 years ago

I feel this specific code snippet can/should be integrated with the inline_const feature after it's stabilized, keeping the major work on the rustc side instead of on the LLVM side.

cc #76001

Tnze commented 1 year ago

> I can suggest a workaround:
>
> fn main() {
>     const N: Option<usize> = None;
>     let s = [[[[[N; 4]; 9]; 9]; 12]; 5];
>     println!("{:?}", s);
> }
>
> This compiles in a few milliseconds.

I'm sorry but this trick doesn't work anymore.

My current compiler version is:

rustc 1.71.0-nightly (7908a1d65 2023-04-17)
binary: rustc
commit-hash: 7908a1d65496b88626e4b7c193c81d777005d6f3
commit-date: 2023-04-17
host: x86_64-pc-windows-msvc
release: 1.71.0-nightly
LLVM version: 16.0.2

hellow554 commented 1 year ago

@Tnze for me on the latest nightly (2023-05-07) this compiles in 0.41s and the original code seems to have been fixed as well. It compiles in 0.36s. @nikic should we add E-needs-test for this and close this issue afterwards or do you want to do something else?

Tnze commented 1 year ago

> @Tnze for me on the latest nightly (2023-05-07) this compiles in 0.41s and the original code seems to have been fixed as well. It compiles in 0.36s.

@hellow554 I upgraded my nightly toolchain. It doesn't seem to be fixed. Did you forget --release?

hellow554 commented 1 year ago

No, but I ran check instead of build. It's not fixed and you're right, I'm sorry. Let me try to bisect this.

********************************************************************************
Regression in nightly-2022-02-26
********************************************************************************

fetching https://static.rust-lang.org/dist/2022-02-25/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-02-25: 40 B / 40 B [===========================] 100.00 % 808.75 KB/s
converted 2022-02-25 to 4b043faba34ccc053a4d0110634c323f6c03765e
fetching https://static.rust-lang.org/dist/2022-02-26/channel-rust-nightly-git-commit-hash.txt
nightly manifest 2022-02-26: 40 B / 40 B [===========================] 100.00 % 775.05 KB/s
converted 2022-02-26 to d3ad51b48f83329fac0cd8a9f1253f3146613c1c
looking for regression commit between 2022-02-25 and 2022-02-26
fetching (via remote github) commits from max(4b043faba34ccc053a4d0110634c323f6c03765e, 2022-02-23) to d3ad51b48f83329fac0cd8a9f1253f3146613c1c
ending github query because we found starting sha: 4b043faba34ccc053a4d0110634c323f6c03765e
get_commits_between returning commits, len: 11
  commit[0] 2022-02-24: Auto merge of #94131 - Mark-Simulacrum:fmt-string, r=oli-obk
  commit[1] 2022-02-24: Auto merge of #94333 - Dylan-DPC:rollup-7yxtywp, r=Dylan-DPC
  commit[2] 2022-02-25: Auto merge of #93368 - eddyb:diagbld-guarantee, r=estebank
  commit[3] 2022-02-25: Auto merge of #93878 - Aaron1011:newtype-macro, r=cjgillot
  commit[4] 2022-02-25: Auto merge of #94130 - erikdesjardins:partially, r=nikic
  commit[5] 2022-02-25: Auto merge of #94350 - matthiaskrgr:rollup-eesfiyr, r=matthiaskrgr
  commit[6] 2022-02-25: Auto merge of #93644 - michaelwoerister:simpler-debuginfo-typemap, r=wesleywiser
  commit[7] 2022-02-25: Auto merge of #94357 - matthiaskrgr:rollup-xrjaof3, r=matthiaskrgr
  commit[8] 2022-02-25: Auto merge of #94279 - tmiasko:write-print, r=Mark-Simulacrum
  commit[9] 2022-02-25: Auto merge of #94290 - Mark-Simulacrum:bump-bootstrap, r=pietroalbini
  commit[10] 2022-02-25: Auto merge of #94369 - matthiaskrgr:rollup-qtripm2, r=matthiaskrgr
ERROR: no CI builds available between 4b043faba34ccc053a4d0110634c323f6c03765e and d3ad51b48f83329fac0cd8a9f1253f3146613c1c within last 167 days

4b043faba34ccc053a4d0110634c323f6c03765e...d3ad51b48f83329fac0cd8a9f1253f3146613c1c

dc03 commented 1 year ago

@Tnze Can you please verify that https://github.com/llvm/llvm-project/commit/e13e808283f7fd9e873ae922dd1ef61aeaa0eb4a fixes this issue?

mati865 commented 1 year ago

I'd say we are halfway there, on x86_64 Linux:

fn main() {
    let s = [[[[Option::<usize>::None; 10]; 10]; 10]; 10];
    println!("{:?}", s);
}

Current master: 26.82s
With backport: 25.07s

fn main() {
    let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5];
    println!("{:?}", s);
}

Current master: 3m 17s
With backport: 7.57s

dc03 commented 1 year ago

> I'd say we are halfway there, on x86_64 Linux:
>
> fn main() {
>     let s = [[[[Option::<usize>::None; 10]; 10]; 10]; 10];
>     println!("{:?}", s);
> }
>
> Current master: 26.82s
> With backport: 25.07s
>
> fn main() {
>     let s = [[[[[Option::<usize>::None; 4]; 9]; 9]; 12]; 5];
>     println!("{:?}", s);
> }
>
> Current master: 3m 17s
> With backport: 7.57s

I am quite sure that this is working as intended. The first test case splits the allocas into just enough slices that it doesn't exceed the default limit of 1024 but gets quite close to it, so the compile-time explosion still occurs. In the second one, it generates too many slices, so SROA bails out early and the issue doesn't occur.

If you try to lower the limit with the --sroa-max-alloca-slices option to something like 256 or 512, this issue will probably go away.
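The bail-out described above can be pictured as a simple threshold check. This is a toy sketch, not LLVM's code: `should_split`, the example slice counts, and the exact comparison are assumptions made for illustration. Only the default of 1024 and the suggested lower limits come from this comment.

```rust
// Default limit mentioned in the comment above.
const DEFAULT_MAX_SLICES: usize = 1024;

// Toy model (assumption): the alloca is split only while its slice
// count stays within the limit; otherwise the pass gives up on it.
fn should_split(slices: usize, max_slices: usize) -> bool {
    slices <= max_slices
}

fn main() {
    // Hypothetical slice counts, chosen only to land on either side
    // of the default limit. They are not measured values.
    let just_under = 1000;
    let far_over = 4000;

    // Just under the limit: splitting proceeds, so the slow path remains.
    assert!(should_split(just_under, DEFAULT_MAX_SLICES));
    // Far over the limit: the pass bails out and compilation stays fast.
    assert!(!should_split(far_over, DEFAULT_MAX_SLICES));
    // With a lowered limit such as 512, the first case bails out too.
    assert!(!should_split(just_under, 512));
    println!("ok");
}
```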

@nikic Please confirm if I am correct here.