llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Dramatic Compilation slow-down on higher Optimizations #14023

Open llvmbot opened 12 years ago

llvmbot commented 12 years ago
Bugzilla Link: 13651
Version: unspecified
OS: All
Attachments: File that exhibits problem, Smaller Example File
Reporter: LLVM Bugzilla Contributor
CC: @d0k, @nico, @stoklund

Extended Description

The attached file contains a few function definitions followed by a main method with a single line of code duplicated 2500 times.
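For anyone without access to the attachment, a file of the same general shape can be generated with a short script. This is only a sketch: the helper functions `check` and `report`, the repeated statement, and the name `make_repro` are hypothetical stand-ins, not the contents of the attached file, and nothing here confirms that this particular statement triggers the slowdown.

```python
def make_repro(n_copies: int = 2500) -> str:
    """Build C++ source: a few helper functions, then one statement repeated in main.

    The helpers and the repeated statement are invented for illustration;
    the real attachment uses different code.
    """
    header = (
        "// Hypothetical stand-in for the attachment; the real helpers differ.\n"
        "#include <cstdio>\n"
        "static int check(int a, int b) { return a < b ? a : b; }\n"
        "static void report(int v) { if (v > 0) std::printf(\"%d\\n\", v); }\n"
    )
    # One line of code duplicated n_copies times, as described in the report.
    body = "  report(check(argc, 3));\n" * n_copies
    return header + "int main(int argc, char **argv) {\n" + body + "  return 0;\n}\n"
```

Writing `make_repro()` to a file such as `repro.cpp` gives an input to try at each optimization level.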

At -O0 and -O1, the code compiles relatively quickly (~5s). At -O2 and -O3, however, the compilation time jumps to over a minute.
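A rough way to collect these numbers is to time one compile per optimization level. The sketch below assumes a `clang++` binary on the PATH, a source file path supplied by the caller, and a POSIX `/dev/null` as the output sink; the function names are made up for this example.

```python
import subprocess
import time

OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]

def compile_command(source, level, compiler="clang++"):
    """Command line that compiles `source` to a discarded object file."""
    # "/dev/null" is an assumption; use another sink on non-POSIX systems.
    return [compiler, level, "-c", source, "-o", "/dev/null"]

def time_command(cmd, run=subprocess.run):
    """Return wall-clock seconds taken by one invocation of `cmd`."""
    start = time.perf_counter()
    run(cmd, check=True)
    return time.perf_counter() - start

def time_all_levels(source, run=subprocess.run):
    """Map each optimization level to the wall-clock time of one compile."""
    return {lvl: time_command(compile_command(source, lvl), run)
            for lvl in OPT_LEVELS}
```

The `run` parameter exists only so the harness can be exercised without a compiler installed; in normal use, `time_all_levels("repro.cpp")` is enough.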

d0k commented 12 years ago

r163478 brings this from 40s to 15s at -O2.

d0k commented 12 years ago

Profiling shows that >80% of the time is spent updating LiveVariables when splitting critical edges.
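For readers unfamiliar with the term: a critical edge goes from a block with more than one successor to a block with more than one predecessor, and splitting it means inserting a new block on that edge. The toy sketch below shows only that definition on a dictionary-based CFG. It is not LLVM's implementation and does not model the LiveVariables update that the profile points at; the function names and the CFG are invented for illustration.

```python
from collections import defaultdict

def critical_edges(succs):
    """Return edges (u, v) where u has >1 successors and v has >1 predecessors."""
    preds = defaultdict(list)
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    return [(u, v) for u, vs in succs.items() for v in vs
            if len(vs) > 1 and len(preds[v]) > 1]

def split_edge(succs, u, v):
    """Insert a fresh block on the edge u -> v and return its name."""
    new = f"{u}.{v}.split"
    succs[u] = [new if s == v else s for s in succs[u]]
    succs[new] = [v]
    return new

# Toy CFG: "entry" branches to "then" or straight to "join".
cfg = {"entry": ["then", "join"], "then": ["join"], "join": []}
for u, v in critical_edges(cfg):
    split_edge(cfg, u, v)
```

In the toy CFG the only critical edge is `entry -> join`. Every split adds a block, so any per-block analysis data kept alongside the CFG has to be brought up to date after each one.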

llvmbot commented 12 years ago

> Did you compile LLVM with optimization on and assertions off?

I had optimizations on. I rebuilt with assertions disabled and tried again, but there was no difference in performance.

llvmbot commented 12 years ago

Did you compile LLVM with optimization on and assertions off?

llvmbot commented 12 years ago

> When PHI-elim takes a lot of time, it is usually because of critical edge splitting. Does llc -disable-phi-elim-edge-splitting go a lot faster?

Yeah, that speeds it up considerably. It then takes around 2 seconds.

Err, wait, no, sorry. Running llc on the bitcode is that fast regardless of whether -disable-phi-elim-edge-splitting is passed.

llvmbot commented 12 years ago

> When PHI-elim takes a lot of time, it is usually because of critical edge splitting. Does llc -disable-phi-elim-edge-splitting go a lot faster?

Yeah, that speeds it up considerably. It then takes around 2 seconds.

1ba3d143-a64b-4671-82b2-0b31cfb91709 commented 12 years ago

When PHI-elim takes a lot of time, it is usually because of critical edge splitting. Does llc -disable-phi-elim-edge-splitting go a lot faster?

nico commented 12 years ago

+Jakob, since a lot of the time is spent in "Eliminate PHI nodes for register allocation".

llvmbot commented 12 years ago

-ftime-report results for -O0 and -O3:

At -O0:

===-------------------------------------------------------------------------===
                     Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.5012 seconds (0.5051 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0936 ( 20.3%)   0.0156 ( 40.2%)   0.1093 ( 21.8%)   0.1091 ( 21.6%)  Instruction Creation
   0.1024 ( 22.1%)   0.0027 (  6.9%)   0.1050 ( 21.0%)   0.1049 ( 20.8%)  Instruction Scheduling
   0.0999 ( 21.6%)   0.0021 (  5.3%)   0.1020 ( 20.3%)   0.1019 ( 20.2%)  Instruction Selection
   0.0381 (  8.2%)   0.0082 ( 21.2%)   0.0463 (  9.2%)   0.0495 (  9.8%)  DAG Combining 1
   0.0348 (  7.5%)   0.0022 (  5.7%)   0.0370 (  7.4%)   0.0369 (  7.3%)  Vector Legalization
   0.0335 (  7.2%)   0.0021 (  5.3%)   0.0356 (  7.1%)   0.0368 (  7.3%)  DAG Legalization
   0.0261 (  5.6%)   0.0020 (  5.2%)   0.0281 (  5.6%)   0.0281 (  5.6%)  Type Legalization
   0.0197 (  4.3%)   0.0020 (  5.2%)   0.0218 (  4.3%)   0.0217 (  4.3%)  DAG Combining 2
   0.0142 (  3.1%)   0.0020 (  5.1%)   0.0162 (  3.2%)   0.0162 (  3.2%)  Instruction Scheduling Cleanup
   0.4623 (100.0%)   0.0389 (100.0%)   0.5012 (100.0%)   0.5051 (100.0%)  Total

===-------------------------------------------------------------------------===
                                DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0319 seconds (0.0319 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0266 ( 97.8%)   0.0047 ( 99.6%)   0.0313 ( 98.0%)   0.0313 ( 98.0%)  DWARF Exception Writer
   0.0006 (  2.2%)   0.0000 (  0.4%)   0.0006 (  2.0%)   0.0006 (  2.0%)  DWARF Debug Writer
   0.0272 (100.0%)   0.0047 (100.0%)   0.0319 (100.0%)   0.0319 (100.0%)  Total

===-------------------------------------------------------------------------===
                     ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.1464 seconds (1.1513 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7807 ( 73.5%)   0.0690 ( 81.7%)   0.8498 ( 74.1%)   0.8547 ( 74.2%)  X86 DAG->DAG Instruction Selection
   0.0816 (  7.7%)   0.0075 (  8.9%)   0.0891 (  7.8%)   0.0891 (  7.7%)  X86 AT&T-Style Assembly Printer
   0.0404 (  3.8%)   0.0011 (  1.3%)   0.0415 (  3.6%)   0.0414 (  3.6%)  Machine Function Analysis
   0.0380 (  3.6%)   0.0014 (  1.6%)   0.0394 (  3.4%)   0.0394 (  3.4%)  Fast Register Allocator
   0.0201 (  1.9%)   0.0004 (  0.5%)   0.0205 (  1.8%)   0.0205 (  1.8%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0156 (  1.5%)   0.0027 (  3.2%)   0.0183 (  1.6%)   0.0183 (  1.6%)  Dominator Tree Construction
   0.0152 (  1.4%)   0.0010 (  1.2%)   0.0162 (  1.4%)   0.0162 (  1.4%)  Dominator Tree Construction
   0.0154 (  1.4%)   0.0007 (  0.8%)   0.0161 (  1.4%)   0.0161 (  1.4%)  Dominator Tree Construction
   0.0143 (  1.3%)   0.0000 (  0.0%)   0.0144 (  1.3%)   0.0144 (  1.2%)  Two-Address instruction pass
   0.0073 (  0.7%)   0.0000 (  0.0%)   0.0073 (  0.6%)   0.0073 (  0.6%)  Module Verifier
   0.0072 (  0.7%)   0.0000 (  0.0%)   0.0072 (  0.6%)   0.0072 (  0.6%)  Module Verifier
   0.0072 (  0.7%)   0.0000 (  0.0%)   0.0072 (  0.6%)   0.0072 (  0.6%)  Module Verifier
   0.0040 (  0.4%)   0.0000 (  0.0%)   0.0040 (  0.3%)   0.0040 (  0.3%)  Post-RA pseudo instruction expansion pass
   0.0038 (  0.4%)   0.0001 (  0.1%)   0.0039 (  0.3%)   0.0039 (  0.3%)  Basic CallGraph Construction
   0.0025 (  0.2%)   0.0001 (  0.1%)   0.0026 (  0.2%)   0.0026 (  0.2%)  Remove unreachable blocks from the CFG
   0.0021 (  0.2%)   0.0000 (  0.0%)   0.0022 (  0.2%)   0.0022 (  0.2%)  Expand ISel Pseudo-instructions
   0.0019 (  0.2%)   0.0000 (  0.0%)   0.0019 (  0.2%)   0.0019 (  0.2%)  Inliner for always_inline functions
   0.0016 (  0.1%)   0.0002 (  0.2%)   0.0018 (  0.2%)   0.0018 (  0.2%)  Bundle Machine CFG Edges
   0.0009 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)  Preliminary module verification
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Insert stack protectors
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Preliminary module verification
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Exception handling preparation
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Preliminary module verification
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   1.0619 (100.0%)   0.0844 (100.0%)   1.1464 (100.0%)   1.1513 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.5405 ( 57.0%)   0.1169 ( 55.9%)   1.6575 ( 56.9%)   1.7984 ( 58.0%)  Clang front-end timer
   1.0853 ( 40.1%)   0.0860 ( 41.1%)   1.1713 ( 40.2%)   1.1927 ( 38.5%)  Code Generation Time
   0.0776 (  2.9%)   0.0063 (  3.0%)   0.0839 (  2.9%)   0.1071 (  3.5%)  LLVM IR Generation Time
   2.7034 (100.0%)   0.2093 (100.0%)   2.9127 (100.0%)   3.0983 (100.0%)  Total

At -O3:

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0059 seconds (0.0059 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0057 ( 99.9%)   0.0002 ( 97.7%)   0.0059 ( 99.8%)   0.0059 ( 99.8%)  Seed Live Regs
   0.0000 (  0.1%)   0.0000 (  2.3%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Evict
   0.0057 (100.0%)   0.0002 (100.0%)   0.0059 (100.0%)   0.0059 (100.0%)  Total

===-------------------------------------------------------------------------===
                     Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 1.4591 seconds (1.4608 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2975 ( 21.6%)   0.0069 (  8.6%)   0.3043 ( 20.9%)   0.3038 ( 20.8%)  Instruction Scheduling
   0.2495 ( 18.1%)   0.0059 (  7.4%)   0.2554 ( 17.5%)   0.2548 ( 17.4%)  Instruction Selection
   0.1795 ( 13.0%)   0.0196 ( 24.5%)   0.1991 ( 13.6%)   0.1985 ( 13.6%)  Instruction Creation
   0.1305 (  9.5%)   0.0164 ( 20.5%)   0.1469 ( 10.1%)   0.1509 ( 10.3%)  DAG Combining 1
   0.1389 ( 10.1%)   0.0055 (  6.9%)   0.1444 (  9.9%)   0.1446 (  9.9%)  Type Legalization
   0.1344 (  9.7%)   0.0056 (  7.0%)   0.1400 (  9.6%)   0.1395 (  9.6%)  DAG Legalization
   0.0816 (  5.9%)   0.0035 (  4.4%)   0.0851 (  5.8%)   0.0848 (  5.8%)  DAG Combining after legalize types
   0.0695 (  5.0%)   0.0054 (  6.7%)   0.0749 (  5.1%)   0.0749 (  5.1%)  Vector Legalization
   0.0568 (  4.1%)   0.0056 (  7.1%)   0.0624 (  4.3%)   0.0623 (  4.3%)  DAG Combining 2
   0.0411 (  3.0%)   0.0055 (  6.9%)   0.0465 (  3.2%)   0.0466 (  3.2%)  Instruction Scheduling Cleanup
   1.3793 (100.0%)   0.0799 (100.0%)   1.4591 (100.0%)   1.4608 (100.0%)  Total

===-------------------------------------------------------------------------===
                                DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0252 seconds (0.0252 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0185 ( 75.6%)   0.0006 ( 96.8%)   0.0191 ( 76.1%)   0.0192 ( 76.1%)  DWARF Exception Writer
   0.0060 ( 24.4%)   0.0000 (  3.2%)   0.0060 ( 23.9%)   0.0060 ( 23.9%)  DWARF Debug Writer
   0.0245 (100.0%)   0.0006 (100.0%)   0.0252 (100.0%)   0.0252 (100.0%)  Total

===-------------------------------------------------------------------------===
                     ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 44.8214 seconds (44.8240 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  36.6471 ( 82.3%)   0.0243 (  7.9%)  36.6714 ( 81.8%)  36.6719 ( 81.8%)  Eliminate PHI nodes for register allocation
   2.1293 (  4.8%)   0.1606 ( 52.5%)   2.2899 (  5.1%)   2.2902 (  5.1%)  X86 DAG->DAG Instruction Selection
   1.2961 (  2.9%)   0.0005 (  0.2%)   1.2966 (  2.9%)   1.2966 (  2.9%)  Simple Register Coalescing
   0.5791 (  1.3%)   0.0001 (  0.0%)   0.5792 (  1.3%)   0.5792 (  1.3%)  Machine code sinking
   0.4765 (  1.1%)   0.0004 (  0.1%)   0.4769 (  1.1%)   0.4770 (  1.1%)  Simplify the CFG
   0.4065 (  0.9%)   0.0146 (  4.8%)   0.4211 (  0.9%)   0.4211 (  0.9%)  Function Integration/Inlining
   0.2742 (  0.6%)   0.0020 (  0.6%)   0.2761 (  0.6%)   0.2765 (  0.6%)  Greedy Register Allocator
   0.2141 (  0.5%)   0.0048 (  1.6%)   0.2189 (  0.5%)   0.2189 (  0.5%)  Combine redundant instructions
   0.1784 (  0.4%)   0.0253 (  8.3%)   0.2037 (  0.5%)   0.2037 (  0.5%)  Global Value Numbering
   0.1390 (  0.3%)   0.0006 (  0.2%)   0.1396 (  0.3%)   0.1396 (  0.3%)  Branch Probability Basic Block Placement
   0.1273 (  0.3%)   0.0024 (  0.8%)   0.1297 (  0.3%)   0.1297 (  0.3%)  Live Variable Analysis
   0.1078 (  0.2%)   0.0015 (  0.5%)   0.1093 (  0.2%)   0.1094 (  0.2%)  Combine redundant instructions
   0.1076 (  0.2%)   0.0011 (  0.4%)   0.1087 (  0.2%)   0.1087 (  0.2%)  Combine redundant instructions
   0.1068 (  0.2%)   0.0014 (  0.5%)   0.1082 (  0.2%)   0.1082 (  0.2%)  Combine redundant instructions
   0.0956 (  0.2%)   0.0009 (  0.3%)   0.0965 (  0.2%)   0.0965 (  0.2%)  Machine Common Subexpression Elimination
   0.0949 (  0.2%)   0.0003 (  0.1%)   0.0952 (  0.2%)   0.0952 (  0.2%)  Control Flow Optimizer
   0.0887 (  0.2%)   0.0013 (  0.4%)   0.0900 (  0.2%)   0.0900 (  0.2%)  X86 AT&T-Style Assembly Printer
   0.0670 (  0.2%)   0.0092 (  3.0%)   0.0762 (  0.2%)   0.0762 (  0.2%)  Machine Function Analysis
   0.0619 (  0.1%)   0.0032 (  1.0%)   0.0651 (  0.1%)   0.0651 (  0.1%)  Jump Threading
   0.0621 (  0.1%)   0.0007 (  0.2%)   0.0628 (  0.1%)   0.0628 (  0.1%)  Early CSE
   0.0590 (  0.1%)   0.0026 (  0.9%)   0.0616 (  0.1%)   0.0616 (  0.1%)  Sparse Conditional Constant Propagation
   0.0557 (  0.1%)   0.0023 (  0.7%)   0.0579 (  0.1%)   0.0579 (  0.1%)  Value Propagation
   0.0527 (  0.1%)   0.0041 (  1.3%)   0.0568 (  0.1%)   0.0567 (  0.1%)  Live Interval Analysis
   0.0561 (  0.1%)   0.0005 (  0.2%)   0.0566 (  0.1%)   0.0566 (  0.1%)  Value Propagation
   0.0453 (  0.1%)   0.0000 (  0.0%)   0.0453 (  0.1%)   0.0453 (  0.1%)  Optimize for code generation
   0.0417 (  0.1%)   0.0008 (  0.2%)   0.0425 (  0.1%)   0.0425 (  0.1%)  Jump Threading
   0.0382 (  0.1%)   0.0000 (  0.0%)   0.0382 (  0.1%)   0.0382 (  0.1%)  Module Verifier
   0.0381 (  0.1%)   0.0000 (  0.0%)   0.0381 (  0.1%)   0.0381 (  0.1%)  Module Verifier
   0.0366 (  0.1%)   0.0006 (  0.2%)   0.0372 (  0.1%)   0.0372 (  0.1%)  Reassociate expressions
   0.0354 (  0.1%)   0.0000 (  0.0%)   0.0354 (  0.1%)   0.0354 (  0.1%)  Virtual Register Rewriter
   0.0330 (  0.1%)   0.0009 (  0.3%)   0.0339 (  0.1%)   0.0339 (  0.1%)  Two-Address instruction pass
   0.0329 (  0.1%)   0.0008 (  0.3%)   0.0337 (  0.1%)   0.0337 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0297 (  0.1%)   0.0008 (  0.3%)   0.0305 (  0.1%)   0.0305 (  0.1%)  Aggressive Dead Code Elimination
   0.0290 (  0.1%)   0.0011 (  0.4%)   0.0301 (  0.1%)   0.0301 (  0.1%)  Dominator Tree Construction
   0.0293 (  0.1%)   0.0007 (  0.2%)   0.0300 (  0.1%)   0.0300 (  0.1%)  Dominator Tree Construction
   0.0288 (  0.1%)   0.0010 (  0.3%)   0.0299 (  0.1%)   0.0299 (  0.1%)  Dominator Tree Construction
   0.0239 (  0.1%)   0.0054 (  1.8%)   0.0294 (  0.1%)   0.0294 (  0.1%)  Slot index numbering
   0.0274 (  0.1%)   0.0003 (  0.1%)   0.0278 (  0.1%)   0.0292 (  0.1%)  Combine redundant instructions
   0.0267 (  0.1%)   0.0023 (  0.7%)   0.0289 (  0.1%)   0.0289 (  0.1%)  MachineDominator Tree Construction
   0.0268 (  0.1%)   0.0014 (  0.5%)   0.0282 (  0.1%)   0.0282 (  0.1%)  Dominator Tree Construction
   0.0171 (  0.0%)   0.0106 (  3.5%)   0.0277 (  0.1%)   0.0277 (  0.1%)  Early CSE
   0.0262 (  0.1%)   0.0008 (  0.3%)   0.0270 (  0.1%)   0.0270 (  0.1%)  Dominator Tree Construction
   0.0238 (  0.1%)   0.0000 (  0.0%)   0.0238 (  0.1%)   0.0238 (  0.1%)  Calculate spill weights
   0.0217 (  0.0%)   0.0001 (  0.0%)   0.0218 (  0.0%)   0.0218 (  0.0%)  Dead Store Elimination
   0.0207 (  0.0%)   0.0006 (  0.2%)   0.0213 (  0.0%)   0.0213 (  0.0%)  MachineDominator Tree Construction
   0.0210 (  0.0%)   0.0000 (  0.0%)   0.0210 (  0.0%)   0.0210 (  0.0%)  Machine Copy Propagation Pass
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0200 (  0.0%)  Simplify the CFG
   0.0197 (  0.0%)   0.0001 (  0.0%)   0.0197 (  0.0%)   0.0197 (  0.0%)  Simplify the CFG
   0.0166 (  0.0%)   0.0027 (  0.9%)   0.0193 (  0.0%)   0.0193 (  0.0%)  Dominator Tree Construction
   0.0182 (  0.0%)   0.0008 (  0.3%)   0.0191 (  0.0%)   0.0191 (  0.0%)  Execution dependency fix
   0.0178 (  0.0%)   0.0000 (  0.0%)   0.0178 (  0.0%)   0.0178 (  0.0%)  Tail Duplication
   0.0170 (  0.0%)   0.0006 (  0.2%)   0.0176 (  0.0%)   0.0176 (  0.0%)  Lazy Value Information Analysis
   0.0173 (  0.0%)   0.0003 (  0.1%)   0.0176 (  0.0%)   0.0176 (  0.0%)  Lazy Value Information Analysis
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0160 (  0.0%)  Remove dead machine instructions
   0.0137 (  0.0%)   0.0003 (  0.1%)   0.0140 (  0.0%)   0.0140 (  0.0%)  Simplify the CFG
   0.0117 (  0.0%)   0.0006 (  0.2%)   0.0124 (  0.0%)   0.0124 (  0.0%)  Natural Loop Information
   0.0115 (  0.0%)   0.0006 (  0.2%)   0.0121 (  0.0%)   0.0121 (  0.0%)  Natural Loop Information
   0.0115 (  0.0%)   0.0003 (  0.1%)   0.0118 (  0.0%)   0.0118 (  0.0%)  Interprocedural Sparse Conditional Constant Propagation
   0.0110 (  0.0%)   0.0007 (  0.2%)   0.0118 (  0.0%)   0.0118 (  0.0%)  Machine Block Frequency Analysis
   0.0107 (  0.0%)   0.0006 (  0.2%)   0.0113 (  0.0%)   0.0113 (  0.0%)  X86 FP Stackifier
   0.0107 (  0.0%)   0.0006 (  0.2%)   0.0113 (  0.0%)   0.0113 (  0.0%)  Branch Probability Analysis
   0.0106 (  0.0%)   0.0007 (  0.2%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Natural Loop Information
   0.0108 (  0.0%)   0.0004 (  0.1%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Scalar Replacement of Aggregates (DT)
   0.0097 (  0.0%)   0.0000 (  0.0%)   0.0097 (  0.0%)   0.0097 (  0.0%)  MemCpy Optimization
   0.0090 (  0.0%)   0.0000 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Dead Global Elimination
   0.0086 (  0.0%)   0.0001 (  0.0%)   0.0086 (  0.0%)   0.0086 (  0.0%)  Scalar Replacement of Aggregates (SSAUp)
   0.0082 (  0.0%)   0.0004 (  0.1%)   0.0086 (  0.0%)   0.0086 (  0.0%)  Dominator Tree Construction
   0.0081 (  0.0%)   0.0000 (  0.0%)   0.0081 (  0.0%)   0.0081 (  0.0%)  Module Verifier
   0.0068 (  0.0%)   0.0004 (  0.1%)   0.0072 (  0.0%)   0.0072 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0067 (  0.0%)   0.0005 (  0.2%)   0.0072 (  0.0%)   0.0072 (  0.0%)  Machine Natural Loop Construction
   0.0071 (  0.0%)   0.0000 (  0.0%)   0.0071 (  0.0%)   0.0071 (  0.0%)  Peephole Optimizations
   0.0061 (  0.0%)   0.0000 (  0.0%)   0.0061 (  0.0%)   0.0061 (  0.0%)  Simplify well-known library calls
   0.0055 (  0.0%)   0.0005 (  0.2%)   0.0060 (  0.0%)   0.0060 (  0.0%)  Machine Natural Loop Construction
   0.0057 (  0.0%)   0.0002 (  0.1%)   0.0059 (  0.0%)   0.0059 (  0.0%)  Remove unreachable machine basic blocks
   0.0054 (  0.0%)   0.0000 (  0.0%)   0.0054 (  0.0%)   0.0054 (  0.0%)  Remove unreachable blocks from the CFG
   0.0049 (  0.0%)   0.0000 (  0.0%)   0.0049 (  0.0%)   0.0049 (  0.0%)  Insert stack protectors
   0.0048 (  0.0%)   0.0000 (  0.0%)   0.0049 (  0.0%)   0.0049 (  0.0%)  Tail Call Elimination
   0.0042 (  0.0%)   0.0000 (  0.0%)   0.0042 (  0.0%)   0.0042 (  0.0%)  Tail Duplication
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Debug Variable Analysis
   0.0040 (  0.0%)   0.0001 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Basic CallGraph Construction
   0.0029 (  0.0%)   0.0001 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Bundle Machine CFG Edges
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0029 (  0.0%)  Simplify the CFG
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Bundle Machine CFG Edges
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0020 (  0.0%)  Expand ISel Pseudo-instructions
   0.0019 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Process Implicit Definitions
   0.0018 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Remove unused exception handling info
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0016 (  0.0%)  Preliminary module verification
   0.0013 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  Memory Dependence Analysis
   0.0011 (  0.0%)   0.0001 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Deduce function attributes
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Preliminary module verification
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)  Optimize machine instruction PHIs
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Preliminary module verification
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Spill Code Placement Analysis
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Global Variable Optimizer
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Lower 'expect' Intrinsics
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Exception handling preparation
   0.0001 (  0.0%)   0.0003 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Virtual Register Map
   0.0001 (  0.0%)   0.0002 (  0.1%)   0.0003 (  0.0%)   0.0004 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Live Register Matrix
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Dead Argument Elimination
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Memory Dependence Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Promote 'by reference' arguments to scalars
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Post RA top-down list latency scheduler
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Stack Slot Coloring
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
  44.5154 (100.0%)   0.3060 (100.0%)  44.8214 (100.0%)  44.8240 (100.0%)  Total

===-------------------------------------------------------------------------===
                        Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  45.0637 ( 50.2%)   0.3487 ( 52.4%)  45.4124 ( 50.2%)  45.4247 ( 50.2%)  Clang front-end timer
  44.5841 ( 49.7%)   0.3102 ( 46.6%)  44.8943 ( 49.7%)  44.8969 ( 49.7%)  Code Generation Time
   0.0806 (  0.1%)   0.0061 (  0.9%)   0.0867 (  0.1%)   0.0869 (  0.1%)  LLVM IR Generation Time
  89.7284 (100.0%)   0.6650 (100.0%)  90.3934 (100.0%)  90.4086 (100.0%)  Total
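Reports this long are easier to compare after pulling out the few slowest entries. The sketch below assumes the report text has one timing row per line, as clang prints it, with four "seconds (percent)" columns followed by the pass name. The names `ROW`, `parse_rows`, and `slowest` are made up for this example and are not part of any LLVM tooling.

```python
import re

# Four "<seconds> (<percent>%)" columns, then the pass name.
ROW = re.compile(
    r"^\s*" + r"(\d+\.\d+)\s+\(\s*\d+\.\d+%\)\s+" * 4 + r"(.+?)\s*$"
)

def parse_rows(report):
    """Yield (name, wall_seconds) for each timing row, skipping 'Total' rows."""
    for line in report.splitlines():
        m = ROW.match(line)
        if m and m.group(5) != "Total":
            yield m.group(5), float(m.group(4))

def slowest(report, n=3):
    """Return the n rows with the largest wall-clock time, slowest first."""
    return sorted(parse_rows(report), key=lambda r: r[1], reverse=True)[:n]
```

Run over the -O3 pass report above, the first entry returned would be "Eliminate PHI nodes for register allocation" at 36.6719 s wall time, which is 81.8% of the 44.8240 s total.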

llvmbot commented 12 years ago

gcc version of this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54337

clang is slow on the gcc test file, but gcc is fast on the test file attached to this bug.