Open llvmbot opened 12 years ago
r163478 brings this from 40s to 15s at -O2.
Profiling shows that >80% of the time is spent updating LiveVariables when splitting critical edges.
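For readers unfamiliar with the term: a critical edge is a CFG edge whose source block has more than one successor and whose destination block has more than one predecessor. The sketch below only illustrates that definition on a toy graph. The dictionary-based CFG and the names `critical_edges` and `split_edge` are made up for this example and are not LLVM's data structures or API.

```python
def critical_edges(cfg):
    """Return edges (u, v) where u has >1 successors and v has >1 predecessors.

    `cfg` maps a block name to its list of successor block names (toy model).
    """
    preds = {}
    for u, succs in cfg.items():
        for v in succs:
            preds.setdefault(v, []).append(u)
    return [
        (u, v)
        for u, succs in cfg.items()
        for v in succs
        if len(succs) > 1 and len(preds[v]) > 1
    ]


def split_edge(cfg, u, v, name):
    """Split edge u -> v by routing it through a new, otherwise empty block."""
    cfg[u] = [name if s == v else s for s in cfg[u]]
    cfg[name] = [v]


# Toy CFG: entry branches to "a" or straight to "join"; "a" falls through to "join".
cfg = {"entry": ["a", "join"], "a": ["join"], "join": []}
before = critical_edges(cfg)
split_edge(cfg, "entry", "join", "entry.join")
after = critical_edges(cfg)
```

Every split adds a block, so an analysis that is updated on each split does work proportional to the number of critical edges times the cost of one update.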
Did you compile LLVM with optimization on and assertions off?
I had optimizations on. I rebuilt with assertions disabled and tried again, but performance did not change.
When PHI-elim takes a lot of time, it is usually because of critical edge splitting. Does llc -disable-phi-elim-edge-splitting go a lot faster?
Yeah, that speeds it up considerably. It then takes around 2 seconds.
Err, wait, no, sorry. Running llc on the bitcode is that fast whether or not -disable-phi-elim-edge-splitting is passed.
+Jakob, since a lot of the time is spent in "Eliminate PHI nodes for register allocation".
-ftime-report results for -O0 and -O3:

At -O0:

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.5012 seconds (0.5051 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0936 ( 20.3%)   0.0156 ( 40.2%)   0.1093 ( 21.8%)   0.1091 ( 21.6%)  Instruction Creation
   0.1024 ( 22.1%)   0.0027 (  6.9%)   0.1050 ( 21.0%)   0.1049 ( 20.8%)  Instruction Scheduling
   0.0999 ( 21.6%)   0.0021 (  5.3%)   0.1020 ( 20.3%)   0.1019 ( 20.2%)  Instruction Selection
   0.0381 (  8.2%)   0.0082 ( 21.2%)   0.0463 (  9.2%)   0.0495 (  9.8%)  DAG Combining 1
   0.0348 (  7.5%)   0.0022 (  5.7%)   0.0370 (  7.4%)   0.0369 (  7.3%)  Vector Legalization
   0.0335 (  7.2%)   0.0021 (  5.3%)   0.0356 (  7.1%)   0.0368 (  7.3%)  DAG Legalization
   0.0261 (  5.6%)   0.0020 (  5.2%)   0.0281 (  5.6%)   0.0281 (  5.6%)  Type Legalization
   0.0197 (  4.3%)   0.0020 (  5.2%)   0.0218 (  4.3%)   0.0217 (  4.3%)  DAG Combining 2
   0.0142 (  3.1%)   0.0020 (  5.1%)   0.0162 (  3.2%)   0.0162 (  3.2%)  Instruction Scheduling Cleanup
   0.4623 (100.0%)   0.0389 (100.0%)   0.5012 (100.0%)   0.5051 (100.0%)  Total

===-------------------------------------------------------------------------===
                                DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0319 seconds (0.0319 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0266 ( 97.8%)   0.0047 ( 99.6%)   0.0313 ( 98.0%)   0.0313 ( 98.0%)  DWARF Exception Writer
   0.0006 (  2.2%)   0.0000 (  0.4%)   0.0006 (  2.0%)   0.0006 (  2.0%)  DWARF Debug Writer
   0.0272 (100.0%)   0.0047 (100.0%)   0.0319 (100.0%)   0.0319 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.1464 seconds (1.1513 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7807 ( 73.5%)   0.0690 ( 81.7%)   0.8498 ( 74.1%)   0.8547 ( 74.2%)  X86 DAG->DAG Instruction Selection
   0.0816 (  7.7%)   0.0075 (  8.9%)   0.0891 (  7.8%)   0.0891 (  7.7%)  X86 AT&T-Style Assembly Printer
   0.0404 (  3.8%)   0.0011 (  1.3%)   0.0415 (  3.6%)   0.0414 (  3.6%)  Machine Function Analysis
   0.0380 (  3.6%)   0.0014 (  1.6%)   0.0394 (  3.4%)   0.0394 (  3.4%)  Fast Register Allocator
   0.0201 (  1.9%)   0.0004 (  0.5%)   0.0205 (  1.8%)   0.0205 (  1.8%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0156 (  1.5%)   0.0027 (  3.2%)   0.0183 (  1.6%)   0.0183 (  1.6%)  Dominator Tree Construction
   0.0152 (  1.4%)   0.0010 (  1.2%)   0.0162 (  1.4%)   0.0162 (  1.4%)  Dominator Tree Construction
   0.0154 (  1.4%)   0.0007 (  0.8%)   0.0161 (  1.4%)   0.0161 (  1.4%)  Dominator Tree Construction
   0.0143 (  1.3%)   0.0000 (  0.0%)   0.0144 (  1.3%)   0.0144 (  1.2%)  Two-Address instruction pass
   0.0073 (  0.7%)   0.0000 (  0.0%)   0.0073 (  0.6%)   0.0073 (  0.6%)  Module Verifier
   0.0072 (  0.7%)   0.0000 (  0.0%)   0.0072 (  0.6%)   0.0072 (  0.6%)  Module Verifier
   0.0072 (  0.7%)   0.0000 (  0.0%)   0.0072 (  0.6%)   0.0072 (  0.6%)  Module Verifier
   0.0040 (  0.4%)   0.0000 (  0.0%)   0.0040 (  0.3%)   0.0040 (  0.3%)  Post-RA pseudo instruction expansion pass
   0.0038 (  0.4%)   0.0001 (  0.1%)   0.0039 (  0.3%)   0.0039 (  0.3%)  Basic CallGraph Construction
   0.0025 (  0.2%)   0.0001 (  0.1%)   0.0026 (  0.2%)   0.0026 (  0.2%)  Remove unreachable blocks from the CFG
   0.0021 (  0.2%)   0.0000 (  0.0%)   0.0022 (  0.2%)   0.0022 (  0.2%)  Expand ISel Pseudo-instructions
   0.0019 (  0.2%)   0.0000 (  0.0%)   0.0019 (  0.2%)   0.0019 (  0.2%)  Inliner for always_inline functions
   0.0016 (  0.1%)   0.0002 (  0.2%)   0.0018 (  0.2%)   0.0018 (  0.2%)  Bundle Machine CFG Edges
   0.0009 (  0.1%)   0.0000 (  0.0%)   0.0009 (  0.1%)   0.0009 (  0.1%)  Preliminary module verification
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Insert stack protectors
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Preliminary module verification
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Exception handling preparation
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Preliminary module verification
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   1.0619 (100.0%)   0.0844 (100.0%)   1.1464 (100.0%)   1.1513 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   1.5405 ( 57.0%)   0.1169 ( 55.9%)   1.6575 ( 56.9%)   1.7984 ( 58.0%)  Clang front-end timer
   1.0853 ( 40.1%)   0.0860 ( 41.1%)   1.1713 ( 40.2%)   1.1927 ( 38.5%)  Code Generation Time
   0.0776 (  2.9%)   0.0063 (  3.0%)   0.0839 (  2.9%)   0.1071 (  3.5%)  LLVM IR Generation Time
   2.7034 (100.0%)   0.2093 (100.0%)   2.9127 (100.0%)   3.0983 (100.0%)  Total

At -O3:

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0059 seconds (0.0059 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0057 ( 99.9%)   0.0002 ( 97.7%)   0.0059 ( 99.8%)   0.0059 ( 99.8%)  Seed Live Regs
   0.0000 (  0.1%)   0.0000 (  2.3%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Evict
   0.0057 (100.0%)   0.0002 (100.0%)   0.0059 (100.0%)   0.0059 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 1.4591 seconds (1.4608 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.2975 ( 21.6%)   0.0069 (  8.6%)   0.3043 ( 20.9%)   0.3038 ( 20.8%)  Instruction Scheduling
   0.2495 ( 18.1%)   0.0059 (  7.4%)   0.2554 ( 17.5%)   0.2548 ( 17.4%)  Instruction Selection
   0.1795 ( 13.0%)   0.0196 ( 24.5%)   0.1991 ( 13.6%)   0.1985 ( 13.6%)  Instruction Creation
   0.1305 (  9.5%)   0.0164 ( 20.5%)   0.1469 ( 10.1%)   0.1509 ( 10.3%)  DAG Combining 1
   0.1389 ( 10.1%)   0.0055 (  6.9%)   0.1444 (  9.9%)   0.1446 (  9.9%)  Type Legalization
   0.1344 (  9.7%)   0.0056 (  7.0%)   0.1400 (  9.6%)   0.1395 (  9.6%)  DAG Legalization
   0.0816 (  5.9%)   0.0035 (  4.4%)   0.0851 (  5.8%)   0.0848 (  5.8%)  DAG Combining after legalize types
   0.0695 (  5.0%)   0.0054 (  6.7%)   0.0749 (  5.1%)   0.0749 (  5.1%)  Vector Legalization
   0.0568 (  4.1%)   0.0056 (  7.1%)   0.0624 (  4.3%)   0.0623 (  4.3%)  DAG Combining 2
   0.0411 (  3.0%)   0.0055 (  6.9%)   0.0465 (  3.2%)   0.0466 (  3.2%)  Instruction Scheduling Cleanup
   1.3793 (100.0%)   0.0799 (100.0%)   1.4591 (100.0%)   1.4608 (100.0%)  Total

===-------------------------------------------------------------------------===
                                DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0252 seconds (0.0252 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0185 ( 75.6%)   0.0006 ( 96.8%)   0.0191 ( 76.1%)   0.0192 ( 76.1%)  DWARF Exception Writer
   0.0060 ( 24.4%)   0.0000 (  3.2%)   0.0060 ( 23.9%)   0.0060 ( 23.9%)  DWARF Debug Writer
   0.0245 (100.0%)   0.0006 (100.0%)   0.0252 (100.0%)   0.0252 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 44.8214 seconds (44.8240 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  36.6471 ( 82.3%)   0.0243 (  7.9%)  36.6714 ( 81.8%)  36.6719 ( 81.8%)  Eliminate PHI nodes for register allocation
   2.1293 (  4.8%)   0.1606 ( 52.5%)   2.2899 (  5.1%)   2.2902 (  5.1%)  X86 DAG->DAG Instruction Selection
   1.2961 (  2.9%)   0.0005 (  0.2%)   1.2966 (  2.9%)   1.2966 (  2.9%)  Simple Register Coalescing
   0.5791 (  1.3%)   0.0001 (  0.0%)   0.5792 (  1.3%)   0.5792 (  1.3%)  Machine code sinking
   0.4765 (  1.1%)   0.0004 (  0.1%)   0.4769 (  1.1%)   0.4770 (  1.1%)  Simplify the CFG
   0.4065 (  0.9%)   0.0146 (  4.8%)   0.4211 (  0.9%)   0.4211 (  0.9%)  Function Integration/Inlining
   0.2742 (  0.6%)   0.0020 (  0.6%)   0.2761 (  0.6%)   0.2765 (  0.6%)  Greedy Register Allocator
   0.2141 (  0.5%)   0.0048 (  1.6%)   0.2189 (  0.5%)   0.2189 (  0.5%)  Combine redundant instructions
   0.1784 (  0.4%)   0.0253 (  8.3%)   0.2037 (  0.5%)   0.2037 (  0.5%)  Global Value Numbering
   0.1390 (  0.3%)   0.0006 (  0.2%)   0.1396 (  0.3%)   0.1396 (  0.3%)  Branch Probability Basic Block Placement
   0.1273 (  0.3%)   0.0024 (  0.8%)   0.1297 (  0.3%)   0.1297 (  0.3%)  Live Variable Analysis
   0.1078 (  0.2%)   0.0015 (  0.5%)   0.1093 (  0.2%)   0.1094 (  0.2%)  Combine redundant instructions
   0.1076 (  0.2%)   0.0011 (  0.4%)   0.1087 (  0.2%)   0.1087 (  0.2%)  Combine redundant instructions
   0.1068 (  0.2%)   0.0014 (  0.5%)   0.1082 (  0.2%)   0.1082 (  0.2%)  Combine redundant instructions
   0.0956 (  0.2%)   0.0009 (  0.3%)   0.0965 (  0.2%)   0.0965 (  0.2%)  Machine Common Subexpression Elimination
   0.0949 (  0.2%)   0.0003 (  0.1%)   0.0952 (  0.2%)   0.0952 (  0.2%)  Control Flow Optimizer
   0.0887 (  0.2%)   0.0013 (  0.4%)   0.0900 (  0.2%)   0.0900 (  0.2%)  X86 AT&T-Style Assembly Printer
   0.0670 (  0.2%)   0.0092 (  3.0%)   0.0762 (  0.2%)   0.0762 (  0.2%)  Machine Function Analysis
   0.0619 (  0.1%)   0.0032 (  1.0%)   0.0651 (  0.1%)   0.0651 (  0.1%)  Jump Threading
   0.0621 (  0.1%)   0.0007 (  0.2%)   0.0628 (  0.1%)   0.0628 (  0.1%)  Early CSE
   0.0590 (  0.1%)   0.0026 (  0.9%)   0.0616 (  0.1%)   0.0616 (  0.1%)  Sparse Conditional Constant Propagation
   0.0557 (  0.1%)   0.0023 (  0.7%)   0.0579 (  0.1%)   0.0579 (  0.1%)  Value Propagation
   0.0527 (  0.1%)   0.0041 (  1.3%)   0.0568 (  0.1%)   0.0567 (  0.1%)  Live Interval Analysis
   0.0561 (  0.1%)   0.0005 (  0.2%)   0.0566 (  0.1%)   0.0566 (  0.1%)  Value Propagation
   0.0453 (  0.1%)   0.0000 (  0.0%)   0.0453 (  0.1%)   0.0453 (  0.1%)  Optimize for code generation
   0.0417 (  0.1%)   0.0008 (  0.2%)   0.0425 (  0.1%)   0.0425 (  0.1%)  Jump Threading
   0.0382 (  0.1%)   0.0000 (  0.0%)   0.0382 (  0.1%)   0.0382 (  0.1%)  Module Verifier
   0.0381 (  0.1%)   0.0000 (  0.0%)   0.0381 (  0.1%)   0.0381 (  0.1%)  Module Verifier
   0.0366 (  0.1%)   0.0006 (  0.2%)   0.0372 (  0.1%)   0.0372 (  0.1%)  Reassociate expressions
   0.0354 (  0.1%)   0.0000 (  0.0%)   0.0354 (  0.1%)   0.0354 (  0.1%)  Virtual Register Rewriter
   0.0330 (  0.1%)   0.0009 (  0.3%)   0.0339 (  0.1%)   0.0339 (  0.1%)  Two-Address instruction pass
   0.0329 (  0.1%)   0.0008 (  0.3%)   0.0337 (  0.1%)   0.0337 (  0.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0297 (  0.1%)   0.0008 (  0.3%)   0.0305 (  0.1%)   0.0305 (  0.1%)  Aggressive Dead Code Elimination
   0.0290 (  0.1%)   0.0011 (  0.4%)   0.0301 (  0.1%)   0.0301 (  0.1%)  Dominator Tree Construction
   0.0293 (  0.1%)   0.0007 (  0.2%)   0.0300 (  0.1%)   0.0300 (  0.1%)  Dominator Tree Construction
   0.0288 (  0.1%)   0.0010 (  0.3%)   0.0299 (  0.1%)   0.0299 (  0.1%)  Dominator Tree Construction
   0.0239 (  0.1%)   0.0054 (  1.8%)   0.0294 (  0.1%)   0.0294 (  0.1%)  Slot index numbering
   0.0274 (  0.1%)   0.0003 (  0.1%)   0.0278 (  0.1%)   0.0292 (  0.1%)  Combine redundant instructions
   0.0267 (  0.1%)   0.0023 (  0.7%)   0.0289 (  0.1%)   0.0289 (  0.1%)  MachineDominator Tree Construction
   0.0268 (  0.1%)   0.0014 (  0.5%)   0.0282 (  0.1%)   0.0282 (  0.1%)  Dominator Tree Construction
   0.0171 (  0.0%)   0.0106 (  3.5%)   0.0277 (  0.1%)   0.0277 (  0.1%)  Early CSE
   0.0262 (  0.1%)   0.0008 (  0.3%)   0.0270 (  0.1%)   0.0270 (  0.1%)  Dominator Tree Construction
   0.0238 (  0.1%)   0.0000 (  0.0%)   0.0238 (  0.1%)   0.0238 (  0.1%)  Calculate spill weights
   0.0217 (  0.0%)   0.0001 (  0.0%)   0.0218 (  0.0%)   0.0218 (  0.0%)  Dead Store Elimination
   0.0207 (  0.0%)   0.0006 (  0.2%)   0.0213 (  0.0%)   0.0213 (  0.0%)  MachineDominator Tree Construction
   0.0210 (  0.0%)   0.0000 (  0.0%)   0.0210 (  0.0%)   0.0210 (  0.0%)  Machine Copy Propagation Pass
   0.0200 (  0.0%)   0.0000 (  0.0%)   0.0200 (  0.0%)   0.0200 (  0.0%)  Simplify the CFG
   0.0197 (  0.0%)   0.0001 (  0.0%)   0.0197 (  0.0%)   0.0197 (  0.0%)  Simplify the CFG
   0.0166 (  0.0%)   0.0027 (  0.9%)   0.0193 (  0.0%)   0.0193 (  0.0%)  Dominator Tree Construction
   0.0182 (  0.0%)   0.0008 (  0.3%)   0.0191 (  0.0%)   0.0191 (  0.0%)  Execution dependency fix
   0.0178 (  0.0%)   0.0000 (  0.0%)   0.0178 (  0.0%)   0.0178 (  0.0%)  Tail Duplication
   0.0170 (  0.0%)   0.0006 (  0.2%)   0.0176 (  0.0%)   0.0176 (  0.0%)  Lazy Value Information Analysis
   0.0173 (  0.0%)   0.0003 (  0.1%)   0.0176 (  0.0%)   0.0176 (  0.0%)  Lazy Value Information Analysis
   0.0160 (  0.0%)   0.0000 (  0.0%)   0.0160 (  0.0%)   0.0160 (  0.0%)  Remove dead machine instructions
   0.0137 (  0.0%)   0.0003 (  0.1%)   0.0140 (  0.0%)   0.0140 (  0.0%)  Simplify the CFG
   0.0117 (  0.0%)   0.0006 (  0.2%)   0.0124 (  0.0%)   0.0124 (  0.0%)  Natural Loop Information
   0.0115 (  0.0%)   0.0006 (  0.2%)   0.0121 (  0.0%)   0.0121 (  0.0%)  Natural Loop Information
   0.0115 (  0.0%)   0.0003 (  0.1%)   0.0118 (  0.0%)   0.0118 (  0.0%)  Interprocedural Sparse Conditional Constant Propagation
   0.0110 (  0.0%)   0.0007 (  0.2%)   0.0118 (  0.0%)   0.0118 (  0.0%)  Machine Block Frequency Analysis
   0.0107 (  0.0%)   0.0006 (  0.2%)   0.0113 (  0.0%)   0.0113 (  0.0%)  X86 FP Stackifier
   0.0107 (  0.0%)   0.0006 (  0.2%)   0.0113 (  0.0%)   0.0113 (  0.0%)  Branch Probability Analysis
   0.0106 (  0.0%)   0.0007 (  0.2%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Natural Loop Information
   0.0108 (  0.0%)   0.0004 (  0.1%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Scalar Replacement of Aggregates (DT)
   0.0097 (  0.0%)   0.0000 (  0.0%)   0.0097 (  0.0%)   0.0097 (  0.0%)  MemCpy Optimization
   0.0090 (  0.0%)   0.0000 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Dead Global Elimination
   0.0086 (  0.0%)   0.0001 (  0.0%)   0.0086 (  0.0%)   0.0086 (  0.0%)  Scalar Replacement of Aggregates (SSAUp)
   0.0082 (  0.0%)   0.0004 (  0.1%)   0.0086 (  0.0%)   0.0086 (  0.0%)  Dominator Tree Construction
   0.0081 (  0.0%)   0.0000 (  0.0%)   0.0081 (  0.0%)   0.0081 (  0.0%)  Module Verifier
   0.0068 (  0.0%)   0.0004 (  0.1%)   0.0072 (  0.0%)   0.0072 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0067 (  0.0%)   0.0005 (  0.2%)   0.0072 (  0.0%)   0.0072 (  0.0%)  Machine Natural Loop Construction
   0.0071 (  0.0%)   0.0000 (  0.0%)   0.0071 (  0.0%)   0.0071 (  0.0%)  Peephole Optimizations
   0.0061 (  0.0%)   0.0000 (  0.0%)   0.0061 (  0.0%)   0.0061 (  0.0%)  Simplify well-known library calls
   0.0055 (  0.0%)   0.0005 (  0.2%)   0.0060 (  0.0%)   0.0060 (  0.0%)  Machine Natural Loop Construction
   0.0057 (  0.0%)   0.0002 (  0.1%)   0.0059 (  0.0%)   0.0059 (  0.0%)  Remove unreachable machine basic blocks
   0.0054 (  0.0%)   0.0000 (  0.0%)   0.0054 (  0.0%)   0.0054 (  0.0%)  Remove unreachable blocks from the CFG
   0.0049 (  0.0%)   0.0000 (  0.0%)   0.0049 (  0.0%)   0.0049 (  0.0%)  Insert stack protectors
   0.0048 (  0.0%)   0.0000 (  0.0%)   0.0049 (  0.0%)   0.0049 (  0.0%)  Tail Call Elimination
   0.0042 (  0.0%)   0.0000 (  0.0%)   0.0042 (  0.0%)   0.0042 (  0.0%)  Tail Duplication
   0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Debug Variable Analysis
   0.0040 (  0.0%)   0.0001 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Basic CallGraph Construction
   0.0029 (  0.0%)   0.0001 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Bundle Machine CFG Edges
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0029 (  0.0%)  Simplify the CFG
   0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Bundle Machine CFG Edges
   0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0020 (  0.0%)  Expand ISel Pseudo-instructions
   0.0019 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Process Implicit Definitions
   0.0018 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Remove unused exception handling info
   0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0016 (  0.0%)  Preliminary module verification
   0.0013 (  0.0%)   0.0000 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  Memory Dependence Analysis
   0.0011 (  0.0%)   0.0001 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Deduce function attributes
   0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Preliminary module verification
   0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)  Optimize machine instruction PHIs
   0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Preliminary module verification
   0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Spill Code Placement Analysis
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Global Variable Optimizer
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Lower 'expect' Intrinsics
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Exception handling preparation
   0.0001 (  0.0%)   0.0003 (  0.1%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Virtual Register Map
   0.0001 (  0.0%)   0.0002 (  0.1%)   0.0003 (  0.0%)   0.0004 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0003 (  0.0%)  Live Register Matrix
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Dead Argument Elimination
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Memory Dependence Analysis
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Memory Dependence Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Promote 'by reference' arguments to scalars
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Post RA top-down list latency scheduler
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scalar Evolution Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Strip Unused Function Prototypes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Loop Invariant Code Motion
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Stack Slot Coloring
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Live Stack Slot Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge Duplicate Global Constants
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
  44.5154 (100.0%)   0.3060 (100.0%)  44.8214 (100.0%)  44.8240 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  45.0637 ( 50.2%)   0.3487 ( 52.4%)  45.4124 ( 50.2%)  45.4247 ( 50.2%)  Clang front-end timer
  44.5841 ( 49.7%)   0.3102 ( 46.6%)  44.8943 ( 49.7%)  44.8969 ( 49.7%)  Code Generation Time
   0.0806 (  0.1%)   0.0061 (  0.9%)   0.0867 (  0.1%)   0.0869 (  0.1%)  LLVM IR Generation Time
  89.7284 (100.0%)   0.6650 (100.0%)  90.3934 (100.0%)  90.4086 (100.0%)  Total
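
Reports this long are easier to compare when the dominant rows are pulled out mechanically. The following is a rough sketch of a parser for `-ftime-report` rows, assuming one row per line with four `seconds (percent%)` columns followed by the pass name. The regular expression and the function names are this example's own and are not part of clang or LLVM. The sample text reuses three rows and the total from the -O3 pass timing report.

```python
import re

# Four "seconds ( pct%)" columns, then the timer name (layout assumed from the report).
ROW = re.compile(
    r"^\s*(\d+\.\d+) \(\s*[\d.]+%\)\s+(\d+\.\d+) \(\s*[\d.]+%\)\s+"
    r"(\d+\.\d+) \(\s*[\d.]+%\)\s+(\d+\.\d+) \(\s*[\d.]+%\)\s+(.+?)\s*$"
)


def parse_rows(text):
    """Return (name, user_plus_system_seconds, wall_seconds) for each table row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            user, system, both, wall = map(float, m.group(1, 2, 3, 4))
            rows.append((m.group(5), both, wall))
    return rows


def top_passes(text, n=3):
    """Rank passes by user+system time; share is relative to the 'Total' row."""
    rows = parse_rows(text)
    total = next(both for name, both, _ in rows if name == "Total")
    passes = [r for r in rows if r[0] != "Total"]
    ranked = sorted(passes, key=lambda r: r[1], reverse=True)[:n]
    return [(name, both, 100.0 * both / total) for name, both, _ in ranked]


SAMPLE = """\
  36.6471 ( 82.3%)   0.0243 (  7.9%)  36.6714 ( 81.8%)  36.6719 ( 81.8%)  Eliminate PHI nodes for register allocation
   2.1293 (  4.8%)   0.1606 ( 52.5%)   2.2899 (  5.1%)   2.2902 (  5.1%)  X86 DAG->DAG Instruction Selection
   1.2961 (  2.9%)   0.0005 (  0.2%)   1.2966 (  2.9%)   1.2966 (  2.9%)  Simple Register Coalescing
  44.5154 (100.0%)   0.3060 (100.0%)  44.8214 (100.0%)  44.8240 (100.0%)  Total
"""

top = top_passes(SAMPLE, n=1)
```

For this sample, `top[0]` names the PHI elimination pass, whose share matches the 81.8% printed in the report's own User+System column.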
gcc version of this bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54337
clang is slow on the gcc test file, but gcc is fast on the test file attached to this bug.
Extended Description
The attached file contains a few function definitions followed by a main method with a single line of code duplicated 2500 times.
At optimization levels O0 and O1, the code compiles relatively quickly (~5s). However, at optimization levels O2 and O3, the compilation time jumps to over a minute.
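
The attachment itself is not reproduced here, so the script below only sketches the shape described above: a few function definitions, then a `main` whose body is one line repeated many times. Everything it emits is hypothetical, including the `helper` function, the repeated statement, and the assumption that the input is C++. It is not expected to trigger the slowdown by itself; it only shows how such a stress input could be generated.

```python
# Hypothetical repeated statement; the real line from the attached file is unknown.
REPEATED_LINE = "    if (acc & 1) acc = helper(acc);\n"


def make_stress_source(repeats=2500):
    """Build source text: one helper definition plus a main with a repeated line."""
    helper = (
        "static int helper(int x) {\n"
        "    return x * 31 + 7;\n"
        "}\n"
    )
    body = REPEATED_LINE * repeats
    return (
        helper
        + "\nint main() {\n"
        + "    int acc = 1;\n"
        + body
        + "    return acc & 0xff;\n"
        + "}\n"
    )


source = make_stress_source()
```

Writing `source` to a file and compiling it at each optimization level with `-ftime-report` would give a per-level comparison like the one in this report.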