Open llvmbot opened 6 years ago
Aside: why is -ftime-report on clang so useless? None of the +5s running time is accounted for by any of the timers.
It's intended for investigating optimization problems, in part because it's a bit easier to separate different optimization passes and measure them. The time spent in parsing, sema, etc. is harder to assess, since the whole parsing process goes through all the layers more or less simultaneously, so there are fewer discrete actions that can be timed in isolation.
A generic profiler (gprof, etc) would probably be more useful for this sort of issue.
Aside: why is -ftime-report on clang so useless? None of the +5s running time is accounted for by any of the timers.
SLOW TIME REPORT (-ftime-report):
=================================
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0034 ( 24.9%) 0.0046 ( 50.8%) 0.0080 ( 35.3%) 0.0301 ( 53.6%) Code Generation Time
0.0102 ( 75.1%) 0.0045 ( 49.2%) 0.0146 ( 64.7%) 0.0261 ( 46.4%) LLVM IR Generation Time
0.0136 (100.0%) 0.0091 (100.0%) 0.0226 (100.0%) 0.0561 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0001 seconds (0.0004 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0000 ( 51.4%) 0.0001 ( 71.4%) 0.0001 ( 65.2%) 0.0003 ( 70.1%) Debug Info Emission
0.0000 ( 11.4%) 0.0000 ( 24.7%) 0.0000 ( 20.5%) 0.0001 ( 26.3%) DWARF Debug Writer
0.0000 ( 37.1%) 0.0000 ( 3.9%) 0.0000 ( 14.3%) 0.0000 ( 3.6%) DWARF Exception Writer
0.0000 (100.0%) 0.0001 (100.0%) 0.0001 (100.0%) 0.0004 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0042 seconds (0.0143 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0007 ( 34.5%) 0.0007 ( 29.2%) 0.0013 ( 31.6%) 0.0043 ( 29.7%) Expand Atomic instructions
0.0002 ( 10.6%) 0.0006 ( 25.6%) 0.0008 ( 18.8%) 0.0036 ( 24.9%) X86 DAG->DAG Instruction Selection
0.0001 ( 6.4%) 0.0002 ( 9.4%) 0.0003 ( 8.1%) 0.0015 ( 10.5%) X86 Assembly Printer
0.0001 ( 3.1%) 0.0001 ( 5.7%) 0.0002 ( 4.5%) 0.0008 ( 5.4%) Prologue/Epilogue Insertion & Frame Finalization
0.0005 ( 24.0%) 0.0001 ( 3.3%) 0.0005 ( 12.8%) 0.0005 ( 3.8%) X86 Retpoline Thunks
0.0000 ( 2.0%) 0.0001 ( 2.9%) 0.0001 ( 2.5%) 0.0004 ( 3.0%) Inliner for always_inline functions
0.0000 ( 1.0%) 0.0001 ( 2.7%) 0.0001 ( 1.9%) 0.0004 ( 2.8%) MachineDominator Tree Construction
0.0000 ( 2.1%) 0.0001 ( 2.4%) 0.0001 ( 2.3%) 0.0003 ( 2.3%) Fast Register Allocator
0.0000 ( 0.3%) 0.0000 ( 1.7%) 0.0000 ( 1.0%) 0.0003 ( 2.0%) Scalarize Masked Memory Intrinsics
0.0000 ( 1.7%) 0.0000 ( 2.0%) 0.0001 ( 1.9%) 0.0003 ( 2.0%) Insert stack protectors
0.0000 ( 0.6%) 0.0001 ( 2.2%) 0.0001 ( 1.5%) 0.0003 ( 1.9%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.8%) 0.0000 ( 1.8%) 0.0001 ( 1.3%) 0.0003 ( 1.9%) Dominator Tree Construction
0.0000 ( 1.2%) 0.0000 ( 1.7%) 0.0001 ( 1.5%) 0.0003 ( 1.9%) Machine Natural Loop Construction
0.0000 ( 1.3%) 0.0000 ( 1.3%) 0.0001 ( 1.3%) 0.0002 ( 1.5%) CallGraph Construction
0.0000 ( 1.1%) 0.0000 ( 1.4%) 0.0001 ( 1.3%) 0.0002 ( 1.2%) Two-Address instruction pass
0.0000 ( 0.7%) 0.0000 ( 1.0%) 0.0000 ( 0.9%) 0.0002 ( 1.1%) Free MachineFunction
0.0000 ( 0.3%) 0.0000 ( 0.8%) 0.0000 ( 0.6%) 0.0001 ( 1.0%) Eliminate PHI nodes for register allocation
0.0000 ( 0.3%) 0.0000 ( 0.9%) 0.0000 ( 0.6%) 0.0001 ( 0.9%) Expand reduction intrinsics
0.0000 ( 0.3%) 0.0000 ( 1.0%) 0.0000 ( 0.6%) 0.0001 ( 0.7%) Local Stack Slot Allocation
0.0000 ( 0.6%) 0.0000 ( 0.0%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) MachineDominator Tree Construction
0.0000 ( 0.4%) 0.0000 ( 0.1%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) Machine Natural Loop Construction
0.0000 ( 0.4%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Dominator Tree Construction
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.2%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) 0.0000 ( 0.1%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) X86 WinAlloca Expander
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) Exception handling preparation
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Expand indirectbr instructions
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Contiguously Lay Out Funclets
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Remove unreachable blocks from the CFG
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) Bundle Machine CFG Edges
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Live DEBUG_VALUE analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Insert fentry calls
0.0000 ( 0.3%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) X86 pseudo instruction expansion pass
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Machine Optimization Remark Emitter
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Safe Stack instrumentation pass
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Shadow Stack GC Lowering
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) X86 FP Stackifier
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Instrument function entry/exit with calls to e.g. mcount() (post inlining)
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 vzeroupper inserter
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) X86 Indirect Branch Tracking
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Implement the 'patchable-function' attribute
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Expand ISel Pseudo-instructions
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Lower Garbage Collection Instructions
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) StackMap Liveness Analysis
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Insert XRay ops
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0019 (100.0%) 0.0023 (100.0%) 0.0042 (100.0%) 0.0143 (100.0%) Total
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 5.6984 seconds (5.7773 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
5.5208 (100.0%) 0.1776 (100.0%) 5.6984 (100.0%) 5.7773 (100.0%) Clang front-end timer
5.5208 (100.0%) 0.1776 (100.0%) 5.6984 (100.0%) 5.7773 (100.0%) Total
FAST TIME REPORT (-ftime-report):
=================================
===-------------------------------------------------------------------------===
Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0088 ( 76.3%) 0.0016 ( 65.0%) 0.0105 ( 74.3%) 0.0108 ( 74.8%) LLVM IR Generation Time
0.0027 ( 23.7%) 0.0009 ( 35.0%) 0.0036 ( 25.7%) 0.0036 ( 25.2%) Code Generation Time
0.0116 (100.0%) 0.0025 (100.0%) 0.0141 (100.0%) 0.0144 (100.0%) Total
===-------------------------------------------------------------------------===
DWARF Emission
===-------------------------------------------------------------------------===
Total Execution Time: 0.0000 seconds (0.0000 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0000 ( 57.1%) 0.0000 ( 57.1%) 0.0000 ( 57.1%) 0.0000 ( 54.3%) Debug Info Emission
0.0000 ( 32.1%) 0.0000 ( 28.6%) 0.0000 ( 31.0%) 0.0000 ( 33.7%) DWARF Exception Writer
0.0000 ( 10.7%) 0.0000 ( 14.3%) 0.0000 ( 11.9%) 0.0000 ( 12.0%) DWARF Debug Writer
0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) Total
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0022 seconds (0.0022 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.0006 ( 35.2%) 0.0002 ( 27.6%) 0.0007 ( 33.3%) 0.0007 ( 33.5%) Expand Atomic instructions
0.0005 ( 27.1%) 0.0001 ( 14.8%) 0.0005 ( 23.9%) 0.0005 ( 24.0%) X86 Retpoline Thunks
0.0002 ( 9.3%) 0.0001 ( 19.7%) 0.0003 ( 11.9%) 0.0003 ( 12.0%) X86 DAG->DAG Instruction Selection
0.0001 ( 5.8%) 0.0000 ( 7.6%) 0.0001 ( 6.3%) 0.0001 ( 6.4%) X86 Assembly Printer
0.0000 ( 2.9%) 0.0000 ( 3.2%) 0.0001 ( 3.0%) 0.0001 ( 3.1%) Prologue/Epilogue Insertion & Frame Finalization
0.0000 ( 1.6%) 0.0000 ( 1.1%) 0.0000 ( 1.4%) 0.0000 ( 1.4%) Fast Register Allocator
0.0000 ( 1.3%) 0.0000 ( 1.6%) 0.0000 ( 1.3%) 0.0000 ( 1.3%) Insert stack protectors
0.0000 ( 1.1%) 0.0000 ( 2.1%) 0.0000 ( 1.4%) 0.0000 ( 1.2%) Two-Address instruction pass
0.0000 ( 1.0%) 0.0000 ( 1.4%) 0.0000 ( 1.1%) 0.0000 ( 1.2%) MachineDominator Tree Construction
0.0000 ( 0.8%) 0.0000 ( 1.8%) 0.0000 ( 1.1%) 0.0000 ( 1.1%) Inliner for always_inline functions
0.0000 ( 0.8%) 0.0000 ( 1.4%) 0.0000 ( 1.0%) 0.0000 ( 0.9%) Dominator Tree Construction
0.0000 ( 0.6%) 0.0000 ( 0.9%) 0.0000 ( 0.7%) 0.0000 ( 0.8%) Machine Natural Loop Construction
0.0000 ( 0.7%) 0.0000 ( 1.1%) 0.0000 ( 0.8%) 0.0000 ( 0.8%) Post-RA pseudo instruction expansion pass
0.0000 ( 0.7%) 0.0000 ( 0.9%) 0.0000 ( 0.7%) 0.0000 ( 0.7%) Free MachineFunction
0.0000 ( 0.5%) 0.0000 ( 0.7%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) CallGraph Construction
0.0000 ( 0.5%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) 0.0000 ( 0.5%) MachineDominator Tree Construction
0.0000 ( 0.4%) 0.0000 ( 0.7%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) Expand reduction intrinsics
0.0000 ( 0.4%) 0.0000 ( 0.7%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Eliminate PHI nodes for register allocation
0.0000 ( 0.3%) 0.0000 ( 0.9%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Local Stack Slot Allocation
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Exception handling preparation
0.0000 ( 0.3%) 0.0000 ( 0.7%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Scalarize Masked Memory Intrinsics
0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Machine Natural Loop Construction
0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) 0.0000 ( 0.4%) Dominator Tree Construction
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Insert XRay ops
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Expand indirectbr instructions
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) X86 WinAlloca Expander
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) StackMap Liveness Analysis
0.0000 ( 0.3%) 0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Instrument function entry/exit with calls to e.g. mcount() (post inlining)
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Insert fentry calls
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Bundle Machine CFG Edges
0.0000 ( 0.3%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) Shadow Stack GC Lowering
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.5%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.4%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) Contiguously Lay Out Funclets
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) Safe Stack instrumentation pass
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.3%) X86 Indirect Branch Tracking
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) X86 PIC Global Base Reg Initialization
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Machine Optimization Remark Emitter
0.0000 ( 0.2%) 0.0000 ( 0.5%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Implement the 'patchable-function' attribute
0.0000 ( 0.2%) 0.0000 ( 0.7%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) X86 FP Stackifier
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) Live DEBUG_VALUE analysis
0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) 0.0000 ( 0.2%) X86 vzeroupper inserter
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Expand ISel Pseudo-instructions
0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) Lazy Machine Block Frequency Analysis
0.0000 ( 0.4%) 0.0000 ( 0.5%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) Remove unreachable blocks from the CFG
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) Rewrite Symbols
0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) 0.0000 ( 0.2%) Lower Garbage Collection Instructions
0.0000 ( 0.3%) 0.0000 ( 0.4%) 0.0000 ( 0.3%) 0.0000 ( 0.2%) X86 pseudo instruction expansion pass
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.1%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.1%) Pre-ISel Intrinsic Lowering
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Profile summary info
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.1%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 ( 0.0%) Force set function attributes
0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.1%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0017 (100.0%) 0.0006 (100.0%) 0.0022 (100.0%) 0.0022 (100.0%) Total
===-------------------------------------------------------------------------===
Clang front-end time report
===-------------------------------------------------------------------------===
Total Execution Time: 0.5674 seconds (0.5681 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.5402 (100.0%) 0.0272 (100.0%) 0.5674 (100.0%) 0.5681 (100.0%) Clang front-end timer
0.5402 (100.0%) 0.0272 (100.0%) 0.5674 (100.0%) 0.5681 (100.0%) Total
I don't. I could get one from the attached example, but I'm away from my computer right now.
Some slowdown from adding an extra template parameter would be expected, but 10x or even 60% seems unreasonable. Do you have any profiling data (or even just a few backtraces from the slow run) to help identify what's going slowly?
Extended Description
Please find attached an example where compile times are 10x slower when a SFINAE condition (std::enable_if) is located in a template parameter list, as opposed to located in the return type.
Slow code:
Fast code:
REPRO: (FAST)
(SLOW)
On my machine, compiling with SFINAE_FAST defined takes 0.58s, whereas without the define it takes 5.8s.
By comparison, gcc is also slower for SFINAE-in-template-parameter-list, but "only" by about 60%.
CONTEXT: The popular range-v3 library makes extensive use of SFINAE-in-template-parameter-list to emulate concepts. This perf bug is probably severely affecting compile times.
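
For readers without the attachment, here is a minimal sketch of the two SFINAE placements being compared. It is not the attached reproducer: the function names are hypothetical, and a snippet this small will not show the slowdown by itself. It only illustrates where the std::enable_if condition sits in each form.

```cpp
#include <type_traits>

// Form 1: SFINAE condition in the template parameter list
// (the placement reported as slow). The function name is hypothetical.
template <typename T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
constexpr int twice_in_param_list(T x) { return static_cast<int>(x) * 2; }

// Form 2: SFINAE condition in the return type
// (the placement reported as fast). The function name is hypothetical.
template <typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, int>::type
twice_in_return_type(T x) { return static_cast<int>(x) * 2; }

// Both forms accept integral arguments and behave identically at the call site.
static_assert(twice_in_param_list(21) == 42, "parameter-list form");
static_assert(twice_in_return_type(21) == 42, "return-type form");
```

Calling either function with a non-integral argument (for example 1.5) removes it from the overload set, so the two forms differ only in where the compiler evaluates the condition.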