llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

SFINAE in template parameter list is 10x slower than SFINAE in return type #35506

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 36158
Version trunk
OS All
Attachments demonstrate severe compile-time degradation of SFINAE-in-tparam-list
Reporter LLVM Bugzilla Contributor
CC @apolukhin,@ts826848,@CaseyCarter,@dwblaikie,@DougGregor,@Ivan171,@JohelEGP,@rbock,@riccibruno,@zygoloid,@yuanfang-chen

Extended Description

Please find attached an example that compiles 10x slower when a SFINAE condition (std::enable_if) is placed in the template parameter list rather than in the return type.

Slow code:

template<typename That,
    typename std::enable_if<(bool)That(), int>::type = 0>
constexpr friend dummy operator&&(dummy, bool_<That>) noexcept
{
    return {};
}

Fast code:

template<typename That>
constexpr friend
typename std::enable_if<(bool)That(), dummy>::type
operator&&(dummy, bool_<That>) noexcept
{
    return {};
}
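For readers without the attachment, here is a minimal self-contained sketch of the two forms side by side. It is not the attached sfinae-perf-bug.cpp and is far too small to show the slowdown. The definitions of bool_ and dummy and the can_and helper are assumptions made for illustration; only the two operator&& declarations and the SFINAE_FAST macro come from the report.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in: the report does not show how bool_ is defined.
template <typename T>
struct bool_ {};

// Hypothetical stand-in for the class that hosts the friend operators.
struct dummy {
#ifdef SFINAE_FAST
    // Constraint in the return type (the "fast" form from the report).
    template <typename That>
    constexpr friend
    typename std::enable_if<(bool)That(), dummy>::type
    operator&&(dummy, bool_<That>) noexcept
    {
        return {};
    }
#else
    // Constraint as a defaulted non-type template parameter (the "slow" form).
    template <typename That,
        typename std::enable_if<(bool)That(), int>::type = 0>
    constexpr friend dummy operator&&(dummy, bool_<That>) noexcept
    {
        return {};
    }
#endif
};

// Hypothetical detection helper: value is true when `L && R` is well-formed.
template <typename L, typename R>
struct can_and {
    template <typename A, typename B>
    static auto test(int)
        -> decltype(void(std::declval<A>() && std::declval<B>()), std::true_type{});
    template <typename, typename>
    static std::false_type test(...);
    static constexpr bool value = decltype(test<L, R>(0))::value;
};

// Both forms accept bool_<std::true_type> and reject bool_<std::false_type>.
static_assert(can_and<dummy, bool_<std::true_type>>::value,
              "overload participates when That() converts to true");
static_assert(!can_and<dummy, bool_<std::false_type>>::value,
              "overload is removed when That() converts to false");
```

Compiling this with and without -DSFINAE_FAST should pass the same static_asserts, which suggests the two spellings are interchangeable for callers and differ only in how the compiler processes them.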

REPRO: (FAST)

clang++ -std=gnu++11 -DSFINAE_FAST -c sfinae-perf-bug.cpp -o /dev/null

(SLOW)

clang++ -std=gnu++11 -c sfinae-perf-bug.cpp -o /dev/null

On my machine, compiling with SFINAE_FAST defined takes 0.58s, whereas without the define it takes 5.8s.

By comparison, gcc is also slower with SFINAE in the template parameter list, but "only" by about 60%.

CONTEXT: The popular range-v3 library makes extensive use of SFINAE in the template parameter list to emulate concepts. This perf bug is probably severely affecting compile times.
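To illustrate what emulating concepts with SFINAE in the template parameter list can look like, here is a rough sketch. The REQUIRES macro and the describe overloads are hypothetical names invented for this example; range-v3's actual macros are more elaborate and are not reproduced here.

```cpp
#include <type_traits>

// Hypothetical macro: expands to a defaulted non-type template parameter
// whose type only exists when the condition is true.
#define REQUIRES(...) typename std::enable_if<(__VA_ARGS__), int>::type = 0

// Selected only for integral types.
template <typename T, REQUIRES(std::is_integral<T>::value)>
constexpr int describe(T)
{
    return 1;
}

// Selected only for floating-point types. This is a distinct template from
// the one above because the type of the second template parameter differs.
template <typename T, REQUIRES(std::is_floating_point<T>::value)>
constexpr int describe(T)
{
    return 2;
}

static_assert(describe(42) == 1, "integral overload chosen");
static_assert(describe(3.5) == 2, "floating-point overload chosen");
```

One reason this style is attractive is that the constraint stays out of the function signature, so return types and parameters remain readable. That convenience is what makes the compile-time cost reported here matter for libraries that use the pattern on most of their functions.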

dwblaikie commented 6 years ago

> Aside: why is -ftime-report on clang so useless? None of the +5s running time is accounted for by any of the timers.

It's intended for investigating optimization problems, in part because it's a bit easier to separate the different optimization passes and measure them. The time spent in parsing, Sema, etc. is a bit harder to assess, since the whole parsing process goes through all the layers more or less simultaneously, so there are fewer discrete actions that can be timed in isolation.

A generic profiler (gprof, etc) would probably be more useful for this sort of issue.

llvmbot commented 6 years ago

Aside: why is -ftime-report on clang so useless? None of the +5s running time is accounted for by any of the timers.

llvmbot commented 6 years ago

SLOW TIME REPORT (-ftime-report):
=================================

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0034 ( 24.9%)   0.0046 ( 50.8%)   0.0080 ( 35.3%)   0.0301 ( 53.6%)  Code Generation Time
   0.0102 ( 75.1%)   0.0045 ( 49.2%)   0.0146 ( 64.7%)   0.0261 ( 46.4%)  LLVM IR Generation Time
   0.0136 (100.0%)   0.0091 (100.0%)   0.0226 (100.0%)   0.0561 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0001 seconds (0.0004 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 51.4%)   0.0001 ( 71.4%)   0.0001 ( 65.2%)   0.0003 ( 70.1%)  Debug Info Emission
   0.0000 ( 11.4%)   0.0000 ( 24.7%)   0.0000 ( 20.5%)   0.0001 ( 26.3%)  DWARF Debug Writer
   0.0000 ( 37.1%)   0.0000 (  3.9%)   0.0000 ( 14.3%)   0.0000 (  3.6%)  DWARF Exception Writer
   0.0000 (100.0%)   0.0001 (100.0%)   0.0001 (100.0%)   0.0004 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0042 seconds (0.0143 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0007 ( 34.5%)   0.0007 ( 29.2%)   0.0013 ( 31.6%)   0.0043 ( 29.7%)  Expand Atomic instructions
   0.0002 ( 10.6%)   0.0006 ( 25.6%)   0.0008 ( 18.8%)   0.0036 ( 24.9%)  X86 DAG->DAG Instruction Selection
   0.0001 (  6.4%)   0.0002 (  9.4%)   0.0003 (  8.1%)   0.0015 ( 10.5%)  X86 Assembly Printer
   0.0001 (  3.1%)   0.0001 (  5.7%)   0.0002 (  4.5%)   0.0008 (  5.4%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0005 ( 24.0%)   0.0001 (  3.3%)   0.0005 ( 12.8%)   0.0005 (  3.8%)  X86 Retpoline Thunks
   0.0000 (  2.0%)   0.0001 (  2.9%)   0.0001 (  2.5%)   0.0004 (  3.0%)  Inliner for always_inline functions
   0.0000 (  1.0%)   0.0001 (  2.7%)   0.0001 (  1.9%)   0.0004 (  2.8%)  MachineDominator Tree Construction
   0.0000 (  2.1%)   0.0001 (  2.4%)   0.0001 (  2.3%)   0.0003 (  2.3%)  Fast Register Allocator
   0.0000 (  0.3%)   0.0000 (  1.7%)   0.0000 (  1.0%)   0.0003 (  2.0%)  Scalarize Masked Memory Intrinsics
   0.0000 (  1.7%)   0.0000 (  2.0%)   0.0001 (  1.9%)   0.0003 (  2.0%)  Insert stack protectors
   0.0000 (  0.6%)   0.0001 (  2.2%)   0.0001 (  1.5%)   0.0003 (  1.9%)  Post-RA pseudo instruction expansion pass
   0.0000 (  0.8%)   0.0000 (  1.8%)   0.0001 (  1.3%)   0.0003 (  1.9%)  Dominator Tree Construction
   0.0000 (  1.2%)   0.0000 (  1.7%)   0.0001 (  1.5%)   0.0003 (  1.9%)  Machine Natural Loop Construction
   0.0000 (  1.3%)   0.0000 (  1.3%)   0.0001 (  1.3%)   0.0002 (  1.5%)  CallGraph Construction
   0.0000 (  1.1%)   0.0000 (  1.4%)   0.0001 (  1.3%)   0.0002 (  1.2%)  Two-Address instruction pass
   0.0000 (  0.7%)   0.0000 (  1.0%)   0.0000 (  0.9%)   0.0002 (  1.1%)  Free MachineFunction
   0.0000 (  0.3%)   0.0000 (  0.8%)   0.0000 (  0.6%)   0.0001 (  1.0%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.3%)   0.0000 (  0.9%)   0.0000 (  0.6%)   0.0001 (  0.9%)  Expand reduction intrinsics
   0.0000 (  0.3%)   0.0000 (  1.0%)   0.0000 (  0.6%)   0.0001 (  0.7%)  Local Stack Slot Allocation
   0.0000 (  0.6%)   0.0000 (  0.0%)   0.0000 (  0.3%)   0.0000 (  0.1%)  MachineDominator Tree Construction
   0.0000 (  0.4%)   0.0000 (  0.1%)   0.0000 (  0.3%)   0.0000 (  0.1%)  Machine Natural Loop Construction
   0.0000 (  0.4%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)  Dominator Tree Construction
   0.0000 (  0.3%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.1%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.2%)   0.0000 (  0.3%)   0.0000 (  0.3%)   0.0000 (  0.1%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)  X86 WinAlloca Expander
   0.0000 (  0.3%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)  Exception handling preparation
   0.0000 (  0.3%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Expand indirectbr instructions
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Machine Optimization Remark Emitter
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Contiguously Lay Out Funclets
   0.0000 (  0.3%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.3%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.0%)  Remove unreachable blocks from the CFG
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.0%)  Bundle Machine CFG Edges
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Live DEBUG_VALUE analysis
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Insert fentry calls
   0.0000 (  0.3%)   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.0%)  X86 pseudo instruction expansion pass
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Machine Optimization Remark Emitter
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Safe Stack instrumentation pass
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Shadow Stack GC Lowering
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 FP Stackifier
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 vzeroupper inserter
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  X86 Indirect Branch Tracking
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Implement the 'patchable-function' attribute
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Expand ISel Pseudo-instructions
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.3%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  StackMap Liveness Analysis
   0.0000 (  0.3%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Insert XRay ops
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0019 (100.0%)   0.0023 (100.0%)   0.0042 (100.0%)   0.0143 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 5.6984 seconds (5.7773 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.5208 (100.0%)   0.1776 (100.0%)   5.6984 (100.0%)   5.7773 (100.0%)  Clang front-end timer
   5.5208 (100.0%)   0.1776 (100.0%)   5.6984 (100.0%)   5.7773 (100.0%)  Total

FAST TIME REPORT (-ftime-report):
=================================

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0088 ( 76.3%)   0.0016 ( 65.0%)   0.0105 ( 74.3%)   0.0108 ( 74.8%)  LLVM IR Generation Time
   0.0027 ( 23.7%)   0.0009 ( 35.0%)   0.0036 ( 25.7%)   0.0036 ( 25.2%)  Code Generation Time
   0.0116 (100.0%)   0.0025 (100.0%)   0.0141 (100.0%)   0.0144 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0000 ( 57.1%)   0.0000 ( 57.1%)   0.0000 ( 57.1%)   0.0000 ( 54.3%)  Debug Info Emission
   0.0000 ( 32.1%)   0.0000 ( 28.6%)   0.0000 ( 31.0%)   0.0000 ( 33.7%)  DWARF Exception Writer
   0.0000 ( 10.7%)   0.0000 ( 14.3%)   0.0000 ( 11.9%)   0.0000 ( 12.0%)  DWARF Debug Writer
   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)   0.0000 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0022 seconds (0.0022 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0006 ( 35.2%)   0.0002 ( 27.6%)   0.0007 ( 33.3%)   0.0007 ( 33.5%)  Expand Atomic instructions
   0.0005 ( 27.1%)   0.0001 ( 14.8%)   0.0005 ( 23.9%)   0.0005 ( 24.0%)  X86 Retpoline Thunks
   0.0002 (  9.3%)   0.0001 ( 19.7%)   0.0003 ( 11.9%)   0.0003 ( 12.0%)  X86 DAG->DAG Instruction Selection
   0.0001 (  5.8%)   0.0000 (  7.6%)   0.0001 (  6.3%)   0.0001 (  6.4%)  X86 Assembly Printer
   0.0000 (  2.9%)   0.0000 (  3.2%)   0.0001 (  3.0%)   0.0001 (  3.1%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0000 (  1.6%)   0.0000 (  1.1%)   0.0000 (  1.4%)   0.0000 (  1.4%)  Fast Register Allocator
   0.0000 (  1.3%)   0.0000 (  1.6%)   0.0000 (  1.3%)   0.0000 (  1.3%)  Insert stack protectors
   0.0000 (  1.1%)   0.0000 (  2.1%)   0.0000 (  1.4%)   0.0000 (  1.2%)  Two-Address instruction pass
   0.0000 (  1.0%)   0.0000 (  1.4%)   0.0000 (  1.1%)   0.0000 (  1.2%)  MachineDominator Tree Construction
   0.0000 (  0.8%)   0.0000 (  1.8%)   0.0000 (  1.1%)   0.0000 (  1.1%)  Inliner for always_inline functions
   0.0000 (  0.8%)   0.0000 (  1.4%)   0.0000 (  1.0%)   0.0000 (  0.9%)  Dominator Tree Construction
   0.0000 (  0.6%)   0.0000 (  0.9%)   0.0000 (  0.7%)   0.0000 (  0.8%)  Machine Natural Loop Construction
   0.0000 (  0.7%)   0.0000 (  1.1%)   0.0000 (  0.8%)   0.0000 (  0.8%)  Post-RA pseudo instruction expansion pass
   0.0000 (  0.7%)   0.0000 (  0.9%)   0.0000 (  0.7%)   0.0000 (  0.7%)  Free MachineFunction
   0.0000 (  0.5%)   0.0000 (  0.7%)   0.0000 (  0.5%)   0.0000 (  0.5%)  CallGraph Construction
   0.0000 (  0.5%)   0.0000 (  0.5%)   0.0000 (  0.5%)   0.0000 (  0.5%)  MachineDominator Tree Construction
   0.0000 (  0.4%)   0.0000 (  0.7%)   0.0000 (  0.5%)   0.0000 (  0.4%)  Expand reduction intrinsics
   0.0000 (  0.4%)   0.0000 (  0.7%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.3%)   0.0000 (  0.9%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Local Stack Slot Allocation
   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Exception handling preparation
   0.0000 (  0.3%)   0.0000 (  0.7%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Scalarize Masked Memory Intrinsics
   0.0000 (  0.4%)   0.0000 (  0.4%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Machine Natural Loop Construction
   0.0000 (  0.4%)   0.0000 (  0.4%)   0.0000 (  0.4%)   0.0000 (  0.4%)  Dominator Tree Construction
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.3%)  Insert XRay ops
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Expand indirectbr instructions
   0.0000 (  0.3%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.3%)  Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.3%)  X86 WinAlloca Expander
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.3%)  StackMap Liveness Analysis
   0.0000 (  0.3%)   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Insert fentry calls
   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Bundle Machine CFG Edges
   0.0000 (  0.3%)   0.0000 (  0.5%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Shadow Stack GC Lowering
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Machine Optimization Remark Emitter
   0.0000 (  0.2%)   0.0000 (  0.5%)   0.0000 (  0.3%)   0.0000 (  0.3%)  Analyze Machine Code For Garbage Collection
   0.0000 (  0.4%)   0.0000 (  0.5%)   0.0000 (  0.4%)   0.0000 (  0.3%)  Contiguously Lay Out Funclets
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.3%)  Safe Stack instrumentation pass
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.3%)  X86 Indirect Branch Tracking
   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.3%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.3%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.2%)  Machine Optimization Remark Emitter
   0.0000 (  0.2%)   0.0000 (  0.5%)   0.0000 (  0.3%)   0.0000 (  0.2%)  Implement the 'patchable-function' attribute
   0.0000 (  0.2%)   0.0000 (  0.7%)   0.0000 (  0.3%)   0.0000 (  0.2%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)  X86 FP Stackifier
   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)  Live DEBUG_VALUE analysis
   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)   0.0000 (  0.2%)  X86 vzeroupper inserter
   0.0000 (  0.2%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.2%)  Expand ISel Pseudo-instructions
   0.0000 (  0.4%)   0.0000 (  0.2%)   0.0000 (  0.3%)   0.0000 (  0.2%)  Lazy Machine Block Frequency Analysis
   0.0000 (  0.4%)   0.0000 (  0.5%)   0.0000 (  0.4%)   0.0000 (  0.2%)  Remove unreachable blocks from the CFG
   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.2%)  Rewrite Symbols
   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.1%)   0.0000 (  0.2%)  Lower Garbage Collection Instructions
   0.0000 (  0.3%)   0.0000 (  0.4%)   0.0000 (  0.3%)   0.0000 (  0.2%)  X86 pseudo instruction expansion pass
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.1%)  Assumption Cache Tracker
   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.1%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.1%)   0.0000 (  0.2%)   0.0000 (  0.1%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.2%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.1%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0017 (100.0%)   0.0006 (100.0%)   0.0022 (100.0%)   0.0022 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 0.5674 seconds (0.5681 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5402 (100.0%)   0.0272 (100.0%)   0.5674 (100.0%)   0.5681 (100.0%)  Clang front-end timer
   0.5402 (100.0%)   0.0272 (100.0%)   0.5674 (100.0%)   0.5681 (100.0%)  Total

llvmbot commented 6 years ago

I don't have any profiling data. I could get some from the attached example, but I'm away from my computer right now.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 6 years ago

Some slowdown from adding an extra template parameter would be expected, but 10x or even 60% seems unreasonable. Do you have any profiling data (or even just a few backtraces from the slow run) to help identify what's going slowly?