[SR-292] Debug compiler: SIL Optimization takes 65% of compilation time for StdlibUnittest

belkadan commented 8 years ago


Previous ID	SR-292
Radar	None
Original Reporter	@belkadan
Type	Bug

Additional Detail from JIRA

| | | |------------------|-----------------| |Votes | 0 | |Component/s | Compiler | |Labels | Bug, Performance | |Assignee | @bob-wilson | |Priority | Medium | md5: 7fe4167166d7837ebe7280951902658f

Issue Description:

Within the last two months the compile time for StdlibUnittest under a debug compiler has jumped quite a bit. With the new frontend flag -debug-time-compilation, I found that SIL optimization seems to be the culprit.

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 849.5641 seconds (851.0508 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  556.7332 ( 65.8%)   1.7051 ( 52.8%)  558.4382 ( 65.7%)  558.7890 ( 65.7%)  SIL optimization
  98.1063 ( 11.6%)   0.6438 ( 19.9%)  98.7501 ( 11.6%)  99.1275 ( 11.6%)  LLVM optimization
  70.1294 (  8.3%)   0.1772 (  5.5%)  70.3066 (  8.3%)  70.7880 (  8.3%)  Type checking / Semantic analysis
  45.8265 (  5.4%)   0.2249 (  7.0%)  46.0514 (  5.4%)  46.2519 (  5.4%)  LLVM output
  37.0902 (  4.4%)   0.3492 ( 10.8%)  37.4394 (  4.4%)  37.4759 (  4.4%)  IRGen
  15.6720 (  1.9%)   0.0720 (  2.2%)  15.7440 (  1.9%)  15.7567 (  1.9%)  SILGen
   9.3491 (  1.1%)   0.0151 (  0.5%)   9.3641 (  1.1%)   9.3649 (  1.1%)  SIL verification (post-optimization)
   7.0676 (  0.8%)   0.0141 (  0.4%)   7.0816 (  0.8%)   7.0843 (  0.8%)  AST verification
   5.3563 (  0.6%)   0.0148 (  0.5%)   5.3711 (  0.6%)   5.3715 (  0.6%)  SIL verification (pre-optimization)
   0.5715 (  0.1%)   0.0072 (  0.2%)   0.5786 (  0.1%)   0.5885 (  0.1%)  Serialization (swiftmodule)
   0.3793 (  0.0%)   0.0062 (  0.2%)   0.3854 (  0.0%)   0.3992 (  0.0%)  Parsing
   0.0530 (  0.0%)   0.0003 (  0.0%)   0.0533 (  0.0%)   0.0533 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  846.3343 (100.0%)   3.2298 (100.0%)  849.5641 (100.0%)  851.0508 (100.0%)  Total

Now, StdlibUnittest is a bit large and unusual in several metrics, but still.

Admittedly, that was a debug compiler, with LLVM compiled as RelWithDebInfo and swiftc compiled as Debug. Under a release build (without assertions), the numbers are a lot better...

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 401.3502 seconds (401.6033 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  162.3543 ( 41.0%)   1.6402 ( 31.8%)  163.9946 ( 40.9%)  164.0571 ( 40.9%)  LLVM optimization
  119.7621 ( 30.2%)   2.5938 ( 50.3%)  122.3559 ( 30.5%)  122.3938 ( 30.5%)  SIL optimization
  83.9661 ( 21.2%)   0.5395 ( 10.5%)  84.5056 ( 21.1%)  84.6360 ( 21.1%)  LLVM output
  15.4938 (  3.9%)   0.2337 (  4.5%)  15.7275 (  3.9%)  15.7461 (  3.9%)  IRGen
   7.7852 (  2.0%)   0.0487 (  0.9%)   7.8339 (  2.0%)   7.8356 (  2.0%)  Type checking / Semantic analysis
   3.2124 (  0.8%)   0.0204 (  0.4%)   3.2328 (  0.8%)   3.2342 (  0.8%)  SIL verification (post-optimization)
   1.8400 (  0.5%)   0.0497 (  1.0%)   1.8897 (  0.5%)   1.8899 (  0.5%)  SILGen
   0.8468 (  0.2%)   0.0016 (  0.0%)   0.8484 (  0.2%)   0.8486 (  0.2%)  AST verification
   0.6459 (  0.2%)   0.0106 (  0.2%)   0.6564 (  0.2%)   0.6567 (  0.2%)  SIL verification (pre-optimization)
   0.2309 (  0.1%)   0.0133 (  0.3%)   0.2442 (  0.1%)   0.2443 (  0.1%)  Serialization (swiftmodule)
   0.0411 (  0.0%)   0.0013 (  0.0%)   0.0424 (  0.0%)   0.0424 (  0.0%)  Parsing
   0.0185 (  0.0%)   0.0003 (  0.0%)   0.0188 (  0.0%)   0.0188 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  396.1972 (100.0%)   5.1531 (100.0%)  401.3502 (100.0%)  401.6033 (100.0%)  Total

...so I wonder if we have some really expensive assertions turned on.

belkadan commented 8 years ago

cc @swiftix

gottesmm commented 8 years ago

Is this with optimization?

EDIT: Nm I read the full thing.

belkadan commented 8 years ago

Added more configuration information. Now trying a Release+Asserts build of all components.

belkadan commented 8 years ago

Okay, a Release+Asserts build looks more like the Release build than the Debug build. What's going on here?

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
  Total Execution Time: 224.8902 seconds (225.0464 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  92.1323 ( 41.4%)   0.6140 ( 24.7%)  92.7464 ( 41.2%)  92.7973 ( 41.2%)  LLVM optimization
  60.6715 ( 27.3%)   1.2473 ( 50.1%)  61.9189 ( 27.5%)  61.9402 ( 27.5%)  SIL optimization
  45.0392 ( 20.3%)   0.2393 (  9.6%)  45.2784 ( 20.1%)  45.3518 ( 20.2%)  LLVM output
  12.3687 (  5.6%)   0.2708 ( 10.9%)  12.6394 (  5.6%)  12.6487 (  5.6%)  IRGen
   7.7395 (  3.5%)   0.0344 (  1.4%)   7.7739 (  3.5%)   7.7746 (  3.5%)  Type checking / Semantic analysis
   1.8397 (  0.8%)   0.0533 (  2.1%)   1.8930 (  0.8%)   1.8932 (  0.8%)  SILGen
   1.0223 (  0.5%)   0.0086 (  0.3%)   1.0309 (  0.5%)   1.0311 (  0.5%)  SIL verification (post-optimization)
   0.8511 (  0.4%)   0.0006 (  0.0%)   0.8517 (  0.4%)   0.8518 (  0.4%)  AST verification
   0.5987 (  0.3%)   0.0097 (  0.4%)   0.6084 (  0.3%)   0.6084 (  0.3%)  SIL verification (pre-optimization)
   0.0786 (  0.0%)   0.0077 (  0.3%)   0.0863 (  0.0%)   0.0863 (  0.0%)  Serialization (swiftmodule)
   0.0424 (  0.0%)   0.0020 (  0.1%)   0.0444 (  0.0%)   0.0444 (  0.0%)  Parsing
   0.0184 (  0.0%)   0.0002 (  0.0%)   0.0186 (  0.0%)   0.0186 (  0.0%)  Serialization (swiftdoc)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Name binding
  222.4024 (100.0%)   2.4878 (100.0%)  224.8902 (100.0%)  225.0464 (100.0%)  Total

swift-ci commented 8 years ago

Comment by Xin Tong (JIRA)

Can we get a breakdown of the optimization passes responsible in DEBUG build? Roman told me \~2 weeks ago that DSE and RLE are taking quite a bit of time in StdlibUnitTest and I have not got the time to optimize the 2 passes for the it. (I did optimize the 2 passes for the stdlib quite a bit).

I will be getting to the task soon though.

belkadan commented 8 years ago

I accidentally hung Instruments, but I'll run the experiment again tomorrow. I think DSE was pretty high up there, along with ARC opts.

swift-ci commented 8 years ago

Comment by Xin Tong (JIRA)

I have committed something (4491d86dc3611f76e031fd44a77529cb39bb0140 ) that will reduce the compilation time of DSE on the stdlibunittest quite a bit. I have observed DSE drops from 1.8% to 1.4% of the overall compilation time on compiling stdlib -O.

belkadan commented 8 years ago

Thanks, Xin! I'll make sure to grab your changes before retesting.

belkadan commented 8 years ago

Xin's change helped a lot. Here are the passes that take the most time in a Debug build now:

Running Time            Self (ms)          Symbol Name
85674.0ms   11.4%   9.0            (anonymous namespace)::ARCSequenceOpts::run()
61178.0ms    8.2%   4.0            (anonymous namespace)::SILCombine::run()
52346.0ms    7.0%   5.0            (anonymous namespace)::RedundantLoadElimination::run()
50140.0ms    6.7%   1.0            (anonymous namespace)::DeadStoreElimination::run()
36925.0ms    4.9%   11.0               (anonymous namespace)::SILCSE::run()
32228.0ms    4.3%   6.0            (anonymous namespace)::SimplifyCFGPass::run()

Nearly all of the ARC time is spent in LoopARCPairingContext::runOnLoop. In SILCombine, BuiltinInst::getBuiltinInfo() takes a surprising amount of time (because of the DenseMap), and cacheDebugLoc() isn't cheap either. Alias analysis slows down RLE and DSE with unoptimized DenseMaps. SILCSE uses ScopedHashTableScopes, which are expensive. And SimplifyCFG spends time calculating dominance.

Basically, DenseMaps are silly-expensive in Debug modes, likely because none of the DenseMapInfo is getting inlined. I wish we could say "okay, optimize these functions" even in a "Debug" build.

swift-ci commented 8 years ago

Comment by Xin Tong (JIRA)

what is the total amount of time spent in optimizer now ?

belkadan commented 8 years ago

It's still a good 7 minutes in debug builds. My LLVM is Release+Asserts, though, so it's not exactly a fair comparison.

gottesmm commented 8 years ago

Jordan, part of the problem in ARC is that we are visiting the CFG too many times. I am going to be fixing this in the spring.

With respect to the DenseMap issue, just like we use Release+Asserts LLVM, perhaps it would make sense to try having a special library that is compiled in release+asserts mode that has explicit template instantiations of the various DenseMaps that we use. Then everything else will use external templates of the DenseMaps.

I imagine many of these cases involve data types that are trivially copyable (i.e. pointers, ints, etc).

swift-ci commented 8 years ago

Comment by Xin Tong (JIRA)

And for what it is worth. This is where we currently stand for compiling stdlibunit test -O with a release with assertion for llvm and swift compiler.

===-------------------------------------------------------------------------===
                               Swift compilation
===-------------------------------------------------------------------------===
Running Time    Self (ms)               Symbol Name
23346.0ms   21.6%       53.0                    swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
6801.0ms    6.3%        8.0                      (anonymous namespace)::ARCSequenceOpts::run()
3293.0ms    3.0%        81.0                     (anonymous namespace)::DeadStoreElimination::run()
3037.0ms    2.8%        8.0                      (anonymous namespace)::SILCombine::run()
2521.0ms    2.3%        2.0                      (anonymous namespace)::SimplifyCFGPass::run()
2250.0ms    2.0%        449.0                    (anonymous namespace)::RedundantLoadElimination::run()
1972.0ms    1.8%        110.0                    (anonymous namespace)::SILCSE::run()
755.0ms    0.7% 177.0                             (anonymous namespace)::DCE::run()
474.0ms    0.4% 6.0                                 (anonymous namespace)::SILCodeMotion::run()
414.0ms    0.3% 0.0                                 (anonymous namespace)::StackPromotion::run()
255.0ms    0.2% 6.0                                 (anonymous namespace)::ABCOpt::run()
233.0ms    0.2% 1.0                                 (anonymous namespace)::ConstantPropagation::run()
199.0ms    0.1% 0.0                                 (anonymous namespace)::SILMem2Reg::run()
183.0ms    0.1% 63.0                               (anonymous namespace)::SILLowerAggregate::run()
155.0ms    0.1% 38.0                               (anonymous namespace)::SILSROA::run()
114.0ms    0.1% 14.0                               (anonymous namespace)::LICM::run()

swift-ci commented 8 years ago

Comment by Xin Tong (JIRA)

Mark, I know our compilation time is much better now. Can you please post the numbers and close this SR.

rudkx commented 8 years ago

Compilation time is generally better now, but we're still much slower in the debug build than release build.

I don't think this is due to any expensive checks, but rather that the unoptimized compiler is just much much slower.

I spent some time this morning building compiler variants and timing them while building StdlibUnitTest. Here's what I see now for the SIL optimizer time.

In each case LLVM/clang is release, no-asserts. For swift, we have

  ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.6017 ( 32.5%)   0.1291 ( 70.5%)   5.7308 ( 32.9%)   5.7346 ( 32.9%)  SIL optimization - RELEASE
   6.4858 ( 34.5%)   0.1490 ( 62.6%)   6.6349 ( 34.8%)   6.6467 ( 34.5%)  SIL optimization - RELEASE+ASSERT
  55.7352 ( 66.8%)   0.1773 ( 66.7%)  55.9124 ( 66.8%)  55.9345 ( 66.8%)  SIL optimization - DEBUG+ASSERT

So the DEBUG+ASSERT time is around 8.5x what the RELEASE+ASSERT time is.

Note that I believe StdlibUnitTest was broken up into pieces since this was opened, so this is very different than the situation we were in four months ago.

In any event, I'm not sure that there's really anything actionable here. We could consider various build changes to ease the pain of some developers (e.g. compile the SIL optimizer with optimizations enabled in some configuration so that the people who don't need to debug it aren't spending time waiting for it to execute during builds).

Since I'm not one of the people who would use that mode, I'm not especially inclined to add more complexity to our build system, though.

rudkx commented 8 years ago

Bob - not sure if this is worth keeping open or not, but it probably shouldn't be on me.

swiftlang / swift

[SR-292] Debug compiler: SIL Optimization takes 65% of compilation time for StdlibUnittest #42914