Open belkadan opened 8 years ago
cc @swiftix
Is this with optimization?
EDIT: Nm I read the full thing.
Added more configuration information. Now trying a Release+Asserts build of all components.
Okay, a Release+Asserts build looks more like the Release build than the Debug build. What's going on here?
===-------------------------------------------------------------------------===
Swift compilation
===-------------------------------------------------------------------------===
Total Execution Time: 224.8902 seconds (225.0464 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
92.1323 ( 41.4%) 0.6140 ( 24.7%) 92.7464 ( 41.2%) 92.7973 ( 41.2%) LLVM optimization
60.6715 ( 27.3%) 1.2473 ( 50.1%) 61.9189 ( 27.5%) 61.9402 ( 27.5%) SIL optimization
45.0392 ( 20.3%) 0.2393 ( 9.6%) 45.2784 ( 20.1%) 45.3518 ( 20.2%) LLVM output
12.3687 ( 5.6%) 0.2708 ( 10.9%) 12.6394 ( 5.6%) 12.6487 ( 5.6%) IRGen
7.7395 ( 3.5%) 0.0344 ( 1.4%) 7.7739 ( 3.5%) 7.7746 ( 3.5%) Type checking / Semantic analysis
1.8397 ( 0.8%) 0.0533 ( 2.1%) 1.8930 ( 0.8%) 1.8932 ( 0.8%) SILGen
1.0223 ( 0.5%) 0.0086 ( 0.3%) 1.0309 ( 0.5%) 1.0311 ( 0.5%) SIL verification (post-optimization)
0.8511 ( 0.4%) 0.0006 ( 0.0%) 0.8517 ( 0.4%) 0.8518 ( 0.4%) AST verification
0.5987 ( 0.3%) 0.0097 ( 0.4%) 0.6084 ( 0.3%) 0.6084 ( 0.3%) SIL verification (pre-optimization)
0.0786 ( 0.0%) 0.0077 ( 0.3%) 0.0863 ( 0.0%) 0.0863 ( 0.0%) Serialization (swiftmodule)
0.0424 ( 0.0%) 0.0020 ( 0.1%) 0.0444 ( 0.0%) 0.0444 ( 0.0%) Parsing
0.0184 ( 0.0%) 0.0002 ( 0.0%) 0.0186 ( 0.0%) 0.0186 ( 0.0%) Serialization (swiftdoc)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Name binding
222.4024 (100.0%) 2.4878 (100.0%) 224.8902 (100.0%) 225.0464 (100.0%) Total
Comment by Xin Tong (JIRA)
Can we get a breakdown of the optimization passes responsible in a DEBUG build? Roman told me ~2 weeks ago that DSE and RLE are taking quite a bit of time on StdlibUnittest, and I have not had the time to optimize those two passes for it yet. (I did optimize both passes for the stdlib quite a bit.)
I will be getting to the task soon, though.
I accidentally hung Instruments, but I'll run the experiment again tomorrow. I think DSE was pretty high up there, along with ARC opts.
Comment by Xin Tong (JIRA)
I have committed something (4491d86dc3611f76e031fd44a77529cb39bb0140) that will reduce the compilation time of DSE on StdlibUnittest quite a bit. I observed DSE drop from 1.8% to 1.4% of the overall compilation time when compiling the stdlib with -O.
Thanks, Xin! I'll make sure to grab your changes before retesting.
Xin's change helped a lot. Here are the passes that take the most time in a Debug build now:
Running Time Self (ms) Symbol Name
85674.0ms 11.4% 9.0 (anonymous namespace)::ARCSequenceOpts::run()
61178.0ms 8.2% 4.0 (anonymous namespace)::SILCombine::run()
52346.0ms 7.0% 5.0 (anonymous namespace)::RedundantLoadElimination::run()
50140.0ms 6.7% 1.0 (anonymous namespace)::DeadStoreElimination::run()
36925.0ms 4.9% 11.0 (anonymous namespace)::SILCSE::run()
32228.0ms 4.3% 6.0 (anonymous namespace)::SimplifyCFGPass::run()
Nearly all of the ARC time is spent in LoopARCPairingContext::runOnLoop. In SILCombine, BuiltinInst::getBuiltinInfo() takes a surprising amount of time (because of the DenseMap), and cacheDebugLoc() isn't cheap either. Alias analysis slows down RLE and DSE with unoptimized DenseMaps. SILCSE uses ScopedHashTableScopes, which are expensive. And SimplifyCFG spends time calculating dominance.
Basically, DenseMaps are silly-expensive in Debug modes, likely because none of the DenseMapInfo is getting inlined. I wish we could say "okay, optimize these functions" even in a "Debug" build.
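For context on why that hurts so much: DenseMap routes every lookup and insertion through a small DenseMapInfo traits struct. Here is a minimal sketch of that customization point (PassKey is a made-up key type used purely for illustration, not anything in the compiler):

```cpp
// Minimal sketch: the DenseMapInfo traits that llvm::DenseMap consults on
// every probe. In an optimized build these fold into a few inline
// instructions; in a -O0 build each lookup pays for real calls to them.
#include "llvm/ADT/DenseMap.h"
#include <cstdint>

struct PassKey {        // made-up key type, purely for illustration
  uint32_t ID;
};

namespace llvm {
template <> struct DenseMapInfo<PassKey> {
  static PassKey getEmptyKey() { return {~0u}; }          // sentinel for empty buckets
  static PassKey getTombstoneKey() { return {~0u - 1}; }  // sentinel for erased buckets
  static unsigned getHashValue(PassKey K) { return K.ID * 37u; }
  static bool isEqual(PassKey L, PassKey R) { return L.ID == R.ID; }
};
} // namespace llvm

// Every find()/insert() on this map goes through the four functions above,
// plus the probing code in DenseMap itself, none of which inlines at -O0.
static llvm::DenseMap<PassKey, unsigned> VisitCounts;
```

At -O2 all of that collapses into a handful of instructions per probe; at -O0 it turns into several out-of-line calls per lookup, which is consistent with the profiles above.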
Comment by Xin Tong (JIRA)
What is the total amount of time spent in the optimizer now?
It's still a good 7 minutes in debug builds. My LLVM is Release+Asserts, though, so it's not exactly a fair comparison.
Jordan, part of the problem in ARC is that we are visiting the CFG too many times. I am going to be fixing this in the spring.
With respect to the DenseMap issue: just as we use a Release+Asserts LLVM, perhaps it would make sense to try having a special library, compiled in Release+Asserts mode, that contains explicit template instantiations of the various DenseMaps we use. Everything else would then use extern template declarations for those DenseMaps.
I imagine many of these cases involve data types that are trivially copyable (i.e. pointers, ints, etc).
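Roughly what that could look like, as a sketch only (the file names and the key/value types below are placeholders, not the maps the optimizer actually uses):

```cpp
// DenseMapInstantiations.h (hypothetical shared header)
// Explicit instantiation *declarations*: every translation unit that includes
// this promises not to instantiate these specializations itself, and instead
// calls the definitions emitted below.
#include "llvm/ADT/DenseMap.h"

extern template class llvm::DenseMap<void *, unsigned>;
extern template class llvm::DenseMap<unsigned, unsigned>;

// DenseMapInstantiations.cpp (hypothetical; compiled at -O2, e.g. as part of a
// small Release+Asserts static library linked into the Debug compiler)
// Explicit instantiation *definitions*: the one place the member functions of
// these specializations are emitted, with optimizations applied.
#include "DenseMapInstantiations.h"

template class llvm::DenseMap<void *, unsigned>;
template class llvm::DenseMap<unsigned, unsigned>;
```

One caveat worth measuring before going down this road: a fair amount of DenseMap's work lives in its DenseMapBase base class and in the DenseMapInfo traits, so it would need checking whether explicitly instantiating DenseMap alone actually moves the hot code out of the -O0-compiled translation units.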
Comment by Xin Tong (JIRA)
And for what it's worth, this is where we currently stand for compiling StdlibUnittest with -O, using Release+Asserts builds of both LLVM and the Swift compiler.
===-------------------------------------------------------------------------===
Swift compilation
===-------------------------------------------------------------------------===
Running Time Self (ms) Symbol Name
23346.0ms 21.6% 53.0 swift::SILPassManager::runPassesOnFunction(llvm::ArrayRef<swift::SILFunctionTransform*>, swift::SILFunction*)
6801.0ms 6.3% 8.0 (anonymous namespace)::ARCSequenceOpts::run()
3293.0ms 3.0% 81.0 (anonymous namespace)::DeadStoreElimination::run()
3037.0ms 2.8% 8.0 (anonymous namespace)::SILCombine::run()
2521.0ms 2.3% 2.0 (anonymous namespace)::SimplifyCFGPass::run()
2250.0ms 2.0% 449.0 (anonymous namespace)::RedundantLoadElimination::run()
1972.0ms 1.8% 110.0 (anonymous namespace)::SILCSE::run()
755.0ms 0.7% 177.0 (anonymous namespace)::DCE::run()
474.0ms 0.4% 6.0 (anonymous namespace)::SILCodeMotion::run()
414.0ms 0.3% 0.0 (anonymous namespace)::StackPromotion::run()
255.0ms 0.2% 6.0 (anonymous namespace)::ABCOpt::run()
233.0ms 0.2% 1.0 (anonymous namespace)::ConstantPropagation::run()
199.0ms 0.1% 0.0 (anonymous namespace)::SILMem2Reg::run()
183.0ms 0.1% 63.0 (anonymous namespace)::SILLowerAggregate::run()
155.0ms 0.1% 38.0 (anonymous namespace)::SILSROA::run()
114.0ms 0.1% 14.0 (anonymous namespace)::LICM::run()
Comment by Xin Tong (JIRA)
Mark, I know our compilation time is much better now. Can you please post the numbers and close this SR?
Compilation time is generally better now, but we're still much slower in the debug build than in the release build.
I don't think this is due to any expensive checks, but rather that the unoptimized compiler is just much, much slower.
I spent some time this morning building compiler variants and timing them while building StdlibUnitTest. Here's what I see now for the SIL optimizer time.
In each case, LLVM/Clang is a release, no-asserts build. For Swift, we have:
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
5.6017 ( 32.5%) 0.1291 ( 70.5%) 5.7308 ( 32.9%) 5.7346 ( 32.9%) SIL optimization - RELEASE
6.4858 ( 34.5%) 0.1490 ( 62.6%) 6.6349 ( 34.8%) 6.6467 ( 34.5%) SIL optimization - RELEASE+ASSERT
55.7352 ( 66.8%) 0.1773 ( 66.7%) 55.9124 ( 66.8%) 55.9345 ( 66.8%) SIL optimization - DEBUG+ASSERT
So the DEBUG+ASSERT time is around 8.5x the RELEASE+ASSERT time (55.9 s vs. 6.6 s of user+system time for SIL optimization).
Note that I believe StdlibUnitTest was broken up into pieces since this was opened, so this is very different from the situation we were in four months ago.
In any event, I'm not sure that there's really anything actionable here. We could consider various build changes to ease the pain of some developers (e.g. compile the SIL optimizer with optimizations enabled in some configuration so that the people who don't need to debug it aren't spending time waiting for it to execute during builds).
Since I'm not one of the people who would use that mode, I'm not especially inclined to add more complexity to our build system, though.
Bob - not sure if this is worth keeping open or not, but it probably shouldn't be on me.
Additional Detail from JIRA

| | |
|------------------|-----------------|
|Votes | 0 |
|Component/s | Compiler |
|Labels | Bug, Performance |
|Assignee | @bob-wilson |
|Priority | Medium |

md5: 7fe4167166d7837ebe7280951902658f

Issue Description:
Within the last two months, the compile time for StdlibUnittest under a debug compiler has jumped quite a bit. Using the new frontend flag `-debug-time-compilation`, I found that SIL optimization seems to be the culprit. Now, StdlibUnittest is a bit large and unusual by several metrics, but still.
Admittedly, that was a debug compiler, with LLVM compiled as RelWithDebInfo and swiftc compiled as Debug. Under a release build (without assertions), the numbers are a lot better...
...so I wonder if we have some really expensive assertions turned on.