Closed elliottslaughter closed 4 years ago
A little bit of very unscientific debugging. Here are some backtraces from when I run. The main culprits seem to be SelectionDAGISel
and StackSlotColoring
in LLVM.
The LLVM IR generated by terra has a store op i.e. "store [50000 x i32] %4, [50000 x i32]* %0, align 4" -> store 50,000 i32 values. (and a similar load op) that is causing a large # of scalar store/move instructions to be generated and hence large compile times. A dump of the LLVM IR after each codegen pass (llc unpack_param.ll -print-after-all) points to the Expand ISel Pseudo-instructions pass that ends up generating these instructions. On the terra side, it could generate memcpy instructions to avoid the compilation issue - changing the LLVM IR manually resulted in a significant drop in compilation time.
This is fixed in #430 for LLVM versions >= 6, and partially fixed for < 6. The behavior with < 6 is a little odd, compile times are instantaneous when using saveobj
, but still slow (though less bad) with :compile()
. I'm inclined to accept this as success and move on. But it does mean that for the best compile performance, >= 6 is recommended.
In the Regent project we recently found a case where compile times were blowing up with the use of a large array (https://github.com/StanfordLegion/legion/issues/724). Minimized and stripped of all Regent-specific logic, the reproducer is at the bottom of this post.
The fun part is, as best I can tell every part of this reproducer is necessary. The code listed below takes about 2.5 minutes to compile. But if you remove the dereference of
buffer
, it compiles instantly. If you remove the return it compiles instantly. If you don'tassert_()
it compiles instantly. If youassert_:setinlined(false)
it compiles instantly.The key seems to be doing a computation on the pointer passed in, and branching on it, in the same function as the pointer itself gets dereferenced (and returned). Presumably we're tripping over SROA in LLVM, or something like that.