
Exponential code explosion during inlining #49829

Open JosephTremoulet opened 3 years ago

JosephTremoulet commented 3 years ago
Bugzilla Link: 50485
Version: trunk
OS: All
Attachments: Repro showing pattern for exponential explosion; grows 1000x at this size.
CC: @chandlerc, @fhahn, @JosephTremoulet

Extended Description

Some of our code started failing to compile when commit ed9df5bd2f50b changed the optimization pipeline. The attached repro case demonstrates the issue (seen via opt -passes 'default<O3>' or at https://godbolt.org/z/3vccKbxb5).

The optimizations that clean up inlined code can leave a function smaller when it is considered as a transitive inline from a subsequent SCC than it was when considered from its own SCC. That makes those inlines look more favorable after their SCC has already converged, defeating the intent of the bottom-up ordering: prior SCCs should already be flattened out to the point that inlining deeply through them is undesirable.

The reason commit ed9df5bd2f50b exposed this in our code is that the function sizes before and after inlining cleanup now happen to yield inline costs that straddle the threshold.

The attached repro case is structured as follows:

Inlining in @outside_scc proceeds until the InlineHistory kicks in and cuts it off, which permits one inline of each acyclic path through the SCC; the number of such paths is exponential in the number of tiers, hence the code explosion.
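To illustrate the shape being described (this is a hypothetical sketch, not the attached repro): assume two functions per tier, each calling both functions of the next tier, with a back edge from the last tier so that all the tiers form a single SCC. The names below are made up for illustration.

```cpp
// Hypothetical sketch of the call-graph shape described above (not the
// actual attachment). Each tier has two functions, every function calls
// both functions of the next tier, and the last tier calls back into the
// first so the whole thing is one SCC.
int tier1_a(int), tier1_b(int), tier2_a(int), tier2_b(int);
int tier3_a(int), tier3_b(int);

int tier3_a(int x) { return x ? tier1_a(x - 1) : 1; } // back edge into tier 1
int tier3_b(int x) { return x ? tier1_b(x - 1) : 2; } // back edge into tier 1
int tier2_a(int x) { return tier3_a(x) + tier3_b(x); }
int tier2_b(int x) { return tier3_a(x) - tier3_b(x); }
int tier1_a(int x) { return tier2_a(x) + tier2_b(x); }
int tier1_b(int x) { return tier2_a(x) - tier2_b(x); }

// Caller outside the SCC: each acyclic path through the SCC can be
// inlined here once, and the number of such paths doubles per tier.
int outside_scc(int x) { return tier1_a(x) + tier1_b(x); }
```

With N tiers there are on the order of 2^N acyclic paths from @outside_scc's call sites through the SCC, so allowing one inline per path grows the code exponentially in N.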

The original cases where we've seen this are actually two of the specint2k benchmarks, though of course that is with our custom toolchain, so I'd say it occurs in code that's real to us (we'll need a fix downstream regardless of what happens upstream).

JosephTremoulet commented 3 years ago

Hongtao, I'm assigning this to you per your comment "I will work on a separate per-function capping mechanism" at https://lists.llvm.org/pipermail/llvm-dev/2021-June/151006.html, since we expect such a cap to resolve this issue.

Please reassign if I've misunderstood. Thanks.

JosephTremoulet commented 3 years ago

cc @chandlerc, not sure how closely this may be related to the issue in llvm/llvm-project#32419.

cc @fhahn, since ed9df5bd2f50b ostensibly triggered this and I don't know whether there's a phase-ordering fix that would make sense.

JosephTremoulet commented 3 years ago

Some ideas for potential approaches to fixing the issue:

1 - During inlining, rather than adding every inlinee's callsites to the worklist of inline candidates, only do so when the inlinee is in the current SCC. This would seem to be in line (no pun intended) with the intent of the inliner, that calls left in already-converged SCCs are unprofitable inlines. But I'm not sure I'm interpreting that intent correctly.

2 - Arrange the optimization passes so that cleanup can't happen between one SCC and the next, so that a call's inline cost doesn't change between being processed in its own SCC and being considered again from a later SCC. I'm not sure how feasible that is.

3 - Adjust the InlineHistory logic to also have a cap on the length of call-graph paths it inlines through (see the sketch below). That would make the upper bound on code growth sensitive to the fan-out of the call graph, rather than its depth, which would help because fan-out, unlike depth, can't be arbitrarily increased without making the individual functions too big to inline.
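A rough sketch of what idea (3) could look like, assuming the parent-indexed chain representation that the inliner's InlineHistory uses (each entry records the inlined function plus the index of its parent entry, with -1 meaning no parent). The helper name and the cap parameter are hypothetical, not an existing LLVM API:

```cpp
#include <utility>
#include <vector>

struct Function; // stand-in for llvm::Function

// Hypothetical helper: walk the inline-history chain starting at
// InlineHistoryID and report whether the path of transitive inlines is
// already longer than MaxPathLen, in which case the caller would skip
// the candidate instead of inlining it.
static bool inlinePathTooLong(
    int InlineHistoryID,
    const std::vector<std::pair<Function *, int>> &InlineHistory,
    unsigned MaxPathLen) {
  unsigned Len = 0;
  while (InlineHistoryID != -1) {
    if (++Len > MaxPathLen)
      return true;
    // Each history entry points at the entry for the call that produced it.
    InlineHistoryID = InlineHistory[InlineHistoryID].second;
  }
  return false;
}
```

With such a cap in place, the tiered repro above would stop inlining once a path exceeds the cap, so growth would be bounded by roughly fan-out raised to the cap rather than by the total number of acyclic paths through the SCC.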