Open Diomendius opened 4 years ago
Let's try to find the root cause of this. @rustbot ping llvm
Hey LLVM ICE-breakers! This bug has been identified as a good "LLVM ICE-breaking candidate". In case it's useful, here are some instructions for tackling these sorts of bugs. Maybe take a look? Thanks! <3
cc @comex @CrazyDebugger @cuviper @DutchGhost @hanna-kruppe @hdhoang @heyrutvik @JOE1994 @jryans @mmilenko @nagisa @nikic @Noah-Kennedy @SiavoshZarrasvand @spastorino @vertexclique
I skimmed the output of -Cllvm-args=-print-after-all
and it appears that the redundancy only becomes apparent late in the CodeGen pipeline (after register allocation and subsequent copy propagation).
The underlying optimization opportunity here is realizing that, after the b = f();
block is entered once, a
is known to be zero and won't change any more, so this block could jump straight to the b == 0
check. But making this apparent requires duplicating the loop header (which first branches on whether a == 0
) into each of its predecessors (the entry block, the a = f();
block, and the b = f();
). But such duplication is pretty speculative and generally costs some code size, so in this case it does not happen until very late in the CodeGen pipeline, post-RA, specifically during MachineBlockPlacement. The redundancy of the a == 0?
check only becomes apparent after this tail duplication, and post-MBP passes aren't equipped to clean it up. Duplicating earlier in the pipeline would probably help, but I don't know how realistic it is to do that kind of transformation early on and be confident that it's helpful.
We could try filing an LLVM bug report but tbh I'm pessimistic about how easy it is to fix this and how important such a fix would be, realistically speaking. (Is this example distilled from a more realistic program? Does the redundant test cause a measurable performance problem there?)
Just wondering, shouldn't this instuction pattern be caught by some sort of peephole optimization?
The example I gave is derived from some code I was initially using to test how well Rust optimizes exactly the situation you described, where a branch in a loop sets a variable another branch is conditional upon, and that variable remains fixed until the loop exits.
This in turn is derived from some more practical code where a
and b
are Option
s returned by iterators. The algorithm in question is similar to BTreeSet::symmetric_difference()
, but operates on sorted sequences and makes a distinction between left-hand-only and right-hand-only elements.
I doubt this would have a significant performance impact in my case, but it is strange to see something that appears so trivially wrong from the outside show up in generated assembly.
Just wondering, shouldn't this instuction pattern be caught by some sort of peephole optimization?
If any peephole optimizations capable of doing this would see this instruction sequence, sure. But as I said, it only becomes apparent very late in the pipeline, after the last time any such pass was executed. Before MachineBlockPlacement, there's nothing to peephole optimize.
Assigning P-medium
as [discussed as part of the Prioritization Working Group process](https://rust-lang.zulipchat.com/#narrow/stream/227806-t-compiler.2Fwg-prioritization/topic/I-prioritize.20.2372343.20Conditional.20jumps.20equivalent.20to.20if(0.20!.3D) and removing I-prioritize
.
Some manual bisecting on godbolt shows this appearing sometime between 1.24 and 1.25. The assembly generated by 1.25 and current nightly is almost identical, just a slightly different prologue and register choices. The assembly on 1.24 looks significantly worse in other ways though.
I've got a reasonably minimal test case whose assembly (with
opt-level=1
and above) includes this sequence of opcodes, which is effectively a no-op as far as I can tell:This is the code in question (playground):
I'm not sure whether this is more LLVM's fault or Rust's. Either way, this should not end up in the finished assembly, and I am fairly certain that at least in this specific case all three instructions can be removed entirely without affecting program behaviour.
rustc 1.45.0-nightly (2020-05-18 d8878868c8d7ef3779e7)
is the most recent version I have tested that shows this issue, andrustc 1.25.0
on godbolt.org is the oldest.rustc 1.24.0
does not show the issue, though there's a good chance this is just coincidence.