Closed regehr closed 2 years ago
mentioned in issue llvm/llvm-bugzilla-archive#5459
My recollection is that I posted something to the list but there was no agreement on the right answers or what the rules should be. I guess the current state is OK. The testcase now works with both llvm-gcc and clang at all optimization levels, and produces much better code than shown here. Might as well close it.
What's happening with this?
Ah OK, I didn't understand that. I'll write something up.
The rules are not defined, I was suggesting that you propose a set of rules :)
If you mean write something that describes the rules, I'll need answers to the questions above before I can do that. Don't put them in bugzilla if you don't want to:)
That sounds great! Can you write something up and send it to llvmdev? I don't think this bugzilla is the right place to discuss it :)
Certainly there's more than one way to do this; I think a representation of one edge per target is better mostly because it uses less memory, but as Dan points out it is not one-sided, and I'm not trying to insist on it. What bothers me more is that I'm not sure the rules for the CFG are consistent and fully formed even in your head, let alone in docs. The BranchFolding code has always removed redundant edges, you have reviewed that code, and you haven't had a problem with it until now. Also, the CFG you say you want for the block in question,
jne LBB1_1
jle LBB1_1
LBB1_1:
doesn't conform to your simple rule; it should have 3 edges not 2 according to your statement of the rule. Let me go through some examples to try to get the rules straight.
How many edges for the example above?
jne LBB1_2
jle LBB1_3
LBB1_1:
This one must have 3 edges. If the answer to the previous block is 2, why should this be different? If this is invalid, why?
jne LBB1_1
jle LBB1_2
LBB1_1:
Presumably 3 edges?
jge LBB1_1
jne LBB1_1
jle LBB1_1
LBB1_1:
Is this valid? If so how many edges, if not why not?
jge LBB1_1
jne LBB1_2
jle LBB1_1
LBB1_2:
Is this valid?
switch LBB1_1, [LBB1_1, LBB1_1]
LBB1_1:
This is another way to express the block that's causing all the trouble; transformations might reasonably turn one into the other. How many edges?
My intent here, at the moment, is to figure out what the rules are and then document them.
I completely agree that we need better documentation about what the machinecfg invariants are though. Also maybe the machine code verifier could catch some of this stuff?
I disagree in a couple of ways. Yes, it is bad that the code gen is not cleaning this up and eliminating the branches all together. However, the CFG has to be well defined even if codegen fails to do this. If we end up with something silly like:
add jne LBB1 jmp LBB42 LBB1:
The cfg should still be well formed. I think that the CFG should be formed with simple rules: if a block falls through, there is one edge to represent the fall through. In any case, regardless of fallthrough, if there are other branch instructions at the end of the block there should be the right successors for each branch. Why is this difficult or bad?
The tail merging threshold is a wart; I've increased it twice and still have hopes of getting rid of it altogether. Last time I looked It fires exactly once in the LLVM test suite.
As for efficiency, I guess we would have to measure it. Many places that make CFG changes can avoid building duplicate edges in the first place, rather than removing them later, and if we do have to remove them it does not take long to walk a 2-element list. There is a cost for creating all those extra edges too. I know the single-edge representation was not a speed problem on another compiler I worked on.
On further consideration I find the 2-edge representation quite illogical. There should be either 1 (for possible targets) or 3 (for possible ways to leave the block, including the fallthrough). What is this representation doing anyway? Do we have one edge for all the conditional branches, which are expected to branch to the same place? Or does the final CBR only have one, because both its destinations are the same?
In any case:
"invalid cfg"? You didn't think so when reviewing 78351, which introduced the simpler CFG, so I'm skeptical this is well defined. There are lots of other places in that area that remove duplicate edges, and IMO that is the way it should be throughout, although right now only BranchFolding does that. It is more efficient, possibly noticeably so in the case of large switches.
Both approaches can be made correct. I prefer the approach of leaving duplicates in the list, because I think it's more efficient to build one SmallPtrSet per basic block considered by tail merging than for lots of places to do linear traversals through the predecessor list to eliminate duplicates. It helps that tail merging already has a threshold so that it ignores blocks with extraordinary numbers of predecessors. And, tail merging can easily be disabled when compile time is at a premium.
The real problem here is that the redundant conditional branches were not removed when the edges were fixed up. That's because AnalyzeBranch is not handling JNE followed by JLE, and there are probably other bugs (missing optimizations at least) lurking that are associated with that. Its comments say:
// If [the branch conditions] differ, see if they fit one of the known
patterns. // Theoretically we could handle more patterns here, but // we shouldn't expect to see them if instruction selection // has done a reasonable job.
so where did JNE followed by JLE come from?
Ultimately, it comes from Instcombine missing an opportunity to fold away a comparison in the original testcase. It's tricky, but Instcombine could be extended to catch it.
The code that accompanies the comment quoted above is conservative; if it doesn't recognize the branching pattern, it returns a return value indicating that the branch could not be analyzed. As such, it is conservatively safe. It's fine for that code to assume that Instcombine can optimize the big cases.
"invalid cfg"? You didn't think so when reviewing 78351, which introduced the simpler CFG, so I'm skeptical this is well defined. There are lots of other places in that area that remove duplicate edges, and IMO that is the way it should be throughout, although right now only BranchFolding does that. It is more efficient, possibly noticeably so in the case of large switches.
The real problem here is that the redundant conditional branches were not removed when the edges were fixed up. That's because AnalyzeBranch is not handling JNE followed by JLE, and there are probably other bugs (missing optimizations at least) lurking that are associated with that. Its comments say:
// If [the branch conditions] differ, see if they fit one of the known patterns.
// Theoretically we could handle more patterns here, but
// we shouldn't expect to see them if instruction selection
// has done a reasonable job.
so where did JNE followed by JLE come from?
This is fixed with r79338.
I spent quite a bit of time making MachineBasicBlock::isOnlyReachableByFallthrough() scan through the branches at the end of a block using various predicates to analyze the branches, the operand list etc. However, this isn't the right fix. The problem is that some pass is turning this:
entry: 0x203bcc4, LLVM BB @​0x1f024d0, ID#0:
%AL<def> = MOV8rm %ESP, 1, %reg0, 4, %reg0, Mem:LD(1,16) [FixedStack-1 + 0]
TEST8rr %AL, %AL, %EFLAGS<imp-def>
JNE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use>
JLE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use,kill>
Successors according to CFG: 0x203bd14 (#1) 0x203bd64 (#2)
land.end.i: 0x203bd14, LLVM BB @​0x1f02500, ID#1:
Live Ins: %AL %AL
Predecessors according to CFG: 0x203bcc4 (#0)
Successors according to CFG: 0x203bd64 (#2)
for.cond.preheader: 0x203bd64, LLVM BB @​0x1f02320, ID#2:
into:
entry: 0x203bcc4, LLVM BB @​0x1f024d0, ID#0:
%AL<def> = MOV8rm %ESP, 1, %reg0, 4, %reg0, Mem:LD(1,16) [FixedStack-1 + 0]
TEST8rr %AL, %AL, %EFLAGS<imp-def>
JNE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use>
JLE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use,kill>
Successors according to CFG: 0x203bd64 (#1)
for.cond.preheader: 0x203bd64, LLVM BB @​0x1f02320, ID#1:
The problem here is that the "entry" block should have two successor entries to #1. If it did, then the existing code in MachineBasicBlock::isOnlyReachableByFallthrough() would work fine and is in fact correct. It would be really nice for the machine code verifier to detect invalid CFGs like this.
Dan, can you take a look to see what pass is bonking the machinecfg? It would be very good to fix this before 2.6 branches.
The problem is that the code is:
_int321: ## @int321 LBB1_0: ## %entry movb 4(%esp), %al testb %al, %al jne LBB1_1 jle LBB1_1 LBB1_1: ## %for.cond.preheader cmpb $1, %al jg LBB1_3 jmp LBB1_2 .align 4,0x90 LBB1_2: ## %for.cond jmp LBB1_2 LBB1_3: ## %for.end.split ret
but when asm-verbose is disabled, we elide "fall through" block labels, and end up not printing 1_1:
_int321: movb 4(%esp), %al testb %al, %al jne LBB1_1 jle LBB1_1 cmpb $1, %al jg LBB1_3 jmp LBB1_2 .align 4,0x90 LBB1_2: jmp LBB1_2 LBB1_3: ret
The obvious problem is that 1_1 is both fallthrough and an explicit operand.
Extended Description
Seen on Ubuntu Hardy on x86.
regehr@john-home:~/volatile/tmp181$ clang -O3 small.c -o foo /tmp/cc-LeSGPe.o: In function
int321': small.c:(.text+0x8a): undefined reference to
.LBB5_1' small.c:(.text+0x90): undefined reference to `.LBB5_1' collect2: ld returned 1 exit status clang: error: linker command failed with exit code 1 (use -v to see invocation)regehr@john-home:~/volatile/tmp181$ clang -v
clang version 1.0 (http://llvm.org/svn/llvm-project/cfe/trunk 79189) Target: i386-pc-linux-gnu Thread model: posix
regehr@john-home:~/volatile/tmp181$ cat small.c
char foo1 (int _si1, char _si2) { return 0 > 1 || _si1 <= 0 && _si1 < _si2 || 0 ? : 1; }
char foo2 (int _si1, char _si2) { return 0; }
char foo3 (int _si1, char _si2) { return 0 && 1 || _si2 > 0 && (-2147483647 - 1 / _si2) ? : _si2; }
char func_16 (int z, int a) { return 0; }
void int321 (char p_103, int uint8p_104) { char l_106; func_16 (foo1 (p_103 || foo3 (l_106, p_103), 1), 1); for (; p_103 <= 1; foo2(0,0)) { } }
int main (void) { }