codegen pass breaks machine cfg

regehr commented 15 years ago


Bugzilla Link	4732
Resolution	FIXED
Resolved on	Aug 17, 2010 20:01
Version	1.0
OS	Linux
CC	@lattner,@sunfishcode

Extended Description

Seen on Ubuntu Hardy on x86.

regehr@john-home:~/volatile/tmp181$ clang -O3 small.c -o foo /tmp/cc-LeSGPe.o: In function int321': small.c:(.text+0x8a): undefined reference to.LBB5_1' small.c:(.text+0x90): undefined reference to `.LBB5_1' collect2: ld returned 1 exit status clang: error: linker command failed with exit code 1 (use -v to see invocation)

regehr@john-home:~/volatile/tmp181$ clang -v

clang version 1.0 (http://llvm.org/svn/llvm-project/cfe/trunk 79189) Target: i386-pc-linux-gnu Thread model: posix

regehr@john-home:~/volatile/tmp181$ cat small.c

char foo1 (int _si1, char _si2) { return 0 > 1 || _si1 <= 0 && _si1 < _si2 || 0 ? : 1; }

char foo2 (int _si1, char _si2) { return 0; }

char foo3 (int _si1, char _si2) { return 0 && 1 || _si2 > 0 && (-2147483647 - 1 / _si2) ? : _si2; }

char func_16 (int z, int a) { return 0; }

void int321 (char p_103, int uint8p_104) { char l_106; func_16 (foo1 (p_103 || foo3 (l_106, p_103), 1), 1); for (; p_103 <= 1; foo2(0,0)) { } }

int main (void) { }

regehr commented 2 years ago

mentioned in issue llvm/llvm-bugzilla-archive#5459

llvmbot commented 14 years ago

My recollection is that I posted something to the list but there was no agreement on the right answers or what the rules should be. I guess the current state is OK. The testcase now works with both llvm-gcc and clang at all optimization levels, and produces much better code than shown here. Might as well close it.

lattner commented 14 years ago

What's happening with this?

llvmbot commented 15 years ago

Ah OK, I didn't understand that. I'll write something up.

lattner commented 15 years ago

The rules are not defined, I was suggesting that you propose a set of rules :)

llvmbot commented 15 years ago

If you mean write something that describes the rules, I'll need answers to the questions above before I can do that. Don't put them in bugzilla if you don't want to:)

lattner commented 15 years ago

That sounds great! Can you write something up and send it to llvmdev? I don't think this bugzilla is the right place to discuss it :)

llvmbot commented 15 years ago

Certainly there's more than one way to do this; I think a representation of one edge per target is better mostly because it uses less memory, but as Dan points out it is not one-sided, and I'm not trying to insist on it. What bothers me more is that I'm not sure the rules for the CFG are consistent and fully formed even in your head, let alone in docs. The BranchFolding code has always removed redundant edges, you have reviewed that code, and you haven't had a problem with it until now. Also, the CFG you say you want for the block in question, jne LBB1_1 jle LBB1_1 LBB1_1:
doesn't conform to your simple rule; it should have 3 edges not 2 according to your statement of the rule. Let me go through some examples to try to get the rules straight.

How many edges for the example above? jne LBB1_2 jle LBB1_3 LBB1_1: This one must have 3 edges. If the answer to the previous block is 2, why should this be different? If this is invalid, why? jne LBB1_1 jle LBB1_2 LBB1_1: Presumably 3 edges? jge LBB1_1 jne LBB1_1 jle LBB1_1 LBB1_1:
Is this valid? If so how many edges, if not why not? jge LBB1_1 jne LBB1_2 jle LBB1_1 LBB1_2:
Is this valid? switch LBB1_1, [LBB1_1, LBB1_1] LBB1_1: This is another way to express the block that's causing all the trouble; transformations might reasonably turn one into the other. How many edges?

My intent here, at the moment, is to figure out what the rules are and then document them.

lattner commented 15 years ago

I completely agree that we need better documentation about what the machinecfg invariants are though. Also maybe the machine code verifier could catch some of this stuff?

lattner commented 15 years ago

I disagree in a couple of ways. Yes, it is bad that the code gen is not cleaning this up and eliminating the branches all together. However, the CFG has to be well defined even if codegen fails to do this. If we end up with something silly like:

add jne LBB1 jmp LBB42 LBB1:

The cfg should still be well formed. I think that the CFG should be formed with simple rules: if a block falls through, there is one edge to represent the fall through. In any case, regardless of fallthrough, if there are other branch instructions at the end of the block there should be the right successors for each branch. Why is this difficult or bad?

llvmbot commented 15 years ago

The tail merging threshold is a wart; I've increased it twice and still have hopes of getting rid of it altogether. Last time I looked It fires exactly once in the LLVM test suite.

As for efficiency, I guess we would have to measure it. Many places that make CFG changes can avoid building duplicate edges in the first place, rather than removing them later, and if we do have to remove them it does not take long to walk a 2-element list. There is a cost for creating all those extra edges too. I know the single-edge representation was not a speed problem on another compiler I worked on.

On further consideration I find the 2-edge representation quite illogical. There should be either 1 (for possible targets) or 3 (for possible ways to leave the block, including the fallthrough). What is this representation doing anyway? Do we have one edge for all the conditional branches, which are expected to branch to the same place? Or does the final CBR only have one, because both its destinations are the same?

In any case:

The form of the CFG we have defined to be correct really needs to be documented. (A possible definition is to leave it indeterminate, in which case code that depends on either representation is wrong. That is, in effect, how it is now, and I'm OK with it, but I bet other people won't be.)
If we do define some number of edges to be correct, the verifiers should check this.
If the approach of your patch is defined to be correct, the whole problem isn't fixed; there are other places in that area that remove duplicate edges.
The generated code looks ridiculous. Somebody should get rid of those branches (or, preferably IMO, not create a block with 2 conditional branches). You seem to be saying both passes are working correctly and this is OK; I don't agree with that. I suppose one way to do this is split such blocks in the branch folder; that's probably the easiest fix, but I'd rather see that block defined as invalid input.

sunfishcode commented 15 years ago

"invalid cfg"? You didn't think so when reviewing 78351, which introduced the simpler CFG, so I'm skeptical this is well defined. There are lots of other places in that area that remove duplicate edges, and IMO that is the way it should be throughout, although right now only BranchFolding does that. It is more efficient, possibly noticeably so in the case of large switches.

Both approaches can be made correct. I prefer the approach of leaving duplicates in the list, because I think it's more efficient to build one SmallPtrSet per basic block considered by tail merging than for lots of places to do linear traversals through the predecessor list to eliminate duplicates. It helps that tail merging already has a threshold so that it ignores blocks with extraordinary numbers of predecessors. And, tail merging can easily be disabled when compile time is at a premium.

The real problem here is that the redundant conditional branches were not removed when the edges were fixed up. That's because AnalyzeBranch is not handling JNE followed by JLE, and there are probably other bugs (missing optimizations at least) lurking that are associated with that. Its comments say:
// If [the branch conditions] differ, see if they fit one of the known
patterns. // Theoretically we could handle more patterns here, but // we shouldn't expect to see them if instruction selection // has done a reasonable job.

so where did JNE followed by JLE come from?

Ultimately, it comes from Instcombine missing an opportunity to fold away a comparison in the original testcase. It's tricky, but Instcombine could be extended to catch it.

The code that accompanies the comment quoted above is conservative; if it doesn't recognize the branching pattern, it returns a return value indicating that the branch could not be analyzed. As such, it is conservatively safe. It's fine for that code to assume that Instcombine can optimize the big cases.

llvmbot commented 15 years ago

"invalid cfg"? You didn't think so when reviewing 78351, which introduced the simpler CFG, so I'm skeptical this is well defined. There are lots of other places in that area that remove duplicate edges, and IMO that is the way it should be throughout, although right now only BranchFolding does that. It is more efficient, possibly noticeably so in the case of large switches.

The real problem here is that the redundant conditional branches were not removed when the edges were fixed up. That's because AnalyzeBranch is not handling JNE followed by JLE, and there are probably other bugs (missing optimizations at least) lurking that are associated with that. Its comments say:

// If [the branch conditions] differ, see if they fit one of the known patterns.
// Theoretically we could handle more patterns here, but
// we shouldn't expect to see them if instruction selection
// has done a reasonable job.

so where did JNE followed by JLE come from?

sunfishcode commented 15 years ago

This is fixed with r79338.

lattner commented 15 years ago

I spent quite a bit of time making MachineBasicBlock::isOnlyReachableByFallthrough() scan through the branches at the end of a block using various predicates to analyze the branches, the operand list etc. However, this isn't the right fix. The problem is that some pass is turning this:

entry: 0x203bcc4, LLVM BB @&#8203;0x1f024d0, ID#0:
        %AL<def> = MOV8rm %ESP, 1, %reg0, 4, %reg0, Mem:LD(1,16) [FixedStack-1 + 0]
        TEST8rr %AL, %AL, %EFLAGS<imp-def>
        JNE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use>
        JLE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use,kill>
    Successors according to CFG: 0x203bd14 (#1) 0x203bd64 (#2)

land.end.i: 0x203bd14, LLVM BB @&#8203;0x1f02500, ID#1:
Live Ins: %AL %AL
    Predecessors according to CFG: 0x203bcc4 (#0)
    Successors according to CFG: 0x203bd64 (#2)

for.cond.preheader: 0x203bd64, LLVM BB @&#8203;0x1f02320, ID#2:

into:

entry: 0x203bcc4, LLVM BB @&#8203;0x1f024d0, ID#0:
        %AL<def> = MOV8rm %ESP, 1, %reg0, 4, %reg0, Mem:LD(1,16) [FixedStack-1 + 0]
        TEST8rr %AL, %AL, %EFLAGS<imp-def>
        JNE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use>
        JLE mbb<for.cond.preheader,0x203bd64>, %EFLAGS<imp-use,kill>
    Successors according to CFG: 0x203bd64 (#1)

for.cond.preheader: 0x203bd64, LLVM BB @&#8203;0x1f02320, ID#1:

The problem here is that the "entry" block should have two successor entries to #1. If it did, then the existing code in MachineBasicBlock::isOnlyReachableByFallthrough() would work fine and is in fact correct. It would be really nice for the machine code verifier to detect invalid CFGs like this.

Dan, can you take a look to see what pass is bonking the machinecfg? It would be very good to fix this before 2.6 branches.

lattner commented 15 years ago

The problem is that the code is:

_int321: ## @int321 LBB1_0: ## %entry movb 4(%esp), %al testb %al, %al jne LBB1_1 jle LBB1_1 LBB1_1: ## %for.cond.preheader cmpb $1, %al jg LBB1_3 jmp LBB1_2 .align 4,0x90 LBB1_2: ## %for.cond jmp LBB1_2 LBB1_3: ## %for.end.split ret

but when asm-verbose is disabled, we elide "fall through" block labels, and end up not printing 1_1:

_int321: movb 4(%esp), %al testb %al, %al jne LBB1_1 jle LBB1_1 cmpb $1, %al jg LBB1_3 jmp LBB1_2 .align 4,0x90 LBB1_2: jmp LBB1_2 LBB1_3: ret

The obvious problem is that 1_1 is both fallthrough and an explicit operand.

lattner commented 15 years ago

testcase, repros with llc -asm-verbose=0

llvm / llvm-project

codegen pass breaks machine cfg #5104

Extended Description