sampsyo / bril

an educational compiler intermediate representation
https://capra.cs.cornell.edu/bril/
MIT License
577 stars 238 forks source link

[SSA] Buggy SSA output with `to_ssa.py` and a modified `if-ssa.bril` #108

Open terrelln opened 3 years ago

terrelln commented 3 years ago

This input, which is exactly if-ssa.bril except the .exit label is renamed to .xit causes buggy SSA to be generated by to_ssa.py:

@main(cond: bool) {
.entry:
    a.1: int = const 47;
    br cond .left .right;
.left:
    a.2: int = add a.1 a.1;
    jmp .xit;
.right:
    a.3: int = mul a.1 a.1;
    jmp .xit;
.xit:
    a.4: int = phi .left a.2 .right a.3;
    print a.4;
}

The generated SSA is:

@main(cond: bool) {
.entry:
  a.1.0: int = const 47;
  br cond .left .right;
.left:
  a.2.0: int = add a.1.0 a.1.0;
  jmp .xit;
.right:
  a.3.0: int = mul a.1.0 a.1.0;
  jmp .xit;
.xit:
  a.2.1: int = phi a.2.0 a.2.0 .left .right;
  a.4.0: int = phi a.2.1 a.3.1 .left .right;
  print a.4.0;
  ret;
}

The buggy line is a.4.0: int = phi a.2.1 a.3.1 .left .right;. When called with cond = false the variable a.3.1 is not defined, causing a.4.0 to be undefined.

This is caused by the function prune_phis():

https://github.com/sampsyo/bril/blob/d65455d1eb9c4b78711af2fffc7453bbf205775e/examples/to_ssa.py#L91-L126

This bug is triggered with the new input, and not triggered by if-ssa.bril because of the sorting done on this line:

https://github.com/sampsyo/bril/blob/d65455d1eb9c4b78711af2fffc7453bbf205775e/examples/to_ssa.py#L79

I believe that this optimization is not sound, even in the case when the input does not have phi instructions. Imagine the input:

@main() {
    cond: bool = const true;
    br cond .true .false;
.true:
    a: int = const 0;
    jmp .zexit;
.false:
    b: int = const 1;
    jmp .zexit;
.zexit:
    print a;
}

Because cond is always true, a is always defined. But, to_ssa.py generates incorrect code similar to before:

@main {
.b1:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  jmp .zexit;
.zexit:
  b.1: int = phi b.0 b.0 .false .true;
  print a.1;
  ret;
}

You could also argue that variables MUST be defined on all paths for the bril program to be well-formed. But, I don't think that is a reasonable requirement. And instead it should instead be undefined behavior to read from an undefined variable. That leaves the prune_phis optimization as it is currently implemented unsound.

Instead, prune_phis() should probably insert undefined variables like __undefined, or read from the output variable.

sampsyo commented 3 years ago

Aha, that's interesting! You're quite right, of course, that the latter example, should not prune the phi for a.

Anyway, dealing with undefined variables is extremely tricky here! I'd be interested in any straightforward fixes that don't make things any more complicated (it's fine if they generate more useless phis). I'm curious what you're thinking with the two solutions… how would you tell when we should be reading from __undefined or the output variable or whatever?

Maybe there's a way to fix the pruning too. Not sure.

terrelln commented 3 years ago

I’ll put up a PR that fixes the bug, so to_ssa.py generates correct code. Then I’ll think about a different pruning approach.

cgyurgyik commented 3 years ago

One thing that pops out now is that the resulting code after from_ssa is now littered with x: <type> = id __undefined. Given that x is undefined at that point, we should be able to remove it.

To demonstrate on the second example above,

@main {
.b1:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  b.1: int = id b.0; # ???
  a.1: int = id a.0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  b.1: int = id b.0;
  a.1: int = id __undefined; # Remove this instruction, since it should be unreachable.
  jmp .zexit;
.zexit:
  print a.1;
  ret;
}

Maybe there's a reason why this is not a good idea?


Edit: It also looks like there is a bug, since we have the id of a variable that hasn't been defined yet. It originates from the SSA form:

b.1: int = phi b.0 b.0 .false .true; # Should be phi b.0 __undefined .false .true

I think this is along the lines of what you're trying to fix?

Edit2: To build on this a little more, We don't want phi b.0 b.0 .false .true since b is undefined in the .true branch. Rather, we only want to add the variable name if b's definition reaches the given branch. Otherwise, we know the value is undefined at that point.

The fix I propose is on the following line: https://github.com/sampsyo/bril/blob/dbbd8e8f1d7000fefdcffaef2d79ce3339e28b40/examples/to_ssa.py#L75 We append the following condition:

if stack[p] and any(b in out_[s][p] for b in preds_of_block + [block]) # The definition of p reaches the successor for this block.

When we pipe this through from_ssa, it gives us the desired behavior:

@main {
.entry:
  cond.0: bool = const true;
  br cond.0 .true .false;
.true:
  a.0: int = const 0;
  b.1: int = id __undefined; # Good!
  a.1: int = id a.0;
  jmp .zexit;
.false:
  b.0: int = const 1;
  b.1: int = id b.0;
  a.1: int = id __undefined;
  jmp .zexit;
.zexit:
  print a.1;
  ret;
}

I need to do a bit more thinking / some tests, since I'm sure there's some bits I'm missing, such as definitions being in predecessors, but not the immediate block.

sampsyo commented 3 years ago

I think you're on the right track, although I don't know 100% that this will fix the problem. (Maybe it will?) In short, just removing the copies from __undefined themselves won't work because it's necessary but not sufficient: i.e., partial transitive copies from undefined values are still a problem. Testing broadly will be important…

sampsyo commented 3 years ago

I know this issue is old, but I want to write up a summary of where things stand on this sad state of affairs. Basically, writing an SSA conversion in a way that deals elegantly with undefined values turns out to be pretty frustrating. To re-illustrate the basic problem, consider this simple program (called if-const.bril in our tests):

@main() {
    cond: bool = const true;
    br cond .true .false;
.true:
    a: int = const 0;
    jmp .exit;
.false:
    b: int = const 1;
    jmp .exit;
.exit:
    print a;
}

The thing to notice about this program is that it is only safe to run because the branch happens to always go to the .true side. So a is always defined in print a, even though it wouldn't be if the branch were to go the other way (which, again, is impossible).

If we follow our SSA conversion algorithm to the letter, we'd end up first renaming the assignment to a thus:

a.0: int = const 0;

And generating a phi-node that must look something like this in .exit:

a.1: int = phi .true a.0 .false ???;

But what should the value of a.1 be if we traverse the edge from the .false block? The original variable a was undefined, so we need to simulate that!

Our current "solution" is to just replace that ??? with a reference to an actually undefined variable. So we hallucinate the name __undefined and generate a phi-node that looks like this:

a.1: int = phi .true a.0 .false __undefined;

…the idea being that this edge will never actually be traversed, so it should be fine to refer to a variable that doesn't actually exist here. Easy peasy, right?

Not so fast—this works until we try to generate the phi-node for b in .exit:

b.1: int = phi .true __undefined .false b.0;

Oh no! The CFG edge from .true will actually be traversed… so this SSA-ified program will end up referring to an undefined variable and crashing, even though the original program was just fine.

To make this bad scenario more concrete (because running programs directly in SSA form is kinda weird), consider converting this program back out of SSA form, which means inserting assignments into the predecessor blocks. Namely, here, we'll put this into the end of the .true block (remember, that's a block that actually runs):

b.1: int = id __undefined;

The interpreter will, understandably, not like this line.

Again, we have a clever "solution": let's just run dead code elimination! This b.1 variable can't possibly be used. One way to put it is that it is guaranteed that this version of the original b variable was never used downstream of the .true block because, if it were, that original program would have been illegal by virtue of reading an undefined variable itself. So, we can just run our trivial dead code elimination pass to get rid of this pesky undefined variable! And that mostly works.

The only problem is that trivial dead code elimination is trivial. It can't identify more sophisticated situations where these needless generated assignments are inserted. The most dastardly case is where a cycle exists between two generated SSA variables, both of which are dead, but neither of which looks trivially dead in isolation. This comes up if you write a loop that assigns to a variable (that isn't assigned outside of the loop), like in this code from loop-branch.bril:

@loop(infinite: bool, print: bool) {
.entry:
.loop.header:
    br infinite .loop.body .loop.end;
.loop.body:
    br print .loop.print .loop.next;
.loop.print:
    v: int = call @func;
    print v;
.loop.next:
    jmp .loop.header;
.loop.end:
}

In this code, v needs a phi-node at the top of the loop (the .loop.header block). The value for v there needs to come either from the loop itself or from the function entry, in which case it's undefined. And .loop.next needs a phi-node of its own to pick between the call and the previous iteration of the loop, i.e., the phi-node at the top of the loop. Here's our current version of the SSA-ified code:

@loop(infinite: bool, print: bool) {
.entry:
  jmp .loop.header;
.loop.header:
  v.0: int = phi __undefined v.1 .entry .loop.next;
  br infinite .loop.body .loop.end;
.loop.body:
  br print .loop.print .loop.next;
.loop.print:
  v.2: int = call @func;
  print v.2;
  jmp .loop.next;
.loop.next:
  v.1: int = phi v.0 v.2 .loop.body .loop.print;
  jmp .loop.header;
.loop.end:
  ret;
}

The problem here is that, if you insert all the copies to reflect these phi-nodes, the __undefined copy will not be dead. v.0 is used later in v.1, and then v.1 is used in v.0. We have a dead cycle: all the code is dead, but it's "hard" to eliminate the code because it would require detecting cycles (rather than the simple "never used anywhere" check we do for trivial DCE). Annoying!

In #119, @terrelln had the bright idea to just define all these variables. SSA conversion would just insert actual variables like __undefined.int at the top of the function to give zero/null values that could be used by all these undefined cases. That way, we wouldn't be doing the weird hack where we refer to literally undefined variables ever, so code would always be safe from crashing after SSA conversion.

I like this solution well enough, but it has a couple of drawbacks:

So there it currently stands. I would love to find a solution that avoids the above problems while not adding too much complexity to the rest of Bril or to the SSA conversion algorithm itself. (The example is structured this way to closely mirror the idealized pseudocode; crapping it up too much just to deal with this weird edge case of undefined values would be a shame.)

Something in particular that I like about our current (flawed) approach is that it is possible to generate working SSA code with the "naive" SSA conversion algorithm; you don't need to do some sort of fancy "minimal SSA" conversion just to get correct code. It would be awesome to preserve that property in the face of undefined values.

One option I'm considering is redefining the phi instruction to explicitly support cases where the value is undefined. Then, it may be possible to avoid generating copies for undefined cases altogether. This may not work because of the need to transitively eliminate undefined copies, but maybe it will make the undefined values easier to clean up using trivial DCE.

sampsyo commented 3 years ago

Just one small addendum on the history here: our previous "solution" (before the whole __undefined business, which was introduced in #111) was to just allow missing edges in phi-nodes. So in the first example in my previous comment, we would not generate a phi-node like this:

a.1: int = phi .true a.0 .false __undefined;

But rather like this:

a.1: int = phi .true a.0;

That is, we could omit the .false branch entirely. Our SSA-aware interpreter would just crash if it ever executed a phi that failed to handle the relevant just-traversed CFG edge. That crash is fine because traversing that edge would leave a undefined in the original program and crash anyway.

The problem with that approach was again "transitive deadness." When translating back out of SSA form, the (lone) copy that implements this phi-node for a.1 is not a problem. But if there were a later phi-node that uses it, perhaps involved in a cycle like our loop example above, then we may end up inserting copies of variables that don't exist.