[MIR] Consider converting to extended basic blocks

nikomatsakis commented 7 years ago

After some discussion with @stoklund and @eddyb, I was thinking that perhaps it would be better for MIR to transition to supporting extended basic blocks (one entry, multiple exits) instead of the current basic blocks. Concretely, the idea would be to remove the Call, Assert, and Drop terminators and make them into statements with an outgoing panic edge.

Specifically for call, this would simplify things dramatically because the return value is only for use on the "return" edge, and handling that in the current dataflow and so forth is a pain in the neck (this extended basic block form makes that edge more special, which should help). It also helps us have fewer basic blocks.

We could conceivably go further, and remove terminators altogether, and instead just have statements, where the final statement should be of "terminator type". However, I'm somewhat disinclined to do that, at least not right away, because the majority of the advantage of this format derives from converting call/assert/drop (which "feel" linear in code) and not from converting if or match or other control-flow, since almost always those would terminate the current basic block anyway (since the control-flow joins afterwards). Moreover, the current setup assures that we have a terminator in the right spot, and unifying them with statements would mean we have to maintain that invariant ourselves.

cc @rust-lang/compiler @nagisa

jonas-schievink commented 7 years ago

I like that idea. It makes MIR feel similar to the IR used by tracing JITs like LuaJIT, which has guard instructions that abort the current trace if a condition isn't true.

dotdash commented 7 years ago

I'm in favor of this. At first I was reminded of the problems we had (have?) with certain LLVM optimizations when marking &mut arguments as noalias, but that is because there is an implicit unwind edge and as I understand it, we would still have an explicit unwind edge in MIR, right?

Aatch commented 7 years ago

I'm also in favour. Making Call, Assert and Drop special statements would probably be enough, since, as you say, simpler control-flow would likely terminate the current block anyway.

The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.

nagisa commented 7 years ago

I consider this as an option for https://github.com/rust-lang/rust/issues/32105. On one hand this might make it easier to analyse MIR, on another hand it seems like a very small incremental improvement to me.

Aatch commented 7 years ago

@nagisa the change to how Call is handled is worth it in my opinion. We have to handle it as a special case in a number of places.

nikomatsakis commented 7 years ago

The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.

To me, this feels pretty unimportant.

eddyb commented 7 years ago

That 1-to-1 relationship is a lie because of the [x; n] rvalue, which creates a loop on the fly.

nikomatsakis commented 7 years ago

I've been thinking about this some. I think the right set of things to include as "terminators" is:

Goto
Resume
Return
Unreachable

The right set of "branching statements" is then:

SwitchInt (with implicit fallthrough)
Drop, DropAndReplace
Call
Assert

To express the concept of an exhaustive switch, we can then have a SwitchInt as the last statement in the block, followed by an Unreachable terminator.

Having SwitchInt always have a fallthrough resolves the current awkward situation where the list of targets is one greater than the list of values without requiring an otherwise: Option<BasicBlock>, which is an annoying discrepancy. I think it also happens to mirror LLVM's setup, but that doesn't seem all that important.

It also allows us to put a "scope end" after the SwitchInt, though only on one branch. In other words:

if <foo> { ... }

would generate something like

BB0:
   temp = <foo>;
   switch temp { 0 => BB1 }
   lifetimeend(temp)
   // if true goes here
   ...
   goto BB2;

BB1:
   // if false
   lifetimeend(temp);
   goto BB2;

BB2:
   ...

Well, ok, after writing this up I'm not as convinced as I was that SwitchInt belongs in the list of "branching statements", but it still seems like a reasonable-ish thing. But I could go either way.

nikomatsakis commented 7 years ago

Heh, I guess we could go one further, and have all only three terminators (return, unreachable, and resume), and then have Goto be a kind of "branching statement" as well. This would mean that the terminators are very simple things that never target other basic blocks or carry data, which has a certain appeal.

nikomatsakis commented 7 years ago

So the more I think about this the more I like my extended variant, where we only have return/unreachable/resume terminators, and all branches that go towards other blocks are statements.

eddyb commented 7 years ago

Can't assign @Mark-Simulacrum but I'll be mentoring them on this, at least for the time being.

nikomatsakis commented 7 years ago

Whatever happened here? I guess @Mark-Simulacrum you got a lot of the way there but didn't have time to push it over the finish line?

eddyb commented 7 years ago

@nikomatsakis It ended up getting pretty messy, with effectively the current BB structure being cached in Mir so that whatever needed it could get it back. I'm less confident it's worth it.

Mark-Simulacrum commented 7 years ago

Yeah, I still have the branch locally (and it's available publicly as well, IIRC). It's not in cache right now, but IIRC the last problem I ran into was the dataflow code not supporting statements which can unwind. I didn't have the time back then to dig in and fix it, but I could probably try now after a rebase. I don't know whether this is something that would be a good idea right now. I feel like dataflow might need to be rewritten for NLL, which might also help make this easier, but I could be wrong.

eddyb commented 7 years ago

Oh, right, dataflow would need one BB-like state per statement which can unwind. That's part of why I became really doubtful about EBBs.

A possibly more feasible approach would be embedding call-like terminators in statements, such that the terminator form always has an unwind target, while the statement form never has one. While that may sound a lot like some previous suggestions, from what I've seen during @Mark-Simulacrum's attempt at EBBs, it should be possible to keep common code paths for them.

arielb1 commented 7 years ago

@eddyb

Why? You don't need a dataflow "variable" for each statement that can unwind, just a dataflow "equation" for every edge out of an EBB. OTOH this does make things more complicated.

eddyb commented 7 years ago

@arielb1 What I mean is that the same amount of state ("equation" I suppose?) that you would store for a whole BB needs to be stored for every statement with an extra (unwind) edge in an EBB, making it quite unappealing - not just more complicated, but less efficient overall to track everything.

steveklabnik commented 6 years ago

Triage: no comments in two years; is this still relevant?

pnkfelix commented 5 years ago

(A discussion about MIR 2.0 is coming up at the all-hands in February 2019 and I imagine that this question about the design of our basic blocks will arise there.)

eddyb commented 5 years ago

I think I have a design that won't cause all the issues we hit before (cc @oli-obk):

Each MIR block gets an (optional?) "cleanup exit" for everything that could panic inside the block, "coalescing" in a sense all the panic cleanup edges.

This takes advantage of the fact that many calls/asserts could be performed without adding or removing cleanups, so they can likely be grouped by the cleanup they use.

The main result is that dataflow only needs to keep 2 final states per block (the regular exit state and the merged cleanup exits' states) instead of N, as with full EBBs (which would've forced us to effectively recover the CFG shape we have today, all the time).

Transformation might be harder than with full EBBs but at least terminators wouldn't have side-effects (since we're left with unconditional jumps and conditional branches, I think?).

bjorn3 commented 4 years ago

Cranelift originally used EBB's but then switched to BB's. Most literature is written for BB's. Converting the algorithms to work on EBB's introduces additional complexity. https://github.com/bytecodealliance/cranelift/issues/796

oli-obk commented 4 years ago

We could add a new Rvalue for invoking guaranteed non-panicking functions (size_of, transmute, ...). This would reduce the number of basic blocks significantly, but I don't know how cranelift or llvm will handle such function calls if we insert actual function calls and don't just generate code.

hanna-kruppe commented 4 years ago

Not sure what problems you expect on the LLVM side, call is an ordinary instruction in LLVM IR, not a terminator. Only when you want to handle exceptions, you use invoke which is a terminator.

bjorn3 commented 4 years ago

In Cranelift and LLVM non-unwinding function calls are regular instructions (called call in both cases). In LLVM only unwinding function calls are terminators (called invoke). If any backend uses terminators for all function calls, it can split the basic block after the call.

nikomatsakis commented 4 years ago

I'd...be reluctant to have two forms of Call, but I guess if we saw significant wins it might be worth it.

nikomatsakis commented 4 years ago

I'm sort of inclined to close this issue though, since I think we're not going to convert to EBBs.

oli-obk commented 4 years ago

Yea, let's close this issue, we can always crate mir optimizations that replace non-panicking intrinsics with some MIR statement logic if that is reasonable to do.

rust-lang / rust

[MIR] Consider converting to extended basic blocks #39685