Closed nikomatsakis closed 4 years ago
I like that idea. It makes MIR feel similar to the IR used by tracing JITs like LuaJIT, which has guard instructions that abort the current trace if a condition isn't true.
I'm in favor of this. At first I was reminded of the problems we had (have?) with certain LLVM optimizations when marking &mut
arguments as noalias
, but that is because there is an implicit unwind edge and as I understand it, we would still have an explicit unwind edge in MIR, right?
I'm also in favour. Making Call
, Assert
and Drop
special statements would probably be enough, since, as you say, simpler control-flow would likely terminate the current block anyway.
The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.
I consider this as an option for https://github.com/rust-lang/rust/issues/32105. On one hand this might make it easier to analyse MIR, on another hand it seems like a very small incremental improvement to me.
@nagisa the change to how Call
is handled is worth it in my opinion. We have to handle it as a special case in a number of places.
The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.
To me, this feels pretty unimportant.
That 1-to-1 relationship is a lie because of the [x; n]
rvalue, which creates a loop on the fly.
I've been thinking about this some. I think the right set of things to include as "terminators" is:
The right set of "branching statements" is then:
To express the concept of an exhaustive switch, we can then have a SwitchInt
as the last statement in the block, followed by an Unreachable
terminator.
Having SwitchInt
always have a fallthrough resolves the current awkward situation where the list of targets is one greater than the list of values without requiring an otherwise: Option<BasicBlock>
, which is an annoying discrepancy. I think it also happens to mirror LLVM's setup, but that doesn't seem all that important.
It also allows us to put a "scope end" after the SwitchInt
, though only on one branch. In other words:
if <foo> { ... }
would generate something like
BB0:
temp = <foo>;
switch temp { 0 => BB1 }
lifetimeend(temp)
// if true goes here
...
goto BB2;
BB1:
// if false
lifetimeend(temp);
goto BB2;
BB2:
...
Well, ok, after writing this up I'm not as convinced as I was that SwitchInt
belongs in the list of "branching statements", but it still seems like a reasonable-ish thing. But I could go either way.
Heh, I guess we could go one further, and have all only three terminators (return, unreachable, and resume), and then have Goto
be a kind of "branching statement" as well. This would mean that the terminators are very simple things that never target other basic blocks or carry data, which has a certain appeal.
So the more I think about this the more I like my extended variant, where we only have return/unreachable/resume terminators, and all branches that go towards other blocks are statements.
Can't assign @Mark-Simulacrum but I'll be mentoring them on this, at least for the time being.
Whatever happened here? I guess @Mark-Simulacrum you got a lot of the way there but didn't have time to push it over the finish line?
@nikomatsakis It ended up getting pretty messy, with effectively the current BB structure being cached in Mir
so that whatever needed it could get it back. I'm less confident it's worth it.
Yeah, I still have the branch locally (and it's available publicly as well, IIRC). It's not in cache right now, but IIRC the last problem I ran into was the dataflow code not supporting statements which can unwind. I didn't have the time back then to dig in and fix it, but I could probably try now after a rebase. I don't know whether this is something that would be a good idea right now. I feel like dataflow might need to be rewritten for NLL, which might also help make this easier, but I could be wrong.
Oh, right, dataflow would need one BB-like state per statement which can unwind. That's part of why I became really doubtful about EBBs.
A possibly more feasible approach would be embedding call-like terminators in statements, such that the terminator form always has an unwind target, while the statement form never has one. While that may sound a lot like some previous suggestions, from what I've seen during @Mark-Simulacrum's attempt at EBBs, it should be possible to keep common code paths for them.
@eddyb
Why? You don't need a dataflow "variable" for each statement that can unwind, just a dataflow "equation" for every edge out of an EBB. OTOH this does make things more complicated.
@arielb1 What I mean is that the same amount of state ("equation" I suppose?) that you would store for a whole BB needs to be stored for every statement with an extra (unwind) edge in an EBB, making it quite unappealing - not just more complicated, but less efficient overall to track everything.
Triage: no comments in two years; is this still relevant?
(A discussion about MIR 2.0 is coming up at the all-hands in February 2019 and I imagine that this question about the design of our basic blocks will arise there.)
I think I have a design that won't cause all the issues we hit before (cc @oli-obk):
Each MIR block gets an (optional?) "cleanup exit" for everything that could panic inside the block, "coalescing" in a sense all the panic cleanup edges.
This takes advantage of the fact that many calls/asserts could be performed without adding or removing cleanups, so they can likely be grouped by the cleanup they use.
The main result is that dataflow only needs to keep 2 final states per block (the regular exit state and the merged cleanup exits' states) instead of N, as with full EBBs (which would've forced us to effectively recover the CFG shape we have today, all the time).
Transformation might be harder than with full EBBs but at least terminators wouldn't have side-effects (since we're left with unconditional jumps and conditional branches, I think?).
Cranelift originally used EBB's but then switched to BB's. Most literature is written for BB's. Converting the algorithms to work on EBB's introduces additional complexity. https://github.com/bytecodealliance/cranelift/issues/796
We could add a new Rvalue
for invoking guaranteed non-panicking functions (size_of
, transmute
, ...). This would reduce the number of basic blocks significantly, but I don't know how cranelift or llvm will handle such function calls if we insert actual function calls and don't just generate code.
Not sure what problems you expect on the LLVM side, call
is an ordinary instruction in LLVM IR, not a terminator. Only when you want to handle exceptions, you use invoke
which is a terminator.
In Cranelift and LLVM non-unwinding function calls are regular instructions (called call
in both cases). In LLVM only unwinding function calls are terminators (called invoke
). If any backend uses terminators for all function calls, it can split the basic block after the call.
I'd...be reluctant to have two forms of Call, but I guess if we saw significant wins it might be worth it.
I'm sort of inclined to close this issue though, since I think we're not going to convert to EBBs.
Yea, let's close this issue, we can always crate mir optimizations that replace non-panicking intrinsics with some MIR statement logic if that is reasonable to do.
After some discussion with @stoklund and @eddyb, I was thinking that perhaps it would be better for MIR to transition to supporting extended basic blocks (one entry, multiple exits) instead of the current basic blocks. Concretely, the idea would be to remove the
Call
,Assert
, andDrop
terminators and make them into statements with an outgoing panic edge.Specifically for call, this would simplify things dramatically because the return value is only for use on the "return" edge, and handling that in the current dataflow and so forth is a pain in the neck (this extended basic block form makes that edge more special, which should help). It also helps us have fewer basic blocks.
We could conceivably go further, and remove terminators altogether, and instead just have statements, where the final statement should be of "terminator type". However, I'm somewhat disinclined to do that, at least not right away, because the majority of the advantage of this format derives from converting call/assert/drop (which "feel" linear in code) and not from converting
if
ormatch
or other control-flow, since almost always those would terminate the current basic block anyway (since the control-flow joins afterwards). Moreover, the current setup assures that we have a terminator in the right spot, and unifying them with statements would mean we have to maintain that invariant ourselves.cc @rust-lang/compiler @nagisa