llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.62k stars 11.83k forks source link

failed assertion in Live Variable Analysis with callbr w/ outputs #44910

Closed nickdesaulniers closed 4 years ago

nickdesaulniers commented 4 years ago
Bugzilla Link 45565
Resolution FIXED
Resolved on Apr 27, 2020 18:11
Version trunk
OS All
Blocks llvm/llvm-project#4440
CC @efriedma-quic,@isanbard,@jyknight,@rnk,@stephenhines

Extended Description

llc -O2 futex.ll:

@​c = dso_local local_unnamed_addr global i32 0, align 4                                                                              
@​a = dso_local local_unnamed_addr global i32 0, align 4

define dso_local i32 @​b(i32* nocapture %f) #​0 {
entry:
  %0 = callbr i32 asm "", "=r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@b, %e))
          to label %asm.fallthrough [label %e]

asm.fallthrough:                                  ; preds = %entry
  store i32 %0, i32* %f, align 4
  br label %e

e:                                                ; preds = %asm.fallthrough, %entry
  ret i32 undef
}

define dso_local i32 @​futex_lock_pi_atomic() local_unnamed_addr {
entry:
  %0 = callbr i32 asm "", "=r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@futex_lock_pi_atomic, %b.exit)) #​2
          to label %asm.fallthrough.i [label %b.exit]

asm.fallthrough.i:                                ; preds = %entry
  br label %b.exit

b.exit:                                           ; preds = %entry, %asm.fallthrough.i
  %1 = load i32, i32* @​c, align 4
  %2 = tail call { i32, i32 } asm "", "=r,={ax},1,0,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1)
  %asmresult = extractvalue { i32, i32 } %2, 0
  %asmresult1 = extractvalue { i32, i32 } %2, 1
  store i32 %asmresult, i32* @​c, align 4
  store i32 %asmresult1, i32* @​a, align 4
  ret i32 undef
}

llc: ../lib/CodeGen/LiveVariables.cpp:114: void llvm::LiveVariables::MarkVirtRegAliveInBlock(llvm::LiveVariables::VarInfo &, llvm::MachineBasicBlock , llvm::MachineBasicBlock , std::vector<MachineBasicBlock *> &): Assertion `MBB != &MF->front() && "Can't find reaching def for virtreg"' failed.

  1. Running pass 'Function Pass Manager' on module 'futex.ll'.
  2. Running pass 'Live Variable Analysis' on function '@futex_lock_pi_atomic' ...

    ​9 0x00000000035d4d1e llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) (/android0/llvm-project/llvm/build/bin/llc+0x35d4d1e)

    ​10 0x00000000035d746e llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl&) (/android0/llvm-project/llvm/build/bin/llc+0x35d746e)

    ​11 0x00000000035d7a4c llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) (/android0/llvm-project/llvm/build/bin/llc+0x35d7a4c)

    ​12 0x00000000035d82a1 llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) (/android0/llvm-project/llvm/build/bin/llc+0x35d82a1)

    ​13 0x000000000363fd0a llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/android0/llvm-project/llvm/build/bin/llc+0x363fd0a)

    ​14 0x0000000003a1cc74 llvm::FPPassManager::runOnFunction(llvm::Function&) (/android0/llvm-project/llvm/build/bin/llc+0x3a1cc74)

    ​15 0x0000000003a1cf28 llvm::FPPassManager::runOnModule(llvm::Module&) (/android0/llvm-project/llvm/build/bin/llc+0x3a1cf28)

    ​16 0x0000000003a1d57d llvm::legacy::PassManagerImpl::run(llvm::Module&) (/android0/llvm-project/llvm/build/bin/llc+0x3a1d57d)

    ​17 0x00000000022b9618 main (/android0/llvm-project/llvm/build/bin/llc+0x22b9618)

    ...

jyknight commented 4 years ago

Fixed (including fix for last comment) in http://github.com/llvm/llvm-project/commit/248a5db3f2e5ff8e4c37b9578bfda8c4d97eaedc

nickdesaulniers commented 4 years ago

D78341 with the test case from comment 5 produces IR that RNK describes in comment 9, but I'm not really sure what to do about:

$ clang -O2 testcase.i Instruction does not dominate all uses! %0 = callbr i32 asm "", "=r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@futex_lock_pi_atomic, %e)) #​1 to label %asm.fallthrough [label %e], !srcloc !​2 ret i32 %0 in function futex_lock_pi_atomic fatal error: error in backend: Broken function found, compilation aborted! clang-11: error: clang frontend command failed with exit code 70 (use -v to see invocation) clang version 11.0.0 (git@github.com:llvm/llvm-project.git 2cef71cac9f91a0e0623994b7fa6f930c10e0dd6) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /android0/llvm-project/llvm/build/bin clang-11: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

I assume there's more to do here?

jyknight commented 4 years ago

I should note it fixes the issue by making the IR given here invalid ("Instruction does not dominate all uses!")

But that also causes the C code not optimize down to the invalid IR (via triggering the creation of a "phi i32 [ %0, %asm.fallthrough ], [ undef, %entry ]")

jyknight commented 4 years ago

Uploaded https://reviews.llvm.org/D78341 -- seems like that should be a simple-enough fix?

efriedma-quic commented 4 years ago

My suggestion is to keep the LLVM IR the same. Then at isel time, insert a definition of each of the outputs into each of the successors of the callbr, then do SSA construction to map the LLVM IR uses to a use of the appropriate vreg. (See SelectionDAGBuilder::getCopyFromRegs etc. for the code that normally maps LLVM IR uses to virtual registers.)

Whether the definitions in the indirect destinations are undef, or actual copies of the outputs, is your choice; probably actual copies is more complicated because you need to do critical edge splitting, in general. (It's not completely clear to me whether critical edge splitting is actually legal, due to the mess we've made with callbr and blockaddress at the IR level.)


The alternative is to do some substantial surgery to the IR representation, so that the values don't exist along the indirect edges. Then IR optimizations would insert appropriate PHI nodes automatically. Reid outlined various different approaches to actually representing that in IR. But it sounds like you don't want to do that, if the long-term plan is to allow such definitions.

--

I can't think of any other approach makes sense. Maybe there's some variation of "at isel time" that involves manipulating the IR just before isel, instead of actually implementing code in isel itself.

nickdesaulniers commented 4 years ago

From LangRef: "For the purposes of the SSA form, the definition of the value returned by the ‘invoke’ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available."

Finishing my half thought out comment 7...

The issue we're now running into is exactly RNK's point in comment 9:

In other words, without the phi, we have a situation where not all uses are dominated by defs.

It sounds like you'd either:

But anyway, if the MIR rules for INLINEASM_BR are different, isel needs to insert a PHI node.

I think this is what Eli is advocating for; to use a phi with explicit undefs. Then to change the langref to mention undef rather than poison? Is that a precise understanding of your suggestions, Eli?

Though it sounds like there's disagreement about whether the phi should be in LLVM IR (rnk) or MIR (Eli)? I think Eli meant LLVM IR?

I remember asking Bill during the RFC process if we'd need to insert PHIs, and I think there was a reason we originally did not, but I both can no longer recall or find the discussion. I also wasn't very well versed in the possible problems the implementation might run into (welcome to engineering, I suppose). RFC: https://lists.llvm.org/pipermail/llvm-dev/2019-June/133428.html

or

Basically, the inline asm ends the basic block, and we should copy the result out of all destination registers used by the asm in the successor blocks. Otherwise, we are not in Machine SSA form.

I'm not sure if those are one in the same suggestion, or two different mutually exclusive approaches?

But, now we have a mismatch between LLVM IR SSA and Machine IR SSA: we have all of these virtual registers defined in each callbr successor, but there isn't a one to one mapping from IR Value to vreg. Yuck.

Sounds like that could be an approach if we did eventually want to support outputs on the indirect targets? It is technically possible to have inline asm that always assigns to an output before conditionally jumping to an indirect target, where the output was in scope. (When the output is not in scope, we get traditional scope based identifier resolution errors anyways). So I don't think it's impossible to support. More so something to limit the Minimum Viable Product / v1, and since it's been less demanded than just the fallthrough case. And maybe YAGNI.

There's already not a 1:1 mapping between MachineBasicBlock and BasicBlock when the BasicBlock is terminated by a CallBrInst, due to block splitting, so already it gets weird.

Or we could get rid of phis all together and move to basic block arguments. That would be pretty wild, though.

That's a yak shave I'd really rather not do to solve this issue, as it will surely take much longer than we'd ideally like in order to solve this issue.

I will note that Chris Lattner recently said "Modeling PHI nodes as part of the instruction stream was a fundamental design mistake." https://docs.google.com/presentation/d/11-VjSNNNJoRhPlLxFgvtb909it1WNdxTnQFipryfAPU/edit#slide=id.g7d9ae7afef_11_358 see slides 79-81 (I think this exact problem or one very similar is documented on slide 81).

Finally, hip new frontends like SIL and backends like Cranelift (Dan Gohman) also avoid PHIs in favor of basic block arguments. https://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf slides 57-68 https://youtu.be/Ntj8ab-5cvE?t=596 https://cranelift.readthedocs.io/en/stable/compare-llvm.html#program-structure

I would love to see more MLIR used in LLVM, and would love to help with that someday. But not today.

efriedma-quic commented 4 years ago

But, now we have a mismatch between LLVM IR SSA and Machine IR SSA: we have all of these virtual registers defined in each callbr successor, but there isn't a one to one mapping from IR Value to vreg. Yuck.

Yes, doing SSA construction during isel is ugly. But we can probably make it work.

Thinking about it a bit more, the end result might be better for users: we could get rid of the confusing "Use of outputs on the ‘indirect’ path(s) results in poison values" clause in LangRef.

rnk commented 4 years ago

Hm, right. I guess I would've expected to see this IR:

define dso_local i32 @​futex_lock_pi_atomic() #​0 { entry: %0 = callbr i32 asm "", "=r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@futex_lock_pi_atomic, %e)) #​1 to label %asm.fallthrough [label %e], !srcloc !​3

asm.fallthrough: ; preds = %entry br label %e

e: %rv = phi i32 [undef, %entry], [%0, %asm.fallthrough] ret i32 %0 }

In other words, without the phi, we have a situation where not all uses are dominated by defs.

However, I recall that we made the choice to make callbr different from invoke. We decided to model the SSA value as coming from the block ending in callbr.


Clearly, we want the following Machine IR:

CALLBR INLINEASM

end bb

bb1.fallthrough: COPY %EAX -> %vreg0 ...

bb2.indirect1: COPY %EAX -> %vreg1 ...

bb2.indirect2: COPY %EAX -> %vreg2 ...

Basically, the inline asm ends the basic block, and we should copy the result out of all destination registers used by the asm in the successor blocks. Otherwise, we are not in Machine SSA form.

But, now we have a mismatch between LLVM IR SSA and Machine IR SSA: we have all of these virtual registers defined in each callbr successor, but there isn't a one to one mapping from IR Value to vreg. Yuck.


We could copy the design of landingpad, and invent some kind of "callbrpad" instruction to produce the value in IR.

Or we could get rid of phis all together and move to basic block arguments. That would be pretty wild, though.

nickdesaulniers commented 4 years ago

Disregard comment 7, was half thought out/typed out and I just wanted to cc rnk@.

nickdesaulniers commented 4 years ago

From LangRef: "For the purposes of the SSA form, the definition of the value returned by the ‘invoke’ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available."

We can't say "for callbr, no return value is available along the indirect destination edges" since it's possible the indirect destination is dominated by the fallthrough, ie. case in comment 5.

What would we even do for comment 5's example? Just fail codegen?

The semantics are more like "output is always available on the fallthrough, even if the

efriedma-quic commented 4 years ago

From LangRef: "For the purposes of the SSA form, the definition of the value returned by the ‘invoke’ instruction is deemed to occur on the edge from the current block to the “normal” label. If the callee unwinds then no return value is available."

nickdesaulniers commented 4 years ago

LangRef currently says "Use of outputs on the ‘indirect’ path(s) results in poison values". Really, I think it's a terrible idea that this doesn't work like "invoke". But anyway, if the MIR rules for INLINEASM_BR are different, isel needs to insert a PHI node.

Indeed, this case is "what happens when fallthrough and the indirect target are one in the same. Simplified further:

int futex_lock_pi_atomic(void) { int d; asm goto("" : "=r"(d) : : : e); e: return d; }

and it sounds like the answer might be "phi with poison from the indirect branch."

I'm not super familiar with invoke, can you expand more on in what ways callbr should act like invoke?

nickdesaulniers commented 4 years ago

simplified further:

void futex_lock_pi_atomic(void) {                                                                                                    
  int a, d;
  asm goto("" : "=r"(d) : : : e);
e:
  asm volatile ("": "=a"(a) : "0"(d));
}

I had to lookup what the "0" constraint meant: https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints grep "matching constraint".

efriedma-quic commented 4 years ago

LangRef currently says "Use of outputs on the ‘indirect’ path(s) results in poison values". Really, I think it's a terrible idea that this doesn't work like "invoke". But anyway, if the MIR rules for INLINEASM_BR are different, isel needs to insert a PHI node.

nickdesaulniers commented 4 years ago

Finally, the C code to reproduce this (via creduce):

a, c;
b(*f) {
  __typeof__(0) d;
  asm goto("" : "=r"(d) : : : e);
  *f = d;
e:;
}
futex_lock_pi_atomic(void) {
  int g;
  b(&g);
  asm("" : "+r"(c), "=a"(a) : "1"(g));
}
nickdesaulniers commented 4 years ago

Sorry, this case is more concise:

@&#8203;c = global i32 0                                                                                                                    
@&#8203;a = global i32 0

define dso_local void @&#8203;futex_lock_pi_atomic() local_unnamed_addr {
entry:
  %0 = callbr i32 asm "", "=r,X,~{dirflag},~{fpsr},~{flags}"(i8* blockaddress(@futex_lock_pi_atomic, %b.exit))
          to label %asm.fallthrough.i [label %b.exit]

asm.fallthrough.i:                                ; preds = %entry
  br label %b.exit

b.exit:                                           ; preds = %entry, %asm.fallthrough.i
  %1 = load i32, i32* @&#8203;c, align 4
  %2 = tail call { i32, i32 } asm "", "=r,={ax},1,0,~{dirflag},~{fpsr},~{flags}"(i32 %0, i32 %1)
  %asmresult = extractvalue { i32, i32 } %2, 0
  %asmresult1 = extractvalue { i32, i32 } %2, 1
  store i32 %asmresult, i32* @&#8203;c, align 4
  store i32 %asmresult1, i32* @&#8203;a, align 4
  ret void
}

It looks like %0 (the outputs of the asm goto) are being used in the indirect target, which looks like it came from inlining @​b into @​futex_lock_pi_atomic (from my bug Description). I guess the second asm block in @​futex_lock_pi_atomic should be using a phi with undef if we entered via entry BasicBlock?