llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.09k stars 11.6k forks source link

Bad codeview (maybe others) debug info generated after the fastregalloc pass inserts instruction without debug information #106419

Open TheJCAB opened 3 weeks ago

TheJCAB commented 3 weeks ago

This was found via Rust. Just compile this function without optimizations for x64 Windows:

pub fn get_int(input: u32) -> Box<u32> {
    let map = match A::B(input) {
        A::B(file) => {
            let mut contents = Box::new(5);
            *contents += file;
            contents
        },
        A::C => Box::new(42),
    };

    return map;
}

IR looks like this:

  store i32 5, ptr %x.dbg.spill.i3, align 4
    #dbg_declare(ptr %x.dbg.spill.i3, !1377, !DIExpression(), !1381)
; invoke alloc::alloc::exchange_malloc
  %_4.i4 = invoke ptr @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE(i64 4, i64 4)
          to label %"_ZN5alloc5boxed12Box$LT$T$GT$3new17hb2f5d094a153b235E.exit7" unwind label %funclet_bb2.i5, !dbg !1383

!1381 = !DILocation(line: 255, scope: !1378, inlinedAt: !1382)
!1382 = distinct !DILocation(line: 9, scope: !1368) ; Inline callsite for Box::new(5)
!1383 = !DILocation(line: 257, scope: !1378, inlinedAt: !1382)

The relevant part is the "invoke" instruction. Note that it is force-inlined (by Rust's Box::new()), but I don't think that's relevant to this issue (I'll share if I reduce the repro further).

When single-stepping in the debugger to the let mut contents = Box::new(5); line, the file local variable will not show as alive, even though it should.

This snippet from ("llc -O0 main.bc -print-after-all") sheds some light:

  $rcx = COPY %15:gr64, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ]
  $rdx = COPY %15:gr64, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ]
  CALL64pcrel32 @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE, <regmask $bh $bl blah blah...>, implicit $rsp, implicit blah blah, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ]
...
# End machine code for function _ZN18MatchVariableRepro7get_int17h778e1dfd4fe19ba1E.

# *** IR Dump After Fast Register Allocator (regallocfast) ***:
# Machine code for function _ZN18MatchVariableRepro7get_int17h778e1dfd4fe19ba1E: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten
...
  renamable $rdx = MOV32ri64 4
  $rcx = COPY renamable $rdx, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ]
  CALL64pcrel32 @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE, <regmask $bh $bl blah blah...>, implicit $rsp, implicit blah blah, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ]

"renamable $rdx = MOV32ri64 4" is a new instruction introduced by this pass, and it is introduced without debug info. This is... reasonable, I guess, since it's new. It's just loading the constant "4" into RDX (to be used as size and alignment for the allocation, I believe). Technically, this instruction, and the subsequent one, still should belong to the same scope as the two instructions that they replace. And sure enough, they both end up marked as belonging to the same line of code.

But later, when the lifetime of the file local variable is determined, that generated first instruction is not included the ranges, causing the issue at hand.

I'm not sure where the core error lies (I don't know LLVM well enough). It could be in the optimization (the new instruction is generated from actual code, so it should have debug info). Or it could be in the later determination of the lifetime of the local variable, where the list of instruction ranges is determined.

I suspect the former is... arguable against because the new instruction is just "loading a constant" that could then be reused in a number of ways. But this is unoptimized code generation, so...

llvmbot commented 3 weeks ago

@llvm/issue-subscribers-debuginfo

Author: Juan Carlos Arevalo Baeza (TheJCAB)

This was found via Rust. Just compile this function *without optimizations* for x64 Windows: ```Rust pub fn get_int(input: u32) -> Box<u32> { let map = match A::B(input) { A::B(file) => { let mut contents = Box::new(5); *contents += file; contents }, A::C => Box::new(42), }; return map; } ``` IR looks like this: ```LL store i32 5, ptr %x.dbg.spill.i3, align 4 #dbg_declare(ptr %x.dbg.spill.i3, !1377, !DIExpression(), !1381) ; invoke alloc::alloc::exchange_malloc %_4.i4 = invoke ptr @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE(i64 4, i64 4) to label %"_ZN5alloc5boxed12Box$LT$T$GT$3new17hb2f5d094a153b235E.exit7" unwind label %funclet_bb2.i5, !dbg !1383 !1381 = !DILocation(line: 255, scope: !1378, inlinedAt: !1382) !1382 = distinct !DILocation(line: 9, scope: !1368) ; Inline callsite for Box::new(5) !1383 = !DILocation(line: 257, scope: !1378, inlinedAt: !1382) ``` The relevant part is the "invoke" instruction. Note that it is force-inlined (by Rust's `Box::new()`), but I don't think that's relevant to this issue (I'll share if I reduce the repro further). When single-stepping in the debugger to the `let mut contents = Box::new(5);` line, the `file` local variable will not show as alive, even though it should. This snippet from ("llc -O0 main.bc -print-after-all") sheds some light: ``` $rcx = COPY %15:gr64, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ] $rdx = COPY %15:gr64, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ] CALL64pcrel32 @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE, <regmask $bh $bl blah blah...>, implicit $rsp, implicit blah blah, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ] ... # End machine code for function _ZN18MatchVariableRepro7get_int17h778e1dfd4fe19ba1E. # *** IR Dump After Fast Register Allocator (regallocfast) ***: # Machine code for function _ZN18MatchVariableRepro7get_int17h778e1dfd4fe19ba1E: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten ... renamable $rdx = MOV32ri64 4 $rcx = COPY renamable $rdx, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ] CALL64pcrel32 @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE, <regmask $bh $bl blah blah...>, implicit $rsp, implicit blah blah, debug-location !42; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:9 ] ``` "renamable $rdx = MOV32ri64 4" is a new instruction introduced by this pass, and it is introduced without debug info. This is... reasonable, I guess, since it's new. It's just loading the constant "4" into RDX (to be used as size and alignment for the allocation, I believe). Technically, this instruction, and the subsequent one, still should belong to the same scope as the two instructions that they replace. And sure enough, they both end up marked as belonging to the same line of code. But later, when the lifetime of the `file` local variable is determined, that generated first instruction is not included the ranges, causing the issue at hand. I'm not sure where the core error lies (I don't know LLVM well enough). It could be in the optimization (the new instruction is generated from actual code, so it should have debug info). Or it could be in the later determination of the lifetime of the local variable, where the list of instruction ranges is determined. I suspect the former is... arguable against because the new instruction is just "loading a constant" that could then be reused in a number of ways. But this is unoptimized code generation, so...
jmorse commented 2 weeks ago

Having debug-info defects when there are no optimisations is particularly poor; would you be able to provide a LLVM-IR file from before the codegen passes begin? I'm unfamiliar with rust, but I imagine you can dump the LLVM-IR it produces, then use llvm-extract -o - -S --func=_ZN18MatchVariableRepro7get_int17h778e1dfd4fe19ba1E to extract the function in question.

As you say, constant-loads often get merged and rematerialised with debug-info being dropped along the way... although that shouldn't be occurring with unoptimised builds.

As you mention Windows, what debugger are you using there? It's possible that there's a difference between *nix/windows debugger behaviours there. From a linux point of view, I would expect a step to land on the copy to $rcx, after which presumably file should be in scope.

TheJCAB commented 2 weeks ago

@jmorse Thank you for this follow-up. I managed to cook a good, very minimal repro:

main.zip

It includes source files and generated files, including a PDB and executable, but you can rebuild from the kk.ll file like so:

llvm-as.exe kk.ll llc.exe -O0 kk.bc llc.exe -O0 kk.bc --filetype=obj

I built the executable using Visual Studio's linker:

link kk.obj /ENTRY:get_int /SUBSYSTEM:console /DEBUG

From the generated files, you can see that this:

    #dbg_declare(ptr %file.dbg.spill, !21, !DIExpression(), !27)
  %_4 = invoke i64 @new(i64 4, i64 4)
          to label %bb2 unwind label %funclet_bb2.i5, !dbg !28

With metadata (local file) declared on line 8, and call to new(4, 4) on line 9:

!21 = !DILocalVariable(name: "file", scope: !22, file: !9, line: 8, type: !16, align: 4)
!22 = distinct !DILexicalBlock(scope: !8, file: !9, line: 8)
...
!27 = !DILocation(line: 8, scope: !22)
!28 = !DILocation(line: 9, scope: !22)

Generates assembly (kk.s) where clearly both move instructions lie on line 9:

    .cv_loc 0 1 9 0                         # main.rs:9:0
.Ltmp0:
    movl    $4, %edx
.Ltmp3:
    movq    %rdx, %rcx
    callq   new

But where the debug info (kk.s, further down):

    .short  .Ltmp20-.Ltmp19                 # Record length
.Ltmp19:
    .short  4414                            # Record kind: S_LOCAL
    .long   117                             # TypeIndex
    .short  0                               # Flags
    .asciz  "file"
    .p2align    2, 0x0
.Ltmp20:
    .cv_def_range    .Ltmp3 .Ltmp4 .Ltmp5 .Ltmp6, frame_ptr_rel, -12
    .short  2                               # Record length
    .short  4431                            # Record kind: S_PROC_ID_END

Shows the local variable starting at the second move instruction (.Ltmp3).

You can also use WinDbg to debug the resulting executable. The ZIP file includes the original Rust source, and the debug info points to it correctly. The function's code has been reduced to almost nothing, so don't expect it to match the source anymore ;-).

As to your question, WinDbg is what I'm using, for various reasons:

Note that, for some reason, building the "Debug" config with CMake on Windows fails to generate debug information for anything. I opened an issue and I'll see if I can submit a fix: #107328

During my debugging, I drilled it down to these two things:

1. A bad scope is generated:

/// initialize - Scan machine function and constuct lexical scope nest.
void LexicalScopes::initialize(const MachineFunction &Fn) {
  reset();
  // Don't attempt any lexical scope creation for a NoDebug compile unit.
  if (Fn.getFunction().getSubprogram()->getUnit()->getEmissionKind() ==
      DICompileUnit::NoDebug)
    return;
  MF = &Fn;
  SmallVector<InsnRange, 4> MIRanges;
  DenseMap<const MachineInstr *, LexicalScope *> MI2ScopeMap;
  extractLexicalScopes(MIRanges, MI2ScopeMap);
  if (CurrentFnLexicalScope) {
    constructScopeNest(CurrentFnLexicalScope);
    assignInstructionRanges(MIRanges, MI2ScopeMap);
  }
}

Somehow, in that function, in MIRanges, the first scope that gets assigned to the file variable ends up not including the load of EDX but starts at the next instruction that loads ECX.

2. The generated "constant load" instruction never gets a debug location.

The load of EDX comes from a constant generated in this function:

SDValue SelectionDAG::getConstant(const ConstantInt &Val, const SDLoc &DL,
                                  EVT VT, bool isT, bool isO) {
  assert(VT.isInteger() && "Cannot create FP integer constant!");

...

  if (!N) {
    N = newSDNode<ConstantSDNode>(isT, isO, Elt, VTs);
    CSEMap.InsertNode(N, IP);
    InsertNode(N);
    NewSDValueDbgMsg(SDValue(N, 0), "Creating constant: ", this);
  }

  SDValue Result(N, 0);
  if (VT.isVector())
    Result = getSplat(VT, DL, Result);
  return Result;
}

That causes the instruction to have no debugging information. I think that may be a bug. The fix was simple:

    N = newSDNode<ConstantSDNode>(isT, isO, Elt, VTs);
+    N->setDebugLoc(DL.getDebugLoc());
    CSEMap.InsertNode(N, IP);

I can see the debug information appear in the debug log after the fix, like so:

  MOV32mi %stack.1.x.dbg.spill.i, 1, $noreg, 0, $noreg, 42 :: (store (s32) into %ir.x.dbg.spill.i)
  EH_LABEL <mcsymbol .Ltmp0>, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]
  ADJCALLSTACKDOWN64 32, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]
-  renamable $rdx = MOV32ri64 4
+  renamable $rdx = MOV32ri64 4, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]
  $rcx = COPY renamable $rdx, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]
  CALL64pcrel32 @_ZN5alloc5alloc15exchange_malloc17h9b5ed8410e7151ebE, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $rbp $rbx $rdi $rsi $si $sih $sil $r12 $r13 $r14 $r15 $xmm6 $xmm7 $xmm8 $xmm9 $xmm10 and 25 more...>, implicit $rsp, implicit $ssp, implicit killed $rcx, implicit killed $rdx, implicit-def $rsp, implicit-def $ssp, implicit-def $rax, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]
  ADJCALLSTACKUP64 32, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp, debug-location !47; X:\repos\rust\library\alloc\src\boxed.rs:257 @[ src/main.rs:13 ]

I'm tempted to report that as an issue, however that ultimately doesn't seem to fix the problem so I'm not sure. I'd love to hear opinions from people with more experience with this codebase.

Unfortunately, all the debug logging that I managed to get out of llc says nothing about how the CodeView debug information is generated, so I'm left with raw debugging LLVM's quite opaque data structures. The NatVis file included with LLVM helps but is woefully incomplete (and WinDbg's implementation has a few bugs, which I'm trying to also get fixed -- synergy at work :-)). I'll try and contribute my enhancements to the NatVis. Hopefully I can get a win+win+win+win+win out of all this :-).