llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.15k stars 11.62k forks source link

Mem2Reg removes debug info from IR generated from CLANG #35582

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 36234
Version 5.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @adrian-prantl,@dwblaikie,@rnk

Extended Description

Please consider the following example:

int main() { int i;

i=0;

i=2;

}

If we emit IR of the above function via clang. We obtain the following:

define i32 @​main() #​0 !dbg !​7 { entry: %i = alloca i32, align 4 call void @​llvm.dbg.declare(metadata i32 %i, metadata !​12, metadata !​13), !dbg !​14 store i32 0, i32 %i, align 4, !dbg !​15 store i32 2, i32* %i, align 4, !dbg !​16 ret i32 0, !dbg !​17 }

!​12 = !DILocalVariable(name: "i", scope: !​7, file: !​8, line: 4, type: !​11) !​13 = !DIExpression() !​14 = !DILocation(line: 4, column: 7, scope: !​7) !​15 = !DILocation(line: 6, column: 4, scope: !​7) !​16 = !DILocation(line: 8, column: 4, scope: !​7) !​17 = !DILocation(line: 11, column: 1, scope: !​7)

Please note that both the stores have debug information !​15 and !​16 that correctly identifies the correct location of the writes on i.

If we pass the above function via Mem2Reg pass we obtain the following IR:

; Function Attrs: noinline nounwind define i32 @​main() #​0 !dbg !​7 { call void @​llvm.dbg.value(metadata i32 0, i64 0, metadata !​12, metadata !​13), !dbg !​14 call void @​llvm.dbg.value(metadata i32 2, i64 0, metadata !​12, metadata !​13), !dbg !​14 ret i32 0, !dbg !​15 }

!​12 = !DILocalVariable(name: "i", scope: !​7, file: !​8, line: 4, type: !​11) !​13 = !DIExpression() !​14 = !DILocation(line: 4, column: 7, scope: !​7) !​15 = !DILocation(line: 11, column: 1, scope: !​7)

Please note that both store calls are replaced by dbg.value calls. Both of them have debug value !​14 that points at the declaration of the variable i. There is no more debug info that points to the two assignments.

IMHO, this is bug in Mem2Reg. I think this is stemming from the functions like the following function at Local.cpp:1104 (version 5.0.1)

void llvm::ConvertDebugDeclareToDebugValue(DbgDeclareInst DDI, StoreInst SI, DIBuilder &Builder) { .... .... .... .... .... .... .... .... .... 1139: Builder.insertDbgValueIntrinsic(DV, 0, DIVar, DIExpr, DDI->getDebugLoc(), SI); }

Please note that the new debugValue instruction at line 1139 is not taking debug info from SI but it is getting DDI->getDebugLoc(). Therefore, SI->getDebugLoc() is being lost.

I believe the debug information of declaration is already preserved in DIVar. So there is no need to another copy on the new debugValue instruction. Therefore, the fifth parameter should be SI->getDebugLoc().

llvmbot commented 6 years ago

If your tool wants to have a source location for llvm.dbg.value intrinsics, you can modify mem2reg to stick the DILocation of the store onto the dbg.value. That information won't be used by any consumer inside of inside of LLVM (with the ecexption of the inlinedAt: field) so that should be a safe thing to do.

Can you explain what your tool would do with the DILocation of a dbg.value?

We use debug info to produce visual feed back (in the form of a html file) of the found error.

We have modified mem2reg already and somewhat it works for us. However, we are not sure our modification are sound.

rnk commented 6 years ago

Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information.

What's "this" information that you are referring to here?

The inlinedAt value.

If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.

Why would retaining the dbg.declare make any difference? The dbg.declare and the dbg.value carry the exact same amount of information about the variable, don't they?

I'm suggesting that the current choice to store the DILocation of the variable declaration in the source location slot of the dbg.value isn't very clean. We end up having to discover concrete variables in the backend by doing lookup on pairs of {DIVariable, inlinedAt}, and this information is redundantly encoded on every variable assignment. Furthermore, if the dbg.value instructions get removed, we forget the variable ever existed. Consider something like:

void f() { int foo_data; if (FOO_ENABLED) { // -DFOO_ENABLED=0 foo_data = work(); } stop_here(); if (FOO_ENABLED) { // -DFOO_ENABLED=0 use_data(foo_data); } }

If we had an instruction that produced concrete variables like alloca instructions produce memory, we wouldn't need to repurpose the source location of dbg.value to hold the variable inlinedAt info. We could instead refer to that concrete variable as the second argument of dbg.value.

Here's a use case where having the original store location might help us:

void foo(); void bar(); int f(int cond) { int x; if (cond) { foo(); x = 42; } else { x = 1; } return x; }

The annotated assembly:

DEBUG_VALUE: f:cond <- %ecx

    testl   %ecx, %ecx

.Ltmp0: .loc 1 5 7 is_stmt 0 # t.cpp:5:7 je .LBB0_1 .Ltmp1:

%bb.2: # %if.then

    #DEBUG_VALUE: f:cond <- %ecx
    .loc    1 6 5 is_stmt 1         # t.cpp:6:5
    callq   "?foo@@YAXXZ"

.Ltmp2:

DEBUG_VALUE: f:x <- 42

    .loc    1 0 5 is_stmt 0         # t.cpp:0:5
    movl    $42, %eax
    jmp     .LBB0_3

.Ltmp3: .LBB0_1:

DEBUG_VALUE: f:cond <- %ecx

    movl    $1, %eax

.Ltmp4: .LBB0_3: # %if.end

DEBUG_VALUE: f:x <- %eax

    .loc    1 11 3 is_stmt 1        # t.cpp:11:3

The instructions materializing 42 and 1 should ideally have the source location of the original assignments, but our current representation cannot encode that. If we could use the dbg.value sloc for the location of the store, we could emit that line table entry when materializing constants for phi nodes at the end of the basic block. Of course, it you reorder the assignment and the function call, it's not so obvious if this really matters. =S

adrian-prantl commented 6 years ago

Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information.

What's "this" information that you are referring to here?

The DILocalVariable is not enough in the presence of multiple inlined copies. That is correct.

If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.

Why would retaining the dbg.declare make any difference? The dbg.declare and the dbg.value carry the exact same amount of information about the variable, don't they?

rnk commented 6 years ago

I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.

I don't think that's true. The purpose of the DILocation attachment on DbgValueInsts is for the DILocation's inlinedAt: field so inlined instances can be distinguished. Let me know if we are talking past another :-)

Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information. The DILocalVariable is not enough in the presence of multiple inlined copies.

If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.

adrian-prantl commented 6 years ago

It may be the intended design, but I am not sure how a downstream tool should process this situation. I am building a model checker on top of LLVM IR. If my tool finds a bug, it is having trouble in producing accurate witness and not able to point at the code correctly. Do you have any suggestion to recover the lost info?

If your tool wants to have a source location for llvm.dbg.value intrinsics, you can modify mem2reg to stick the DILocation of the store onto the dbg.value. That information won't be used by any consumer inside of inside of LLVM (with the ecexption of the inlinedAt: field) so that should be a safe thing to do.

Can you explain what your tool would do with the DILocation of a dbg.value?

adrian-prantl commented 6 years ago

I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.

I don't think that's true. The purpose of the DILocation attachment on DbgValueInsts is for the DILocation's inlinedAt: field so inlined instances can be distinguished. Let me know if we are talking past another :-)

rnk commented 6 years ago

This seems like a weakness in the design we might want to fix at some point. I tried this example and did not get what I expected:

static inline void inc(int p) { ++p; } int foo() { int i = 0; inc(&i); return i; }

I get this IR:

define i32 @"\01?foo@@YAHXZ"() local_unnamed_addr #​0 !dbg !​8 { entry: call void @​llvm.dbg.value(metadata i32 0, metadata !​13, metadata !DIExpression()), !dbg !​14 call void @​llvm.dbg.value(metadata i32 1, metadata !​13, metadata !DIExpression()), !dbg !​14 ret i32 1, !dbg !​15 }

!​14 points to the declaration of i, so the record of 'inc' being inlined is lost.

I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.

Maybe we should leave behind the dbg.declare intrinsics (or something replacing them) that records the concrete variable declaration.

llvmbot commented 6 years ago

After the transformation by looking at the stream of instructions, I know that i is being assigned 2, but the location of the assignment has been lost by transformation.

It may be the intended design, but I am not sure how a downstream tool should process this situation. I am building a model checker on top of LLVM IR. If my tool finds a bug, it is having trouble in producing accurate witness and not able to point at the code correctly. Do you have any suggestion to recover the lost info?

adrian-prantl commented 6 years ago

Please note that both store calls are replaced by dbg.value calls. Both of them have debug value !​14 that points at the declaration of the variable i. There is no more debug info that points to the two assignments.

IMHO, this is bug in Mem2Reg. I think this is stemming from the functions like the following function at Local.cpp:1104 (version 5.0.1)

No, this is the expected behavior. The DILocation attached to llvm.dbg.value intrinsics is only used to capture inlinedAt: information for the local variable the source location within in the function always points to the declaration of the source variable and is not used to emit debug information at all.

The only thing relevant for a dbg.value is its relative position in the instruction stream relative to other instructions (which may have DILocations that actually sho up in the line table.

Can you elaborate why you think that this behavior is a problem?

llvmbot commented 2 months ago

@llvm/issue-subscribers-debuginfo

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [36234](https://llvm.org/bz36234) | | Version | 5.0 | | OS | Linux | | Reporter | LLVM Bugzilla Contributor | | CC | @adrian-prantl,@dwblaikie,@rnk | ## Extended Description Please consider the following example: int main() { int i; i=0; i=2; } If we emit IR of the above function via clang. We obtain the following: define i32 @&#8203;main() #&#8203;0 !dbg !&#8203;7 { entry: %i = alloca i32, align 4 call void @&#8203;llvm.dbg.declare(metadata i32* %i, metadata !&#8203;12, metadata !&#8203;13), !dbg !&#8203;14 store i32 0, i32* %i, align 4, !dbg !&#8203;15 store i32 2, i32* %i, align 4, !dbg !&#8203;16 ret i32 0, !dbg !&#8203;17 } !&#8203;12 = !DILocalVariable(name: "i", scope: !&#8203;7, file: !&#8203;8, line: 4, type: !&#8203;11) !&#8203;13 = !DIExpression() !&#8203;14 = !DILocation(line: 4, column: 7, scope: !&#8203;7) !&#8203;15 = !DILocation(line: 6, column: 4, scope: !&#8203;7) !&#8203;16 = !DILocation(line: 8, column: 4, scope: !&#8203;7) !&#8203;17 = !DILocation(line: 11, column: 1, scope: !&#8203;7) Please note that both the stores have debug information !&#8203;15 and !&#8203;16 that correctly identifies the correct location of the writes on i. If we pass the above function via Mem2Reg pass we obtain the following IR: ; Function Attrs: noinline nounwind define i32 @&#8203;main() #&#8203;0 !dbg !&#8203;7 { call void @&#8203;llvm.dbg.value(metadata i32 0, i64 0, metadata !&#8203;12, metadata !&#8203;13), !dbg !&#8203;14 call void @&#8203;llvm.dbg.value(metadata i32 2, i64 0, metadata !&#8203;12, metadata !&#8203;13), !dbg !&#8203;14 ret i32 0, !dbg !&#8203;15 } !&#8203;12 = !DILocalVariable(name: "i", scope: !&#8203;7, file: !&#8203;8, line: 4, type: !&#8203;11) !&#8203;13 = !DIExpression() !&#8203;14 = !DILocation(line: 4, column: 7, scope: !&#8203;7) !&#8203;15 = !DILocation(line: 11, column: 1, scope: !&#8203;7) Please note that both store calls are replaced by dbg.value calls. Both of them have debug value !&#8203;14 that points at the declaration of the variable i. There is no more debug info that points to the two assignments. IMHO, this is bug in Mem2Reg. I think this is stemming from the functions like the following function at Local.cpp:1104 (version 5.0.1) void llvm::ConvertDebugDeclareToDebugValue(DbgDeclareInst *DDI, StoreInst *SI, DIBuilder &Builder) { .... .... .... .... .... .... .... .... .... 1139: Builder.insertDbgValueIntrinsic(DV, 0, DIVar, DIExpr, DDI->getDebugLoc(), SI); } Please note that the new debugValue instruction at line 1139 is not taking debug info from SI but it is getting DDI->getDebugLoc(). Therefore, SI->getDebugLoc() is being lost. I believe the debug information of declaration is already preserved in DIVar. So there is no need to another copy on the new debugValue instruction. Therefore, the fifth parameter should be SI->getDebugLoc().