Open llvmbot opened 6 years ago
If your tool wants to have a source location for llvm.dbg.value intrinsics, you can modify mem2reg to stick the DILocation of the store onto the dbg.value. That information won't be used by any consumer inside of inside of LLVM (with the ecexption of the inlinedAt: field) so that should be a safe thing to do.
Can you explain what your tool would do with the DILocation of a dbg.value?
We use debug info to produce visual feed back (in the form of a html file) of the found error.
We have modified mem2reg already and somewhat it works for us. However, we are not sure our modification are sound.
Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information.
What's "this" information that you are referring to here?
The inlinedAt value.
If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.
Why would retaining the dbg.declare make any difference? The dbg.declare and the dbg.value carry the exact same amount of information about the variable, don't they?
I'm suggesting that the current choice to store the DILocation of the variable declaration in the source location slot of the dbg.value isn't very clean. We end up having to discover concrete variables in the backend by doing lookup on pairs of {DIVariable, inlinedAt}, and this information is redundantly encoded on every variable assignment. Furthermore, if the dbg.value instructions get removed, we forget the variable ever existed. Consider something like:
void f() { int foo_data; if (FOO_ENABLED) { // -DFOO_ENABLED=0 foo_data = work(); } stop_here(); if (FOO_ENABLED) { // -DFOO_ENABLED=0 use_data(foo_data); } }
If we had an instruction that produced concrete variables like alloca instructions produce memory, we wouldn't need to repurpose the source location of dbg.value to hold the variable inlinedAt info. We could instead refer to that concrete variable as the second argument of dbg.value.
Here's a use case where having the original store location might help us:
void foo(); void bar(); int f(int cond) { int x; if (cond) { foo(); x = 42; } else { x = 1; } return x; }
The annotated assembly:
testl %ecx, %ecx
.Ltmp0: .loc 1 5 7 is_stmt 0 # t.cpp:5:7 je .LBB0_1 .Ltmp1:
#DEBUG_VALUE: f:cond <- %ecx
.loc 1 6 5 is_stmt 1 # t.cpp:6:5
callq "?foo@@YAXXZ"
.Ltmp2:
.loc 1 0 5 is_stmt 0 # t.cpp:0:5
movl $42, %eax
jmp .LBB0_3
.Ltmp3: .LBB0_1:
movl $1, %eax
.Ltmp4: .LBB0_3: # %if.end
.loc 1 11 3 is_stmt 1 # t.cpp:11:3
The instructions materializing 42 and 1 should ideally have the source location of the original assignments, but our current representation cannot encode that. If we could use the dbg.value sloc for the location of the store, we could emit that line table entry when materializing constants for phi nodes at the end of the basic block. Of course, it you reorder the assignment and the function call, it's not so obvious if this really matters. =S
Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information.
What's "this" information that you are referring to here?
The DILocalVariable is not enough in the presence of multiple inlined copies. That is correct.
If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.
Why would retaining the dbg.declare make any difference? The dbg.declare and the dbg.value carry the exact same amount of information about the variable, don't they?
I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.
I don't think that's true. The purpose of the DILocation attachment on DbgValueInsts is for the DILocation's inlinedAt: field so inlined instances can be distinguished. Let me know if we are talking past another :-)
Yes, what I mean is, the DILocation on the dbg.value is used for its inlinedAt value today because we have no other way to recover this information. The DILocalVariable is not enough in the presence of multiple inlined copies.
If we retained the dbg.declare call or something equivalent to it, then we could use the DILocation of the dbg.value for something more like this.
It may be the intended design, but I am not sure how a downstream tool should process this situation. I am building a model checker on top of LLVM IR. If my tool finds a bug, it is having trouble in producing accurate witness and not able to point at the code correctly. Do you have any suggestion to recover the lost info?
If your tool wants to have a source location for llvm.dbg.value intrinsics, you can modify mem2reg to stick the DILocation of the store onto the dbg.value. That information won't be used by any consumer inside of inside of LLVM (with the ecexption of the inlinedAt: field) so that should be a safe thing to do.
Can you explain what your tool would do with the DILocation of a dbg.value?
I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.
I don't think that's true. The purpose of the DILocation attachment on DbgValueInsts is for the DILocation's inlinedAt: field so inlined instances can be distinguished. Let me know if we are talking past another :-)
This seems like a weakness in the design we might want to fix at some point. I tried this example and did not get what I expected:
static inline void inc(int p) { ++p; } int foo() { int i = 0; inc(&i); return i; }
I get this IR:
define i32 @"\01?foo@@YAHXZ"() local_unnamed_addr #0 !dbg !8 { entry: call void @llvm.dbg.value(metadata i32 0, metadata !13, metadata !DIExpression()), !dbg !14 call void @llvm.dbg.value(metadata i32 1, metadata !13, metadata !DIExpression()), !dbg !14 ret i32 1, !dbg !15 }
!14 points to the declaration of i, so the record of 'inc' being inlined is lost.
I can see why we do things this way today, though. After we remove the alloca and dbg.declare, we have no record of the concrete variable, just the abstract one, and now way to differentiate the assignment from multiple inlined copies of the same variable.
Maybe we should leave behind the dbg.declare intrinsics (or something replacing them) that records the concrete variable declaration.
After the transformation by looking at the stream of instructions, I know that i is being assigned 2, but the location of the assignment has been lost by transformation.
It may be the intended design, but I am not sure how a downstream tool should process this situation. I am building a model checker on top of LLVM IR. If my tool finds a bug, it is having trouble in producing accurate witness and not able to point at the code correctly. Do you have any suggestion to recover the lost info?
Please note that both store calls are replaced by dbg.value calls. Both of them have debug value !14 that points at the declaration of the variable i. There is no more debug info that points to the two assignments.
IMHO, this is bug in Mem2Reg. I think this is stemming from the functions like the following function at Local.cpp:1104 (version 5.0.1)
No, this is the expected behavior. The DILocation attached to llvm.dbg.value intrinsics is only used to capture inlinedAt: information for the local variable the source location within in the function always points to the declaration of the source variable and is not used to emit debug information at all.
The only thing relevant for a dbg.value is its relative position in the instruction stream relative to other instructions (which may have DILocations that actually sho up in the line table.
Can you elaborate why you think that this behavior is a problem?
@llvm/issue-subscribers-debuginfo
Author: None (llvmbot)
Extended Description
Please consider the following example:
int main() { int i;
i=0;
i=2;
}
If we emit IR of the above function via clang. We obtain the following:
define i32 @main() #0 !dbg !7 { entry: %i = alloca i32, align 4 call void @llvm.dbg.declare(metadata i32 %i, metadata !12, metadata !13), !dbg !14 store i32 0, i32 %i, align 4, !dbg !15 store i32 2, i32* %i, align 4, !dbg !16 ret i32 0, !dbg !17 }
!12 = !DILocalVariable(name: "i", scope: !7, file: !8, line: 4, type: !11) !13 = !DIExpression() !14 = !DILocation(line: 4, column: 7, scope: !7) !15 = !DILocation(line: 6, column: 4, scope: !7) !16 = !DILocation(line: 8, column: 4, scope: !7) !17 = !DILocation(line: 11, column: 1, scope: !7)
Please note that both the stores have debug information !15 and !16 that correctly identifies the correct location of the writes on i.
If we pass the above function via Mem2Reg pass we obtain the following IR:
; Function Attrs: noinline nounwind define i32 @main() #0 !dbg !7 { call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !12, metadata !13), !dbg !14 call void @llvm.dbg.value(metadata i32 2, i64 0, metadata !12, metadata !13), !dbg !14 ret i32 0, !dbg !15 }
!12 = !DILocalVariable(name: "i", scope: !7, file: !8, line: 4, type: !11) !13 = !DIExpression() !14 = !DILocation(line: 4, column: 7, scope: !7) !15 = !DILocation(line: 11, column: 1, scope: !7)
Please note that both store calls are replaced by dbg.value calls. Both of them have debug value !14 that points at the declaration of the variable i. There is no more debug info that points to the two assignments.
IMHO, this is bug in Mem2Reg. I think this is stemming from the functions like the following function at Local.cpp:1104 (version 5.0.1)
void llvm::ConvertDebugDeclareToDebugValue(DbgDeclareInst DDI, StoreInst SI, DIBuilder &Builder) { .... .... .... .... .... .... .... .... .... 1139: Builder.insertDbgValueIntrinsic(DV, 0, DIVar, DIExpr, DDI->getDebugLoc(), SI); }
Please note that the new debugValue instruction at line 1139 is not taking debug info from SI but it is getting DDI->getDebugLoc(). Therefore, SI->getDebugLoc() is being lost.
I believe the debug information of declaration is already preserved in DIVar. So there is no need to another copy on the new debugValue instruction. Therefore, the fifth parameter should be SI->getDebugLoc().