llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.43k stars 11.75k forks source link

Wrong debug info generated at -O3 (-O0 is correct) #44498

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 45153
Version trunk
OS Linux
Blocks llvm/llvm-project#38116
Reporter LLVM Bugzilla Contributor
CC @adrian-prantl,@dwblaikie,@JDevlieghere,@jmorse,@jdm,@walkerkd,@pogo59,@rnk

Extended Description

The expected output from lldb should be 20, but lldb outputs 0 when the program is compiled with "-O3". It behaves correctly when the program is compiled with "-O0".

In both cases, the debugger stops at line 7 and the tiny underscore is between c and 0. I think this indicates that the debugger stops at the same breakpoint, and in this case it should at least prints the same value. My understanding is that the assignment operation of l_16 hasn't been executed, thus the correct output should be 20.

$ cat abc.c int a, b; unsigned c; int main() { int l_16 = 0; for (; l_16 < 20; l_16 = l_16 + 5) ; l_16 = c = 0 // optimize_me_not0 ^ a; b = b & 1048575 ^ a; }

$ cat cmds b 7 r p l_16 kill q

$ clang-trunk -v clang version 11.0.0 (https://github.com/llvm/llvm-project.git bdcbdb484829c518511eece0809cc8ce1baa73c8) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/local/bin Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/8 Found candidate GCC installation: /usr/lib/gcc/i686-linux-gnu/9 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.4.7 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.6.4 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/4.8.5 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/5.5.0 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/6.5.0 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.4.0 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8 Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9 Selected GCC installation: /usr/lib/gcc/x86_64-linux-gnu/9 Candidate multilib: .;@m64 Candidate multilib: 32;@m32 Candidate multilib: x32;@mx32 Selected multilib: .;@m64

$ lldb-trunk -v lldb version 11.0.0 clang revision bdcbdb484829c518511eece0809cc8ce1baa73c8 llvm revision bdcbdb484829c518511eece0809cc8ce1baa73c8

$ clang-trunk -g abc.c -O3 $ lldb-trunk -s cmds -b a.out (lldb) target create "a.out" Current executable set to '/home/sding/LLDB-testing/reduce/24200/report/a.out' (x86_64). (lldb) command source -s 0 'cmds' Executing commands in '/home/sding/LLDB-testing/reduce/24200/report/cmds'. (lldb) b 7 Breakpoint 1: where = a.out`main + 17 at abc.c:7:12, address = 0x0000000000400491 (lldb) r Process 5369 stopped

Process 5369 launched: '/home/sding/LLDB-testing/reduce/24200/report/a.out' (x86_64) (lldb) p l_16 (int) $0 = 0 (lldb) kill Process 5369 exited with status = 9 (0x00000009) (lldb) q

$ clang-trunk -g abc.c -O0 $ lldb-trunk -s cmds -b a.out (lldb) target create "a.out" Current executable set to '/home/sding/LLDB-testing/reduce/24200/report/a.out' (x86_64). (lldb) command source -s 0 'cmds' Executing commands in '/home/sding/LLDB-testing/reduce/24200/report/cmds'. (lldb) b 7 Breakpoint 1: where = a.out`main + 57 at abc.c:7:12, address = 0x00000000004004b9 (lldb) r Process 6210 stopped

Process 6210 launched: '/home/sding/LLDB-testing/reduce/24200/report/a.out' (x86_64) (lldb) p l_16 (int) $0 = 20 (lldb) kill Process 6210 exited with status = 9 (0x00000009) (lldb) q

pogo59 commented 4 years ago

True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to communicate anything interesting to the user, which we can improve.

LLVM doesn't think about is_stmt until it's time to emit .loc directives to the assembler. The heuristic was designed for -O0 and doesn't really work well in the presence of optimizations. Some sort of IR-level tracking would be distinctly better, but it would also be a bit disruptive. Maybe keeping an is_stmt flag in DebugLoc would be the least so, but I haven't thought about it a whole lot.

jmorse commented 4 years ago

Keith wrote:

For your last comment about deleting lines entries; I would have thought the better thing to do would be to use the DWARF is_stmt flag on the line table to indicate where (and where not) to place source level breakpoints, so that the instructions remain associated with the appropriate source lines, but source level stepping only stops at the places where you want.

True, that'd be better -- as far as I'm aware, in LLVM is_stmt isn't used to communicate anything interesting to the user, which we can improve. Reid expressed an interest in fixing this at the conference.

This reproducer would be a great test case for is_stmt, because the source level statement becomes two instructions that then get rescheduled, which I imagine is challenging to deal with. Right now the step onto line 7 is marked "is_stmt" already, I guess a better implementation would be able to determine that it shouldn't be an is_stmt line?

jmorse commented 4 years ago

David wrote:

Hey, what'd you know, we get my example wrong too - in a maybe slightly different way:

I think this example suffers from bug 34136, I get the following immediately after SROA and before early-cse:

define dso_local i32 @​main() #​3 !dbg !​24 { entry: %i = alloca i32, align 4 %0 = bitcast i32 %i to i8, !dbg !​27 call void @​llvm.lifetime.start.p0i8(i64 4, i8 %0) #​5, !dbg !​27 call void @​llvm.dbg.declare(metadata i32 %i, metadata !​26, metadata !DIExpression()), !dbg !​28 store i32 3, i32 %i, align 4, !dbg !​28, !tbaa !​29 %call = call i32 @​f1() #​6, !dbg !​31 store i32 %call, i32 %i, align 4, !dbg !​32, !tbaa !​29 store i32 7, i32 %i, align 4, !dbg !​33, !tbaa !​29 call void @​f2(i32 %i), !dbg !​34 %1 = bitcast i32 %i to i8, !dbg !​35 call void @​llvm.lifetime.end.p0i8(i64 4, i8* %1) #​5, !dbg !​35 ret i32 0, !dbg !​35 }

The store of %call to %i gets eliminated as a dead store; and as described in bug 34136, there's no way of describing an assignment that no longer exists in our current model, at this stage of compilation. This causes the earlier variable locations to "leak" further down.

For some reason we emit a constant 7 /before/ the store, which is unfortunate/incorrect. (though maybe useful in other cases where the store is sunk down to closer to its use? I guess that's what it's for & it's degenerate in this case where there are no instructions to sink it past)

For the record, that variable location is above the store from the moment it's created. Immediately after instcombine converts the dbg.declare above into dbg.values, we get:

define dso_local i32 @​main() local_unnamed_addr #​3 !dbg !​24 { entry: %i = alloca i32, align 4 %0 = bitcast i32 %i to i8, !dbg !​27 call void @​llvm.lifetime.start.p0i8(i64 4, i8 nonnull %0) #​5, !dbg !​27 call void @​llvm.dbg.value(metadata i32 3, metadata !​26, metadata !DIExpression()), !dbg !​28 call void @​llvm.dbg.value(metadata i32 7, metadata !​26, metadata !DIExpression()), !dbg !​28 store i32 7, i32 %i, align 4, !dbg !​29, !tbaa !​30 call void @​llvm.dbg.value(metadata i32 %i, metadata !​26, metadata !DIExpression(DW_OP_deref)), !dbg !​28 call void @​f2(i32 nonnull %i), !dbg !​32 call void @​llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #​5, !dbg !​33 ret i32 0, !dbg !​33 }

Perhaps this is an edge case where instcombine expects a load / store to later be eliminated? It appears to be a deliberate choice of ConvertDebugDeclareToDebugValue for stores, to place the dbg.value ahead of the store. I imagine this does have the potential to mislead, if someone steps onto the store and expects the variable to contain the value from before the store.

So, yeah, not sure why it's prematurely using a constant location, instead of actually using the register location for the whole duration there.

I think the way LowerDbgDeclare works is to describe the constant/register locations as much as possible, on the assumption that stores are likely to be eliminated. It deliberately describes the stack location ahead of the function call because the call is likely to write to the stack location. I don't think this is guaranteed to always be 100% correct, probably due to bug 34136.

walkerkd commented 4 years ago

For your last comment about deleting lines entries; I would have thought the better thing to do would be to use the DWARF is_stmt flag on the line table to indicate where (and where not) to place source level breakpoints, so that the instructions remain associated with the appropriate source lines, but source level stepping only stops at the places where you want.

jmorse commented 4 years ago

There are a few awkward behaviours with this test -- firstly, my understanding of C is that for the statement "l_16 = c = 0 ^ a;" there's no principle or rule about which assignment happens first, only that they both complete before the sequence point ';'. Ideally, LLVM would connect the assignments with the line numbers so that we could make an effort to present both assignments as completing in one statement; however there just isn't the infrastructure for that today. See also bug 43970, where these two things don't connect.

On the topic of what range l_16 should have, I agree with Davide / David that it's better to mark it optimised out. IMHO, we should try and avoid presenting states of the program that are not present at -O0, and mark things as optimised out if we would end up presenting a stale value.

Excitingly, that is what's happening with variable locations, l_16 has no location for the first few instructions. However, machine-scheduler then moves an instruction from before the assignment to after it, introducing a backwards step into the program. I get the following assembly:

<+0>:     mov    0x200bae(%rip),%eax        # 0x601034 <a>   L3
<+6>:     mov    $0xfffff,%ecx                               L8
<+11>:    and    0x200b9b(%rip),%ecx        # 0x60102c <b>   L9
<+17>:    mov    %eax,0x200b99(%rip)        # 0x601030 <c>   L7
<+23>:    xor    %eax,%ecx                                   L9
<+25>:    mov    %ecx,0x200b8d(%rip)        # 0x60102c <b>   L9
<+31>:    xor    %eax,%eax                                   L10
<+33>:    retq                                               L10

Where I've put 'L\d+' indicating the line number on the end. The store to 'c' gets moved below the 'and' instruction by machine-scheduler, placing a pre-assignment instruction below the position that the variable location information thinks the assignment occurs. As I pointed out in the recent llvm-dev RFC, there's no real attempt in the instruction scheduling passes to prevent this kind of thing happening, and it's really difficult to get right currently. bug 43955 and bug 43949 present similar situations, although they move DBG_VALUEs around, instead of line numbers.

Exactly what we could do in this scenario is arguable: we could extend the optimised-out-ness of l_16 even further; we could also delete the line numbers of the 'and' instruction and mov-to-ecx instruction, making line 7 appear early in the function and not have completed its assignment yet. I suspect there's no always-right way of doing this.

dwblaikie commented 4 years ago

Hey, what'd you know, we get my example wrong too - in a maybe slightly different way:

attribute((optnone)) attribute((pure)) int f1() { return 3; } attribute((optnone)) void f2(int* x) { } int main() { int i = 3; i = f1(); i = 7; f2(&i); }

Ltmp3:

DEBUG_VALUE: main:i <- 7

    .loc    1 8 5 prologue_end      # loc.c:8:5
    movl    $7, 4(%rsp)

.Ltmp4:

DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp

    .loc    1 0 5 is_stmt 0         # loc.c:0:5
    leaq    4(%rsp), %rdi
    .loc    1 9 3 is_stmt 1         # loc.c:9:3
    callq   .Lf2$local

For some reason we emit a constant 7 /before/ the store, which is unfortunate/incorrect. (though maybe useful in other cases where the store is sunk down to closer to its use? I guess that's what it's for & it's degenerate in this case where there are no instructions to sink it past)

Sort of correct (apologies if I'm derailing this bug - let me know, can write this stuff somewhere else):

Modify the code to:

i = 7; f2(0); f2(&i);

And it turns out the store is not sunk past the first call to f2, but the constant location is used right up until the setup for the second call to f2:

.Ltmp3:

DEBUG_VALUE: main:i <- 7

    .loc    1 8 5 prologue_end      # loc.c:8:5
    movl    $7, 4(%rsp)
    .loc    1 9 3                   # loc.c:9:3
    xorl    %edi, %edi
    callq   .Lf2$local

.Ltmp4:

DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp

    .loc    1 0 3 is_stmt 0         # loc.c:0:3
    leaq    4(%rsp), %rdi
    .loc    1 10 3 is_stmt 1        # loc.c:10:3
    callq   .Lf2$local

So, yeah, not sure why it's prematurely using a constant location, instead of actually using the register location for the whole duration there.

dwblaikie commented 4 years ago

Hey, what'd you know, we get my example wrong too - in a maybe slightly different way:

attribute((optnone)) attribute((pure)) int f1() { return 3; } attribute((optnone)) void f2(int* x) { } int main() { int i = 3; i = f1(); i = 7; f2(&i); }

Ltmp3:

DEBUG_VALUE: main:i <- 7

    .loc    1 8 5 prologue_end      # loc.c:8:5
    movl    $7, 4(%rsp)

.Ltmp4:

DEBUG_VALUE: main:i <- [DW_OP_plus_uconst 4, DW_OP_deref] $rsp

    .loc    1 0 5 is_stmt 0         # loc.c:0:5
    leaq    4(%rsp), %rdi
    .loc    1 9 3 is_stmt 1         # loc.c:9:3
    callq   .Lf2$local

For some reason we emit a constant 7 /before/ the store, which is unfortunate/incorrect. (though maybe useful in other cases where the store is sunk down to closer to its use? I guess that's what it's for & it's degenerate in this case where there are no instructions to sink it past)

dwblaikie commented 4 years ago

We can have a philosophical discussion on whether we should print 0 or the incremented value, but the reality is that showing a value at all here is confusing. I'd rather have this marked as optimized out. Jeremy or Adrian might have ideas/opinion.

Yeah - seems like an LLVM debug info quality bug.

One could imagine similar code:

x = 5; x = f1(); // pure call that can be discarded because the result is unused x = 10;

If you break on line 3 before the assignment/store is executed, it'd be especially misleading to report x as having the value 5 (might lead one to make all sorts of conclusions about what f1 did, if it returned 5). It should be reported as unknown. That would mean probably never observing the value 5 (because as soon as that store is executed it's "out of range" as we have notionally stepped past the f1() call immediately), which I think is OK/better than the alternative.

llvmbot commented 4 years ago

We can have a philosophical discussion on whether we should print 0 or the incremented value, but the reality is that showing a value at all here is confusing. I'd rather have this marked as optimized out. Jeremy or Adrian might have ideas/opinion.

pogo59 commented 4 years ago

At -O3 the do-nothing for loop has been eliminated; you will be able to determine this because in the optimized program, lldb will be unable to stop on line 5. Because the loop is gone, the value of l_16 is simply 0 throughout the function.

LLDB's behavior is correct here. Resolving as invalid.