Open OCHyams opened 3 years ago
Oops, I said:
If you run this example through a debugger you will see "first=5" on line 6 when you'd expect "first=20". but the example isn't linkable and doesn't include a call to the troublesome function. Here is the executable reproducer which gives the necessary context:
$ cat test.c attribute((optnone)) void esc(int* p) {} attribute((optnone)) void fluff() {} attribute((noinline)) int fun(int first, int second) { if (first) first = second; fluff(); esc(&first); return 0; } int main() { return fun(5, 20); }
$ clang -O2 -g test.c -o test
$ lldb test (lldb) target create "test" Current executable set to 'test' (x86_64). (lldb) b fun Breakpoint 1: where = test`fun + 1 at test.c:4:7, address = 0x00000000004004a1 (lldb) run Process 13431 launched: 'test' (x86_64) Process 13431 stopped
Notice that we see first=5 on line 6, which is wrong. We should see first=20 because the condition (first != 0) guarding the assignment (first = second) was true.
I've written a few of these cases as dexter tests in D94761. unused-merged-value.c in that patch demonstrates how we are able to run into this issue for non-escaped variables too.
@llvm/issue-subscribers-debuginfo
Author: Orlando Cazalet-Hyams (OCHyams)
@OCHyams Can we close this now that assignment-tracking exists? LTO-mode notwithstanding
Hmm yes and no. The more fundamental issue has been fixed but it looks like issue that SimplifyCFG introduces still exists
https://godbolt.org/z/3xP5qjx73
define dso_local noundef i32 @fun(int, int)(i32 noundef %first, i32 noundef %second) local_unnamed_addr #0 !dbg !10 {
entry:
%first.addr = alloca i32, align 4, !DIAssignID !18
#dbg_assign(i1 undef, !16, !DIExpression(), !18, ptr %first.addr, !DIExpression(), !19)
#dbg_assign(i32 %first, !16, !DIExpression(), !20, ptr %first.addr, !DIExpression(), !19)
#dbg_value(i32 %second, !17, !DIExpression(), !19)
%tobool.not = icmp eq i32 %first, 0, !dbg !21
; XXX
#dbg_assign(i32 %second, !16, !DIExpression(), !20, ptr %first.addr, !DIExpression(), !19)
%spec.select = select i1 %tobool.not, i32 0, i32 %second, !dbg !23
store i32 %spec.select, ptr %first.addr, align 4, !tbaa !24, !DIAssignID !20
tail call void @fluff()(), !dbg !28
call void @esc(int*)(ptr noundef nonnull %first.addr), !dbg !29
ret i32 0, !dbg !30
}
I think the dbg.assign after ; XXX
wants to either have spec.select
or undef
as its value operand. The dbg.assign is linked to the store (good) meaning we'll produce a stack location from there, but between the dbg.assign and that store we'll emit an implicit location of the value %second, which is wrong (conditional assignment has become unconditional in debug info).
Extended Description
TL;DR
llvm's debug info model is ineffective for variables which cannot be promoted in the first round of SROA/mem2reg and we can see incorrect debug info being generated as a result.
Example
clang built at 1ecae1e62ad0 (10th Jan 2021).
We can get incorrect debug info for variables which are promoted (or partially promoted) after InstCombine runs LowerDbgDeclare. I think this is a general problem with the debug info model which manifests in more than one place (see summary at end), but here is a specific example first:
Compile it with the following command and look at the IR:
Notice that the second dbg.value for "first" (#12) unexpectedly comes after the call to "fluff()". If you run this example through a debugger you will see "first=5" on line 6 when you'd expect "first=20". The dbg.value should instead be positioned immediately after the select instruction.
Example walkthrough
In the example "first" is escaped so it is not promoted by the early SROA/mem2reg pass. InstCombine will partially promote it by merging the stores (the initial store and the store on the if-then branch) into their common successor by inserting a PHI and store. Later SimplifyCFG folds all of the blocks together, turning the PHI into a select. InstCombine doesn't insert a dbg.value after the PHI and SimplifyCFG doesn't insert one when converting the PHI to a select, so the merged value is untracked in debug info.
We can see what happened by looking at the -print-after-all output. SROA/mem2reg runs early and is not able to promote "first" because it is escaped. So "first"'s alloca and dbg.declare survive:
Later, InstCombine runs and before doing any optimisations it calls LowerDbgDeclare to convert the dbg.declare into a set of dbg.value intrinsics:
InstCombine merges the two stores to %first.addr ("first"'s alloca address) into the common successor "if.end" as a PHI and store. Notice how it does not insert a dbg.value to describe the merged value %storemerge.
SimplifyCFG folds these blocks together, converting the PHI to a select, and leaves us with the incorrect debug info shown earlier:
Summary
This example shows that InstCombine and SimplifyCFG compose to produce incorrect debug info for escaped variables. However, the reason that I think this is a general problem rather than a specific bug, and why I am struggling to hold any one pass accountable, is because of the following inconsistencies:
If "first" in this example can be fully promoted by mem2reg (https://godbolt.org/z/Es89z7) then there is no issue because mem2reg will insert a dbg.value after the PHI which eventually becomes the select.
However, mem2reg can also cause the same bug because it only inserts dbg.values when promoting a variable if and only if it has a dbg.declare. It's possible for LowerDbgDeclare to remove the dbg.declare before mem2reg runs. For example, when a variable is escaped but the escaping function is later inlined. You can see this happening here https://godbolt.org/z/KrdjGs.
Lastly, we don't see this issue in any of these cases if there isn't any block folding because LiveDebugValues will merge live-out variable locations from preds (https://godbolt.org/z/sjcnbv).
As far as I can tell there are no rules or examples demonstrating how debug info should be updated when promoting or partially promoting variables after LowerDbgDeclare has removed the dbg.declare.