Open dnovillo opened 8 years ago
mentioned in issue llvm/llvm-project#30616
So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.
Does this make sense? Ideas?
I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.
Agreed, #2 seems sensible to me.
So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.
Does this make sense? Ideas?
I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.
I'm not familiar with neither LSR or the internals of SCEV, but I spent a decent amount of time stepping in the debugger today. So, yes, as Quentin pointed out SCEVExpander::expand() computes an insertion point for the SCEV object and assigns to the newly inserted instructions the DI of the instruction correspoding to the insertion point. I'm not entirely sure if there's an easy way to access the value we're trying to replace from inside SCEVExpander, but it's available in the caller stack (LF.OperandValToReplace).
As a quick hack, I tried to force the debug location to be set correctly for this case and it seems to work, but I'm worried this is too specific to my case.
diff --git a/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/lib/Transforms/Scalar/LoopStrengthReduce.cpp index a61f646..7019d56 100644 --- a/lib/Transforms/Scalar/LoopStrengthReduce.cpp +++ b/lib/Transforms/Scalar/LoopStrengthReduce.cpp @@ -4576,6 +4576,11 @@ Value LSRInstance::Expand(const LSRUse &LU, // unfolded offsets. LSR assumes they both live next to their uses. if (!Ops.empty()) { Value FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);
So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.
Does this make sense? Ideas?
It sounds like a job for the SCEVExpander at first glance.
AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.
Is it possible to actually get to (the debug location of the) original value from within SCEVExpander? I'd think the transform that is requesting an expansion would have a better idea of what the debug location should be.
I am not familiar with the internals of SCEV though.
Sanjoy, what's your take on this?
It sounds like a job for the SCEVExpander at first glance.
AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.
I am not familiar with the internals of SCEV though.
Sanjoy, what's your take on this?
I have a theory now. Please ignore my previous reduction, I reduced the testcase too much. After another (painful) reduction, I think I identified the pass which introduces the problem (see new attached testcase), that is LoopStrengthReduce (which is run as part of llc -O1 and beyond). Before LoopStrengthReduce runs the IR looks like:
%add.i = add nsw i32 %inc5, %s.04, !dbg !32 tail call void @llvm.dbg.value(metadata i32 %add.i, i64 0, metadata !22, metadata !11), !dbg !23 %.add.i = select i1 %cmp1, i32 30, i32 %add.i, !dbg !33
where !32 = !DILocation(line: 1, column: 34, scope: !6, inlinedAt: !29) !33 = !DILocation(line: 6, column: 9, scope: !17)
So the add
instruction contains the inlinedAt info and it points which points to the correct location in my reduced *.cc testcase:
!29 = distinct !DILocation(line: 6, column: 23, scope: !30)
that is -> sum(i, s)
and the select
instruction points to the i
inside the if (i != 100)
During LoopStrengthReduce the add
instruction is replaced, so the IR looks like this:
tail call void @llvm.dbg.value(metadata !2, i64 0, metadata !22, metadata !11), !dbg !23 %2 = add i32 %.add.i, %lsr.iv, !dbg !32 %3 = add i32 %2, -3, !dbg !32 %.add.i.1 = select i1 %cmp1.1, i32 30, i32 %3, !dbg !32
where !32 = !DILocation(line: 6, column: 9, scope: !17)
so, both the add
and the select
point to the i
inside the if i != 100
Stepping through with the debuggger reveals that when the Expander runs it inserts the add
instructions assigning the debug metadata of select
(i.e. !33 before LSR, which becomes !32 after renaming the metadata because the original add is dead).
I'm not entirely sure what's the best solution here, so I'm asking Quentin.
I think that LSR (or SCEV) should (if possible) look at assigning the correct metadata when creating the new add instructions (and/or preserving the inlineAt
info).
$ cat run.sh
rm -rf bugopt bugopt.o echo "Result at O0" /home/davide/work/llvm/build-clang/bin/llc -O0 $1 -o bugopt /home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o /home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine
echo "Result at 01" rm -rf bugopt bugopt.o /home/davide/work/llvm/build-clang/bin/llc -O1 $1 -o bugopt /home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o /home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine
$ ./run.sh reduced.ll Result at O0 0x00000040: DW_TAG_inlined_subroutine [4] DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x002a => {0x0000002a} "sum") DW_AT_low_pc [DW_FORM_addr] (0x0000000000000005) DW_AT_high_pc [DW_FORM_data4] (0x00000002) DW_AT_call_file [DW_FORM_data1] ("/home/davide/work/llvm/build-clang/bin/bug.cc") DW_AT_call_line [DW_FORM_data1] (6)
Result at 01 $
Taking this, trying to understand which backend pass destroys the information.
It's definitely a problem in the backend. Fully reduced testcase:
define i32 @main() !dbg !3 { %add.i.4 = add nsw i32 undef, undef, !dbg !6 %.add.i.4 = select i1 undef, i32 30, i32 %add.i.4, !dbg !9 ret i32 %.add.i.4, !dbg !10 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#2} !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 4.0.0 (trunk 288665) (llvm/trunk 288759)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly) !1 = !DIFile(filename: "bug.cc", directory: "/home/davide/work/llvm/build-clang/bin") !2 = !{i32 2, !"Debug Info Version", i32 3} !3 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 3, type: !4, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: false, unit: !0) !4 = !DISubroutineType(types: !5) !5 = !{} !6 = !DILocation(line: 1, column: 34, scope: !7, inlinedAt: !8) !7 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 1, type: !4, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0) !8 = distinct !DILocation(line: 6, column: 23, scope: !3) !9 = !DILocation(line: 6, column: 9, scope: !3) !10 = !DILocation(line: 7, column: 3, scope: !3)
Slightly simpler testcase, removing the dependency on printf
int sum(int x, int y) { return x + y; }
int main() { int s, i = 0; while (i++ < 400) if (i != 100) s = sum(i, s); else s = 30; return s; }
With some changes this seemed to repro at -O1 for me. Still get an add instruction that looks like it's sufficiently similar to the add in the inlined function that it should be annotated as such.
In the O1 IR, the add instruction appears to still have the right location. So I guess this is somewhere down in selection dag or other backend-y optimizations.
This is likely a debug location optimization problem. It would affect inlining or without inlining.
If anyone wants to look into this, just watch which optimization removes the last use of the DebugLoc with the inlinedAt info. Chances are two instructions, one inside and one outside the inlined function, might get coalesced into some other operation and the outer instruction's location is chosen. Then we might split it back out (there was a bug filed recently about switch lowering doing something like this, maybe?) into multiple instructions in the backend without the extra information available anymore.
I'll see if I can reduce this a bit further...
assigned to @Melamoto
@llvm/issue-subscribers-debuginfo
Author: Diego Novillo (dnovillo)
Extended Description
For one liner functions, LLVM seems to not be emitting inlined-at information in the final binary. The inlinedAt tags are there in the bytecode, but don't show up in the final binary.
If I change the code of the inlined function to have more than one line, then the inlinedAt markers survive to the final binary.
$ clang++ -O2 -g bug.cc -o bug $ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine
When compiled with GCC, the inlined function is marked properly:
$ g++-4.9 -O2 -g bug.cc -o gcc-bug $ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine 0x00000476: DW_TAG_inlined_subroutine [21] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x028d => {0x000003da} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000003) DW_AT_call_file [DW_FORM_data1] ("./bug.cc") DW_AT_call_line [DW_FORM_data1] (8) DW_AT_GNU_discriminator [DW_FORM_data1] (0x02)
If I modified the inlined function to have more code, LLVM will generate a DW_TAG_inlined_subroutine tag. I did this in the no-bug.cc test case:
$ clang++ -g -O2 no-bug.cc -o no-bug $ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine 0x000000b1: DW_TAG_inlined_subroutine [9] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x004c => {0x0000004c} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x000000000040063e) DW_AT_high_pc [DW_FORM_data4] (0x00000010) DW_AT_call_file [DW_FORM_data1] ("./no-bug.cc") DW_AT_call_line [DW_FORM_data1] (13) DW_AT_GNU_discriminator [DW_FORM_data1] (0x01)