llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.05k stars 11.98k forks source link

No inline debug info generated for small functions when optimizing #26004

Open dnovillo opened 8 years ago

dnovillo commented 8 years ago
Bugzilla Link 25630
Version trunk
OS All
Blocks llvm/llvm-project#30616
Attachments Test case, The same test case with more code in the inlined function
CC @adrian-prantl,@dwblaikie,@echristo,@hfinkel,@pogo59,@qcolombet,@sanjoy,@Melamoto

Extended Description

For one liner functions, LLVM seems to not be emitting inlined-at information in the final binary. The inlinedAt tags are there in the bytecode, but don't show up in the final binary.

If I change the code of the inlined function to have more than one line, then the inlinedAt markers survive to the final binary.

$ clang++ -O2 -g bug.cc -o bug $ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine

When compiled with GCC, the inlined function is marked properly:

$ g++-4.9 -O2 -g bug.cc -o gcc-bug $ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine 0x00000476: DW_TAG_inlined_subroutine [21] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x028d => {0x000003da} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000003) DW_AT_call_file [DW_FORM_data1] ("./bug.cc") DW_AT_call_line [DW_FORM_data1] (8) DW_AT_GNU_discriminator [DW_FORM_data1] (0x02)

If I modified the inlined function to have more code, LLVM will generate a DW_TAG_inlined_subroutine tag. I did this in the no-bug.cc test case:

$ clang++ -g -O2 no-bug.cc -o no-bug $ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine 0x000000b1: DW_TAG_inlined_subroutine [9] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x004c => {0x0000004c} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x000000000040063e) DW_AT_high_pc [DW_FORM_data4] (0x00000010) DW_AT_call_file [DW_FORM_data1] ("./no-bug.cc") DW_AT_call_line [DW_FORM_data1] (13) DW_AT_GNU_discriminator [DW_FORM_data1] (0x01)

adrian-prantl commented 2 years ago

mentioned in issue llvm/llvm-project#30616

qcolombet commented 7 years ago

So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.

Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

Agreed, #​2 seems sensible to me.

sanjoy commented 7 years ago

So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.

Does this make sense? Ideas?

I think we discussed this on IRC some time back, but I think the variant of (2) that changes expandCodeFor to take a DebugLoc is the right path forward.

llvmbot commented 7 years ago

I'm not familiar with neither LSR or the internals of SCEV, but I spent a decent amount of time stepping in the debugger today. So, yes, as Quentin pointed out SCEVExpander::expand() computes an insertion point for the SCEV object and assigns to the newly inserted instructions the DI of the instruction correspoding to the insertion point. I'm not entirely sure if there's an easy way to access the value we're trying to replace from inside SCEVExpander, but it's available in the caller stack (LF.OperandValToReplace).

As a quick hack, I tried to force the debug location to be set correctly for this case and it seems to work, but I'm worried this is too specific to my case.

diff --git a/lib/Transforms/Scalar/LoopStrengthReduce.cpp b/lib/Transforms/Scalar/LoopStrengthReduce.cpp index a61f646..7019d56 100644 --- a/lib/Transforms/Scalar/LoopStrengthReduce.cpp +++ b/lib/Transforms/Scalar/LoopStrengthReduce.cpp @@ -4576,6 +4576,11 @@ Value LSRInstance::Expand(const LSRUse &LU, // unfolded offsets. LSR assumes they both live next to their uses. if (!Ops.empty()) { Value FullV = Rewriter.expandCodeFor(SE.getAddExpr(Ops), Ty);

So, I think there are two possible solutions to the problem: 1) LSR updates the debug locations, as my hack does, but maybe in a more systematic way. 2) expandCodeFor() gets another overload where it takes the value that is getting replaced (or equivalently its debugloc) and SCEVExpander sets the correct debug location accordingly.

Does this make sense? Ideas?

sanjoy commented 7 years ago

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.

Is it possible to actually get to (the debug location of the) original value from within SCEVExpander? I'd think the transform that is requesting an expansion would have a better idea of what the debug location should be.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

qcolombet commented 7 years ago

It sounds like a job for the SCEVExpander at first glance.

AFAICT, LSR is not playing with the debug locations and the SCEVExpander should have access to the Value, hence the debug location, related to the different expressions it expands. Right now, the expander uses the debug location of the insertion point instead of the original value, if I am not mistaken.

I am not familiar with the internals of SCEV though.

Sanjoy, what's your take on this?

llvmbot commented 7 years ago

reduced testcase

llvmbot commented 7 years ago

I have a theory now. Please ignore my previous reduction, I reduced the testcase too much. After another (painful) reduction, I think I identified the pass which introduces the problem (see new attached testcase), that is LoopStrengthReduce (which is run as part of llc -O1 and beyond). Before LoopStrengthReduce runs the IR looks like:

%add.i = add nsw i32 %inc5, %s.04, !dbg !​32 tail call void @​llvm.dbg.value(metadata i32 %add.i, i64 0, metadata !​22, metadata !​11), !dbg !​23 %.add.i = select i1 %cmp1, i32 30, i32 %add.i, !dbg !​33

where !​32 = !DILocation(line: 1, column: 34, scope: !​6, inlinedAt: !​29) !​33 = !DILocation(line: 6, column: 9, scope: !​17)

So the add instruction contains the inlinedAt info and it points which points to the correct location in my reduced *.cc testcase: !​29 = distinct !DILocation(line: 6, column: 23, scope: !​30) that is -> sum(i, s)

and the select instruction points to the i inside the if (i != 100)

During LoopStrengthReduce the add instruction is replaced, so the IR looks like this:

tail call void @​llvm.dbg.value(metadata !​2, i64 0, metadata !​22, metadata !​11), !dbg !​23 %2 = add i32 %.add.i, %lsr.iv, !dbg !​32 %3 = add i32 %2, -3, !dbg !​32 %.add.i.1 = select i1 %cmp1.1, i32 30, i32 %3, !dbg !​32

where !​32 = !DILocation(line: 6, column: 9, scope: !​17)

so, both the add and the select point to the i inside the if i != 100

Stepping through with the debuggger reveals that when the Expander runs it inserts the add instructions assigning the debug metadata of select (i.e. !​33 before LSR, which becomes !​32 after renaming the metadata because the original add is dead).

I'm not entirely sure what's the best solution here, so I'm asking Quentin. I think that LSR (or SCEV) should (if possible) look at assigning the correct metadata when creating the new add instructions (and/or preserving the inlineAt info).

llvmbot commented 7 years ago

$ cat run.sh

!/bin/sh

rm -rf bugopt bugopt.o echo "Result at O0" /home/davide/work/llvm/build-clang/bin/llc -O0 $1 -o bugopt /home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o /home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

echo "Result at 01" rm -rf bugopt bugopt.o /home/davide/work/llvm/build-clang/bin/llc -O1 $1 -o bugopt /home/davide/work/llvm/build-clang/bin/llvm-mc -filetype=obj bugopt -o bugopt.o /home/davide/work/llvm/build-clang/bin/llvm-dwarfdump --debug-dump=info bugopt.o |grep -A6 DW_TAG_inlined_subroutine

$ ./run.sh reduced.ll Result at O0 0x00000040: DW_TAG_inlined_subroutine [4] DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x002a => {0x0000002a} "sum") DW_AT_low_pc [DW_FORM_addr] (0x0000000000000005) DW_AT_high_pc [DW_FORM_data4] (0x00000002) DW_AT_call_file [DW_FORM_data1] ("/home/davide/work/llvm/build-clang/bin/bug.cc") DW_AT_call_line [DW_FORM_data1] (6)

Result at 01 $

Taking this, trying to understand which backend pass destroys the information.

llvmbot commented 7 years ago

It's definitely a problem in the backend. Fully reduced testcase:

define i32 @​main() !dbg !​3 { %add.i.4 = add nsw i32 undef, undef, !dbg !​6 %.add.i.4 = select i1 undef, i32 30, i32 %add.i.4, !dbg !​9 ret i32 %.add.i.4, !dbg !​10 } !llvm.dbg.cu = !{#0} !llvm.module.flags = !{#2} !​0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !​1, producer: "clang version 4.0.0 (trunk 288665) (llvm/trunk 288759)", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly) !​1 = !DIFile(filename: "bug.cc", directory: "/home/davide/work/llvm/build-clang/bin") !​2 = !{i32 2, !"Debug Info Version", i32 3} !​3 = distinct !DISubprogram(name: "main", scope: !​1, file: !​1, line: 3, type: !​4, isLocal: false, isDefinition: true, scopeLine: 3, flags: DIFlagPrototyped, isOptimized: false, unit: !​0) !​4 = !DISubroutineType(types: !​5) !​5 = !{} !​6 = !DILocation(line: 1, column: 34, scope: !​7, inlinedAt: !​8) !​7 = distinct !DISubprogram(name: "sum", scope: !​1, file: !​1, line: 1, type: !​4, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !​0) !​8 = distinct !DILocation(line: 6, column: 23, scope: !​3) !​9 = !DILocation(line: 6, column: 9, scope: !​3) !​10 = !DILocation(line: 7, column: 3, scope: !​3)

llvmbot commented 7 years ago

Slightly simpler testcase, removing the dependency on printf

int sum(int x, int y) { return x + y; }

int main() { int s, i = 0; while (i++ < 400) if (i != 100) s = sum(i, s); else s = 30; return s; }

dwblaikie commented 8 years ago

With some changes this seemed to repro at -O1 for me. Still get an add instruction that looks like it's sufficiently similar to the add in the inlined function that it should be annotated as such.

In the O1 IR, the add instruction appears to still have the right location. So I guess this is somewhere down in selection dag or other backend-y optimizations.

dwblaikie commented 8 years ago

This is likely a debug location optimization problem. It would affect inlining or without inlining.

If anyone wants to look into this, just watch which optimization removes the last use of the DebugLoc with the inlinedAt info. Chances are two instructions, one inside and one outside the inlined function, might get coalesced into some other operation and the outer instruction's location is chosen. Then we might split it back out (there was a bug filed recently about switch lowering doing something like this, maybe?) into multiple instructions in the backend without the extra information available anymore.

I'll see if I can reduce this a bit further...

dnovillo commented 8 years ago

assigned to @Melamoto

llvmbot commented 4 months ago

@llvm/issue-subscribers-debuginfo

Author: Diego Novillo (dnovillo)

| | | | --- | --- | | Bugzilla Link | [25630](https://llvm.org/bz25630) | | Version | trunk | | OS | All | | Blocks | llvm/llvm-project#30616 | | Attachments | [Test case](https://user-images.githubusercontent.com/92601372/143752507-91cbac28-00fa-47ee-80ce-1f3aa6392783.gz), [The same test case with more code in the inlined function](https://user-images.githubusercontent.com/92601372/143752508-f8fd2253-e713-444e-bebc-af7d6f4a30f0.gz) | | CC | @adrian-prantl,@dwblaikie,@echristo,@hfinkel,@pogo59,@qcolombet,@sanjoy,@Melamoto | ## Extended Description For one liner functions, LLVM seems to not be emitting inlined-at information in the final binary. The inlinedAt tags are there in the bytecode, but don't show up in the final binary. If I change the code of the inlined function to have more than one line, then the inlinedAt markers survive to the final binary. $ clang++ -O2 -g bug.cc -o bug $ llvm-dwarfdump --debug-dump=info bug|grep -A6 DW_TAG_inlined_subroutine When compiled with GCC, the inlined function is marked properly: $ g++-4.9 -O2 -g bug.cc -o gcc-bug $ llvm-dwarfdump --debug-dump=info gcc-bug|grep -A6 DW_TAG_inlined_subroutine 0x00000476: DW_TAG_inlined_subroutine [21] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x028d => {0x000003da} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x0000000000400559) DW_AT_high_pc [DW_FORM_data8] (0x0000000000000003) DW_AT_call_file [DW_FORM_data1] ("./bug.cc") DW_AT_call_line [DW_FORM_data1] (8) DW_AT_GNU_discriminator [DW_FORM_data1] (0x02) If I modified the inlined function to have more code, LLVM will generate a DW_TAG_inlined_subroutine tag. I did this in the no-bug.cc test case: $ clang++ -g -O2 no-bug.cc -o no-bug $ llvm-dwarfdump --debug-dump=info no-bug|grep -A6 DW_TAG_inlined_subroutine 0x000000b1: DW_TAG_inlined_subroutine [9] * DW_AT_abstract_origin [DW_FORM_ref4] (cu + 0x004c => {0x0000004c} "_Z3sumxx") DW_AT_low_pc [DW_FORM_addr] (0x000000000040063e) DW_AT_high_pc [DW_FORM_data4] (0x00000010) DW_AT_call_file [DW_FORM_data1] ("./no-bug.cc") DW_AT_call_line [DW_FORM_data1] (13) DW_AT_GNU_discriminator [DW_FORM_data1] (0x01)