llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.37k stars 12.14k forks source link

clang 18.1.8 generates incorrect debug info for double precison float parameter at optimisation level -O1 (ARM cortex-m4, soft float target) #110482

Open Rick-Brewin opened 2 months ago

Rick-Brewin commented 2 months ago

Self-contained reproduction C source code attached as a text file: repro.c.txt

Building the above source with the following compiler options:

-mcpu=cortex-m4 -mthumb -target arm-none-eabi -fshort-enums -fsigned-char -gdwarf-4 -fomit-frame-pointer -g -O1 -mfloat-abi=soft -save-temps

.... generates an incorrect register pair (R2,R4) in the debug info describing the location of parameter "ian" in function "lets_go()". While the debug experience at optimisation levels higher than -O0 can be fragmented, the compiler should prefer to output no location info at all instead of misleading location info. The debugger (in this case GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2) has no way to detect that the location description is inaccurate.

GDB output, connecting to qemu-system-gnuarmeclipse 2.8.0:

Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from repro.elf...
(gdb) b lets_go
Breakpoint 1 at 0x106: file repro.c, line 14.
(gdb) target remote localhost:12345
Remote debugging using localhost:12345
main () at repro.c:4
4       {
(gdb) c
Continuing.

Breakpoint 1, lets_go (phil=<optimized out>, grant=1.1754920000000001e-308, dot=<optimized out>, ethel=<optimized out>, ian=1.1754909290959615e-308) at repro.c:14
14          double pat = ethel + ian;  /* first natural breakpoint in lets_go() is here; ian has incorrect register location description at -O1 */
(gdb) 

As we can see, parameter "ian" is described as having the value 1.1754909290959615e-308, but the correct value is 128966e+150.

Generated debug information, collected using objdump -W:

Variable description for various parameters, including 'ian':

<1><7e>: Abbrev Number: 3 (DW_TAG_subprogram)
   <7f>   DW_AT_low_pc      : 0xf0
   <83>   DW_AT_high_pc     : 0xd0
   <87>   DW_AT_frame_base  : 1 byte block: 5d         (DW_OP_reg13 (r13))
   <89>   DW_AT_GNU_all_call_sites: 1
   <89>   DW_AT_name        : (indirect string, offset: 0x1c4): lets_go
   <8d>   DW_AT_decl_file   : 1
   <8e>   DW_AT_decl_line   : 11
   <8f>   DW_AT_prototyped  : 1
   <8f>   DW_AT_type        : <0xf9>
   <93>   DW_AT_external    : 1
<2><93>: Abbrev Number: 5 (DW_TAG_formal_parameter)
   <94>   DW_AT_location    : 0x91 (location list)
   <98>   DW_AT_name        : (indirect string, offset: 0x276): phil
   <9c>   DW_AT_decl_file   : 1
   <9d>   DW_AT_decl_line   : 11
   <9e>   DW_AT_type        : <0xf9>
<2><a2>: Abbrev Number: 5 (DW_TAG_formal_parameter)
   <a3>   DW_AT_location    : 0xc7 (location list)
   <a7>   DW_AT_name        : (indirect string, offset: 0x21b): grant
   <ab>   DW_AT_decl_file   : 1
   <ac>   DW_AT_decl_line   : 11
   <ad>   DW_AT_type        : <0xf9>
<2><b1>: Abbrev Number: 5 (DW_TAG_formal_parameter)
   <b2>   DW_AT_location    : 0xee (location list)
   <b6>   DW_AT_name        : (indirect string, offset: 0xfc): dot
   <ba>   DW_AT_decl_file   : 1
   <bb>   DW_AT_decl_line   : 11
   <bc>   DW_AT_type        : <0xf9>
<2><c0>: Abbrev Number: 6 (DW_TAG_formal_parameter)
   <c1>   DW_AT_name        : (indirect string, offset: 0x397): ethel
   <c5>   DW_AT_decl_file   : 1
   <c6>   DW_AT_decl_line   : 11
   <c7>   DW_AT_type        : <0xf9>
<2><cb>: Abbrev Number: 5 (DW_TAG_formal_parameter)
   <cc>   DW_AT_location    : 0x0 (location list)
   <d0>   DW_AT_name        : (indirect string, offset: 0x5af): ian
   <d4>   DW_AT_decl_file   : 1
   <d5>   DW_AT_decl_line   : 11
   <d6>   DW_AT_type        : <0xf9>

Location list info for parameter "ian":

Contents of the .debug_loc section:

    Offset   Begin            End              Expression
    00000000 000000f8 000000fe (DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4)
    0000000f 000000fe 00000104 (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4)
    0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4)
    0000002e 0000010c 0000011c (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4)
    0000003d 0000011c 00000120 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4)
    0000004d 00000120 00000128 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4)
    0000005d <End of list>

Relevant location at breakpoint (PC 0x106):

    0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4)

This incorrectly states that 'ian' is located in registers r2 and r4.

The generated assembly for the beginning of lets_go (harvested from -save-temps) states:

lets_go:
.Lfunc_begin1:
        .loc    1 12 0 is_stmt 1                @ repro.c:12:0
        .fnstart
        .cfi_startproc
@ %bb.0:
        .save   {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        push.w  {r4, r5, r6, r7, r8, r9, r10, r11, lr}
        .cfi_def_cfa_offset 36
        .cfi_offset lr, -4
        .cfi_offset r11, -8
        .cfi_offset r10, -12
        .cfi_offset r9, -16
        .cfi_offset r8, -20
        .cfi_offset r7, -24
        .cfi_offset r6, -28
        .cfi_offset r5, -32
        .cfi_offset r4, -36
        .pad    #4
        sub     sp, #4
        .cfi_def_cfa_offset 40
        mov     r10, r1
.Ltmp5:
        @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r3
        ldrd    r7, r1, [sp, #48]
.Ltmp6:
        @DEBUG_VALUE: lets_go:sharon <- 5.990000e-200
        @DEBUG_VALUE: lets_go:ethel <- undef
        mov     r4, r3
.Ltmp7:
        @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r4
        mov     r5, r2
        ldrd    r2, r3, [sp, #56]
        @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 0 32] undef
.Ltmp8:
        @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 32 32] $r10
        @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 0 32] $r5
        @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 32 32] $r4
        @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 0 32] undef
        @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 32 32] undef
        @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 0 32] $r2
        mov     r11, r0
.Ltmp9:
        .loc    1 14 24 prologue_end            @ repro.c:14:24
        mov     r0, r7

Corresponding disassembly up to the breakpoint site (PC 0x106):

000000f0 <lets_go>:
 f0:   e92d 4ff0       stmdb   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
 f4:   b081            sub     sp, #4
 f6:   468a            mov     sl, r1
 f8:   e9dd 710c       ldrd    r7, r1, [sp, #48]       ; 0x30
 fc:   461c            mov     r4, r3
 fe:   4615            mov     r5, r2
100:   e9dd 230e       ldrd    r2, r3, [sp, #56]       ; 0x38
104:   4683            mov     fp, r0
106:   4638            mov     r0, r7

Please let me know if any other information would be useful to reproduce this.

Rick-Brewin commented 2 months ago

Suggest adding "arm-backend" and "debuginfo" as labels for this issue in the first instance, although I have not experimented with building for different targets.

llvmbot commented 2 months ago

@llvm/issue-subscribers-debuginfo

Author: Rick (Rick-Brewin)

Self-contained reproduction C source code attached as a text file: [repro.c.txt](https://github.com/user-attachments/files/17188039/repro.c.txt) Building the above source with the following compiler options: `-mcpu=cortex-m4 -mthumb -target arm-none-eabi -fshort-enums -fsigned-char -gdwarf-4 -fomit-frame-pointer -g -O1 -mfloat-abi=soft -save-temps` .... generates an incorrect register pair (R2,R4) in the debug info describing the location of parameter "ian" in function "lets_go()". While the debug experience at optimisation levels higher than -O0 can be fragmented, the compiler should prefer to output no location info at all instead of misleading location info. The debugger (in this case GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2) has no way to detect that the location description is inaccurate. GDB output, connecting to qemu-system-gnuarmeclipse 2.8.0: ``` Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from repro.elf... (gdb) b lets_go Breakpoint 1 at 0x106: file repro.c, line 14. (gdb) target remote localhost:12345 Remote debugging using localhost:12345 main () at repro.c:4 4 { (gdb) c Continuing. Breakpoint 1, lets_go (phil=<optimized out>, grant=1.1754920000000001e-308, dot=<optimized out>, ethel=<optimized out>, ian=1.1754909290959615e-308) at repro.c:14 14 double pat = ethel + ian; /* first natural breakpoint in lets_go() is here; ian has incorrect register location description at -O1 */ (gdb) ``` As we can see, parameter "ian" is described as having the value 1.1754909290959615e-308, but the correct value is 128966e+150. Generated debug information, collected using objdump -W: Variable description for various parameters, including 'ian': ``` <1><7e>: Abbrev Number: 3 (DW_TAG_subprogram) <7f> DW_AT_low_pc : 0xf0 <83> DW_AT_high_pc : 0xd0 <87> DW_AT_frame_base : 1 byte block: 5d (DW_OP_reg13 (r13)) <89> DW_AT_GNU_all_call_sites: 1 <89> DW_AT_name : (indirect string, offset: 0x1c4): lets_go <8d> DW_AT_decl_file : 1 <8e> DW_AT_decl_line : 11 <8f> DW_AT_prototyped : 1 <8f> DW_AT_type : <0xf9> <93> DW_AT_external : 1 <2><93>: Abbrev Number: 5 (DW_TAG_formal_parameter) <94> DW_AT_location : 0x91 (location list) <98> DW_AT_name : (indirect string, offset: 0x276): phil <9c> DW_AT_decl_file : 1 <9d> DW_AT_decl_line : 11 <9e> DW_AT_type : <0xf9> <2><a2>: Abbrev Number: 5 (DW_TAG_formal_parameter) <a3> DW_AT_location : 0xc7 (location list) <a7> DW_AT_name : (indirect string, offset: 0x21b): grant <ab> DW_AT_decl_file : 1 <ac> DW_AT_decl_line : 11 <ad> DW_AT_type : <0xf9> <2><b1>: Abbrev Number: 5 (DW_TAG_formal_parameter) <b2> DW_AT_location : 0xee (location list) <b6> DW_AT_name : (indirect string, offset: 0xfc): dot <ba> DW_AT_decl_file : 1 <bb> DW_AT_decl_line : 11 <bc> DW_AT_type : <0xf9> <2><c0>: Abbrev Number: 6 (DW_TAG_formal_parameter) <c1> DW_AT_name : (indirect string, offset: 0x397): ethel <c5> DW_AT_decl_file : 1 <c6> DW_AT_decl_line : 11 <c7> DW_AT_type : <0xf9> <2><cb>: Abbrev Number: 5 (DW_TAG_formal_parameter) <cc> DW_AT_location : 0x0 (location list) <d0> DW_AT_name : (indirect string, offset: 0x5af): ian <d4> DW_AT_decl_file : 1 <d5> DW_AT_decl_line : 11 <d6> DW_AT_type : <0xf9> ``` Location list info for parameter "ian": ``` Contents of the .debug_loc section: Offset Begin End Expression 00000000 000000f8 000000fe (DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4) 0000000f 000000fe 00000104 (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000002e 0000010c 0000011c (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000003d 0000011c 00000120 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000004d 00000120 00000128 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4) 0000005d <End of list> ``` Relevant location at breakpoint (PC 0x106): ``` 0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) ``` This incorrectly states that 'ian' is located in registers r2 and r4. The generated assembly for the beginning of lets_go (harvested from -save-temps) states: ``` lets_go: .Lfunc_begin1: .loc 1 12 0 is_stmt 1 @ repro.c:12:0 .fnstart .cfi_startproc @ %bb.0: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr} push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr} .cfi_def_cfa_offset 36 .cfi_offset lr, -4 .cfi_offset r11, -8 .cfi_offset r10, -12 .cfi_offset r9, -16 .cfi_offset r8, -20 .cfi_offset r7, -24 .cfi_offset r6, -28 .cfi_offset r5, -32 .cfi_offset r4, -36 .pad #4 sub sp, #4 .cfi_def_cfa_offset 40 mov r10, r1 .Ltmp5: @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r3 ldrd r7, r1, [sp, #48] .Ltmp6: @DEBUG_VALUE: lets_go:sharon <- 5.990000e-200 @DEBUG_VALUE: lets_go:ethel <- undef mov r4, r3 .Ltmp7: @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r4 mov r5, r2 ldrd r2, r3, [sp, #56] @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 0 32] undef .Ltmp8: @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 32 32] $r10 @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 0 32] $r5 @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 32 32] $r4 @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 0 32] undef @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 32 32] undef @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 0 32] $r2 mov r11, r0 .Ltmp9: .loc 1 14 24 prologue_end @ repro.c:14:24 mov r0, r7 ``` Corresponding disassembly up to the breakpoint site (PC 0x106): ``` 000000f0 <lets_go>: f0: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr} f4: b081 sub sp, #4 f6: 468a mov sl, r1 f8: e9dd 710c ldrd r7, r1, [sp, #48] ; 0x30 fc: 461c mov r4, r3 fe: 4615 mov r5, r2 100: e9dd 230e ldrd r2, r3, [sp, #56] ; 0x38 104: 4683 mov fp, r0 106: 4638 mov r0, r7 ``` Please let me know if any other information would be useful to reproduce this.
EugeneZelenko commented 2 months ago

@Rick-Brewin: Could you please try 19 or main branch? https://godbolt.org should be helpful.

Rick-Brewin commented 2 months ago

Hi @EugeneZelenko,

Thanks for your reply.

I tried this using compiler "armv7-a clang (trunk)" on https://godbolt.org

Command line switches set to: -mcpu=cortex-m4 -mthumb -target arm-none-eabi -fshort-enums -fsigned-char -gdwarf-4 -fomit-frame-pointer -g -O1 -mfloat-abi=soft

Under menu "Output", I enabled "Compile to binary object".

Enabled additional tool "llvm-objdump (trunk)" so I could inspect debug output.

Here's the disassembly produced by the above:

    lets_go:
6c   stmdb  sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
70   sub    sp, #4
72   mov    sl, r1
74   ldrd   r7, r1, [sp, #48]   ; 0x30
78   mov    r4, r3
7a   mov    r5, r2
7c   ldrd   r2, r3, [sp, #56]   ; 0x38
80   mov    fp, r0
82   mov    r0, r7
84   bl 0 <__aeabi_dadd>
        R_ARM_THM_CALL __aeabi_dadd

And the debug info for parameter "ian":

  DW_TAG_formal_parameter
                  DW_AT_location    (0x00000000: 
                     [0x00000074, 0x0000007a): DW_OP_piece 0x4, DW_OP_reg3 R3, DW_OP_piece 0x4
                     [0x0000007a, 0x00000080): DW_OP_piece 0x4, DW_OP_reg4 R4, DW_OP_piece 0x4
                     [0x00000080, 0x00000088): DW_OP_reg2 R2, DW_OP_piece 0x4, DW_OP_reg4 R4, DW_OP_piece 0x4
                     [0x00000088, 0x00000098): DW_OP_piece 0x4, DW_OP_reg4 R4, DW_OP_piece 0x4
                     [0x00000098, 0x0000009c): DW_OP_reg2 R2, DW_OP_piece 0x4, DW_OP_reg4 R4, DW_OP_piece 0x4
                     [0x0000009c, 0x000000a4): DW_OP_reg2 R2, DW_OP_piece 0x4, DW_OP_reg3 R3, DW_OP_piece 0x4)
                  DW_AT_name    ("ian")
                  DW_AT_decl_file   ("/app/example.c")
                  DW_AT_decl_line   (11)
                  DW_AT_type    (0x000000f9 "double")

The code and generated debug info look the same as that produced by clang 18.1.8 (notwithstanding the cosmetically different disassembly and debug presentations; my original report used GNU objdump to obtain the disassembly and debug info dumps).

That is, the generated assembly looks the same, and the debug info still describes parameter "ian" as being fully contained within registers R2 and R4 at PC range 0x80 through 0x88, which is incorrect.

jmorse commented 2 months ago

I've little familiarity with ARM, but there are some interesting things going on with this code:

After virtregrewriter we grow two extra DBG_VALUEs that seem to refer to the wrong values. Here's some MIR before LiveDebugVariables:

      %31:gpr = t2LDRi12 %fixed-stack.0, 0, 14, $noreg :: (load (s32) from %fixed-stack.0)
      DBG_VALUE %31:gpr, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !33;
      %30:gpr = t2LDRi12 %fixed-stack.1, 0, 14, $noreg :: (load (s32) from %fixed-stack.1, align 8)
      DBG_VALUE %30:gpr, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !33;

Which specifies the (correct AFAIUI) location of "ian", which has no other definitions. After register allocation we get (edited for conciseness):

      renamable $r3 = t2LDRi12 %fixed-stack.0, 0, 14, $noreg :: (load (s32) from %fixed-stack.0)
      DBG_VALUE $r3, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !33;
      renamable $r2 = t2LDRi12 %fixed-stack.1, 0, 14, $noreg :: (load (s32) from %fixed-stack.1, align 8)
      DBG_VALUE $noreg, $noreg, !"phil", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !33;
      renamable $r7 = t2LDRi12 %fixed-stack.3, 0, 14, $noreg :: (load (s32) from %fixed-stack.3, align 8)
      renamable $r1 = t2LDRi12 %fixed-stack.2, 0, 14, $noreg :: (load (s32) from %fixed-stack.2)
      renamable $r11 = COPY $r0
      ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp, debug-location !34;
      $r0 = COPY killed renamable $r7, debug-location !34;
      tBL 14, $noreg, &__aeabi_dadd, <regmask $lr $d8 $d9 $d10 $d11 $d12 $d13 $d14 $d15 $q4 $q5 $q6 $q7 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $s16 $s17 $s18 $s19 $s20 $s21 $s22 $s23 $s24 $s25 $s26 $s27 and 35 more...>, implicit-def dead $lr, implicit $sp, implicit $r0, implicit $r1, implicit $r2, implicit $r3, implicit-def $sp, implicit-def $r0, implicit-def $r1, debug-location !34; /tmp/nou.c:14:24
      ADJCALLSTACKUP 0, -1, 14, $noreg, implicit-def dead $sp, implicit $sp, debug-location !34;                                                                                     
      renamable $r6 = COPY $r0, debug-location !34;
      renamable $r7 = COPY $r1, debug-location !34;
      ADJCALLSTACKDOWN 0, 0, 14, $noreg, implicit-def dead $sp, implicit $sp, debug-location !35;
      renamable $r2 = t2MOVi32imm -350469331
      DBG_VALUE $r2, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !33;
      renamable $r3 = t2MOVi32imm 1058682594
      DBG_VALUE $r3, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !33;

The last two DBG_VALUEs are new, and seem to refer to constant values, and are probably incorrect. I suspect this is the "register allocator doesn't know if it merges a non-live variable into another value" issue.

Problem two, here's the MIR before the "arm-ldst-opt" pass:

renamable $r3 = t2LDRi12 $sp, 60, 14, $noreg :: (load (s32) from %fixed-stack.0)
DBG_VALUE $r3, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !33;
renamable $r2 = t2LDRi12 $sp, 56, 14, $noreg :: (load (s32) from %fixed-stack.1, align 8)
[...]
DBG_VALUE $r2, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !33; 

After:

DBG_VALUE $r3, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !33;
$r2, $r3 = t2LDRDi8 $sp, 56, 14, $noreg :: (load (s32) from %fixed-stack.1, align 8), (load (s32) from %fixed-stack.0)
DBG_VALUE $r2, $noreg, !"ian", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !33; 

The coalescing of loads into t2LDRDi8 causes the DBG_VALUE for r3 to change what it's referring to: now it's referring to whatever was in r3 before (a parameter?). The natural interpretation of this is that part of "ian" is wrong, then terminated by the t2LDRDi8...

...however then the postmisched instruction scheduler shuffles everything around even further, and the "wrong" references to r3 refers to an even longer range of instructions

~

The most immediate fix would be ensuring arm-ldst-opt doesn't cause DBG_VALUEs to refer to different register values; however given that it's consolidating two value definitions into one location there could be use-cases where that's very difficult. If the two extra DBG_VALUEs really are because of regalloc merging a live-value into an out-of-liveness-variable-value, that's also super complicated to solve. The instruction schedulers re-ordering assignments is merely "hard" to solve.

NB: the first two matters would be solved by using the "instruction referencing" debug-info representation we've implemented for x86, presented at https://www.youtube.com/watch?v=yxuwfNnp064 . The summary is that we can treat DBG_* instructions as "uses" of values, label certain instructions as definitions, and then defer all of the hard problems until the end of compilation where we solve some well defined SSA-construction tasks. arm-ldst-opt would then only need to record the movement of a value-definition from one instruction to another.

jmorse commented 2 months ago

(Also, thank you for the very detailed report, it makes everything clearer!)

Rick-Brewin commented 2 months ago

@jmorse, thanks for your detailed analysis and suggestions for how this could be fixed.

Is it possible to get this issue labelled with "arm-backend" in the hope that an expert in the ARM optimisation passes might take an interest?

Even if it is too difficult to map to a real location as a result of destructive optimisations, simply detecting that tracking the variable is too hard and outputting an empty location list would be preferable to the current outcome.

llvmbot commented 2 months ago

@llvm/issue-subscribers-backend-arm

Author: Rick (Rick-Brewin)

Self-contained reproduction C source code attached as a text file: [repro.c.txt](https://github.com/user-attachments/files/17188039/repro.c.txt) Building the above source with the following compiler options: `-mcpu=cortex-m4 -mthumb -target arm-none-eabi -fshort-enums -fsigned-char -gdwarf-4 -fomit-frame-pointer -g -O1 -mfloat-abi=soft -save-temps` .... generates an incorrect register pair (R2,R4) in the debug info describing the location of parameter "ian" in function "lets_go()". While the debug experience at optimisation levels higher than -O0 can be fragmented, the compiler should prefer to output no location info at all instead of misleading location info. The debugger (in this case GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.2) 9.2) has no way to detect that the location description is inaccurate. GDB output, connecting to qemu-system-gnuarmeclipse 2.8.0: ``` Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from repro.elf... (gdb) b lets_go Breakpoint 1 at 0x106: file repro.c, line 14. (gdb) target remote localhost:12345 Remote debugging using localhost:12345 main () at repro.c:4 4 { (gdb) c Continuing. Breakpoint 1, lets_go (phil=<optimized out>, grant=1.1754920000000001e-308, dot=<optimized out>, ethel=<optimized out>, ian=1.1754909290959615e-308) at repro.c:14 14 double pat = ethel + ian; /* first natural breakpoint in lets_go() is here; ian has incorrect register location description at -O1 */ (gdb) ``` As we can see, parameter "ian" is described as having the value 1.1754909290959615e-308, but the correct value is 128966e+150. Generated debug information, collected using objdump -W: Variable description for various parameters, including 'ian': ``` <1><7e>: Abbrev Number: 3 (DW_TAG_subprogram) <7f> DW_AT_low_pc : 0xf0 <83> DW_AT_high_pc : 0xd0 <87> DW_AT_frame_base : 1 byte block: 5d (DW_OP_reg13 (r13)) <89> DW_AT_GNU_all_call_sites: 1 <89> DW_AT_name : (indirect string, offset: 0x1c4): lets_go <8d> DW_AT_decl_file : 1 <8e> DW_AT_decl_line : 11 <8f> DW_AT_prototyped : 1 <8f> DW_AT_type : <0xf9> <93> DW_AT_external : 1 <2><93>: Abbrev Number: 5 (DW_TAG_formal_parameter) <94> DW_AT_location : 0x91 (location list) <98> DW_AT_name : (indirect string, offset: 0x276): phil <9c> DW_AT_decl_file : 1 <9d> DW_AT_decl_line : 11 <9e> DW_AT_type : <0xf9> <2><a2>: Abbrev Number: 5 (DW_TAG_formal_parameter) <a3> DW_AT_location : 0xc7 (location list) <a7> DW_AT_name : (indirect string, offset: 0x21b): grant <ab> DW_AT_decl_file : 1 <ac> DW_AT_decl_line : 11 <ad> DW_AT_type : <0xf9> <2><b1>: Abbrev Number: 5 (DW_TAG_formal_parameter) <b2> DW_AT_location : 0xee (location list) <b6> DW_AT_name : (indirect string, offset: 0xfc): dot <ba> DW_AT_decl_file : 1 <bb> DW_AT_decl_line : 11 <bc> DW_AT_type : <0xf9> <2><c0>: Abbrev Number: 6 (DW_TAG_formal_parameter) <c1> DW_AT_name : (indirect string, offset: 0x397): ethel <c5> DW_AT_decl_file : 1 <c6> DW_AT_decl_line : 11 <c7> DW_AT_type : <0xf9> <2><cb>: Abbrev Number: 5 (DW_TAG_formal_parameter) <cc> DW_AT_location : 0x0 (location list) <d0> DW_AT_name : (indirect string, offset: 0x5af): ian <d4> DW_AT_decl_file : 1 <d5> DW_AT_decl_line : 11 <d6> DW_AT_type : <0xf9> ``` Location list info for parameter "ian": ``` Contents of the .debug_loc section: Offset Begin End Expression 00000000 000000f8 000000fe (DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4) 0000000f 000000fe 00000104 (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000002e 0000010c 0000011c (DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000003d 0000011c 00000120 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) 0000004d 00000120 00000128 (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg3 (r3); DW_OP_piece: 4) 0000005d <End of list> ``` Relevant location at breakpoint (PC 0x106): ``` 0000001e 00000104 0000010c (DW_OP_reg2 (r2); DW_OP_piece: 4; DW_OP_reg4 (r4); DW_OP_piece: 4) ``` This incorrectly states that 'ian' is located in registers r2 and r4. The generated assembly for the beginning of lets_go (harvested from -save-temps) states: ``` lets_go: .Lfunc_begin1: .loc 1 12 0 is_stmt 1 @ repro.c:12:0 .fnstart .cfi_startproc @ %bb.0: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr} push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr} .cfi_def_cfa_offset 36 .cfi_offset lr, -4 .cfi_offset r11, -8 .cfi_offset r10, -12 .cfi_offset r9, -16 .cfi_offset r8, -20 .cfi_offset r7, -24 .cfi_offset r6, -28 .cfi_offset r5, -32 .cfi_offset r4, -36 .pad #4 sub sp, #4 .cfi_def_cfa_offset 40 mov r10, r1 .Ltmp5: @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r3 ldrd r7, r1, [sp, #48] .Ltmp6: @DEBUG_VALUE: lets_go:sharon <- 5.990000e-200 @DEBUG_VALUE: lets_go:ethel <- undef mov r4, r3 .Ltmp7: @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 32 32] $r4 mov r5, r2 ldrd r2, r3, [sp, #56] @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 0 32] undef .Ltmp8: @DEBUG_VALUE: lets_go:phil <- [DW_OP_LLVM_fragment 32 32] $r10 @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 0 32] $r5 @DEBUG_VALUE: lets_go:grant <- [DW_OP_LLVM_fragment 32 32] $r4 @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 0 32] undef @DEBUG_VALUE: lets_go:dot <- [DW_OP_LLVM_fragment 32 32] undef @DEBUG_VALUE: lets_go:ian <- [DW_OP_LLVM_fragment 0 32] $r2 mov r11, r0 .Ltmp9: .loc 1 14 24 prologue_end @ repro.c:14:24 mov r0, r7 ``` Corresponding disassembly up to the breakpoint site (PC 0x106): ``` 000000f0 <lets_go>: f0: e92d 4ff0 stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr} f4: b081 sub sp, #4 f6: 468a mov sl, r1 f8: e9dd 710c ldrd r7, r1, [sp, #48] ; 0x30 fc: 461c mov r4, r3 fe: 4615 mov r5, r2 100: e9dd 230e ldrd r2, r3, [sp, #56] ; 0x38 104: 4683 mov fp, r0 106: 4638 mov r0, r7 ``` Please let me know if any other information would be useful to reproduce this.