llvm / llvm-project


SROA affects DWARF ranges causing variable not available when debugging #54796

Open cristianassaiante opened 2 years ago

cristianassaiante commented 2 years ago

In this minimized C example, the pointer variable l_47, assigned at line 6 in the inner scope {} inside the main function, is marked as not available at line 9 and becomes available with its correct value at line 10.

This behavior appears to be caused by SROA. Bisecting the optimization pipeline with opt-bisect-limit and inspecting the DWARF at each step shows that, after SROA runs, the variable no longer has a DW_AT_location, so its value is lost.
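The bisection described above can be scripted. The following is a minimal sketch, not the tooling actually used for this report: it only builds the command lines for one bisection step, assuming `clang` and `llvm-dwarfdump` are on `PATH`. The helper name `bisect_step_commands` and the `opt.N` output naming are hypothetical; `-mllvm -opt-bisect-limit=N` and `llvm-dwarfdump --name` are existing options.

```python
import shlex


def bisect_step_commands(source, variable, limit, opt_level="-Os"):
    """Return the two command lines for one bisection step (hypothetical helper).

    The first compiles `source` with -opt-bisect-limit set to `limit`, so that
    optional passes beyond that count are skipped; the second dumps the DIEs
    whose name matches `variable`.
    """
    binary = f"opt.{limit}"  # hypothetical output name, one binary per limit
    compile_cmd = [
        "clang", opt_level, "-g",
        "-mllvm", f"-opt-bisect-limit={limit}",
        source, "-o", binary,
    ]
    dump_cmd = ["llvm-dwarfdump", f"--name={variable}", binary]
    return compile_cmd, dump_cmd


if __name__ == "__main__":
    # Print the commands for the first few limits; run them by hand (or via
    # subprocess) and look for the step where DW_AT_location disappears.
    for n in range(1, 4):
        for cmd in bisect_step_commands("a.c", "l_47", n):
            print(shlex.join(cmd))
```

Comparing the dumps of two consecutive limits should identify the pass after which the location is dropped.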

SimplifyCFG eventually adds a location again, but only for addresses associated with line 10. In the assembly, line 9 ends up associated with the instruction that materializes &a, an operation performed at both line 6 and line 9 of the source program. Perhaps line 9 could instead be associated with the next instruction, or even with the movdqa instructions that store to b[e]?

The issue occurs at -Os. At -Og/-O1/-O2/-O3 the variable is optimized out.

We tested clang and lldb 14.0.0 (commit 116dc70) on x64.

$ cat a.c
int a,  c,  d;
int *b[4];
int main() 
  {
  int f; {
    int *l_47 = &a;
    int e = 0;
    for (; e < 4; e++)
      b[e] = &a;
    f =  c;
  }
  d = f; }

LLDB trace:

$ clang -Os -g a.c -o opt
$ lldb opt
(lldb) target create "opt"
Current executable set to '/tmp/opt' (x86_64).
(lldb) b main
Breakpoint 1: where = opt`main at a.c:9:12, address = 0x0000000000401108
(lldb) r
Process 26700 launched: '/tmp/opt' (x86_64)
Process 26700 stopped
* thread #1, name = 'opt', stop reason = breakpoint 1.1
    frame #0: 0x0000000000401108 opt`main at a.c:9:12
   6        int *l_47 = &a;
   7        int e = 0;
   8        for (; e < 4; e++)
-> 9          b[e] = &a;
   10       f =  c;
   11     }
   12     d = f; }
(lldb) frame var
(int) f = <variable not available>

(int) e = 0
(int *) l_47 = <variable not available>

(lldb) n
Process 26700 stopped
* thread #1, name = 'opt', stop reason = step over
    frame #0: 0x0000000000401127 opt`main at a.c:10:10
   7        int e = 0;
   8        for (; e < 4; e++)
   9          b[e] = &a;
-> 10       f =  c;
   11     }
   12     d = f; }
(lldb) frame var
(int) f = <variable not available>

(int) e = 0
(int *) l_47 = 0x0000000000404040

ASM at -Os:

0000000000401108 <main>:
  401108:       b8 40 40 40 00          mov    $0x404040,%eax
  40110d:       66 48 0f 6e c0          movq   %rax,%xmm0
  401112:       66 0f 70 c0 44          pshufd $0x44,%xmm0,%xmm0
  401117:       66 0f 7f 05 31 2f 00    movdqa %xmm0,0x2f31(%rip)        # 404050 <b>
  40111e:       00 
  40111f:       66 0f 7f 05 39 2f 00    movdqa %xmm0,0x2f39(%rip)        # 404060 <b+0x10>
  401126:       00 
  401127:       8b 05 43 2f 00 00       mov    0x2f43(%rip),%eax        # 404070 <c>
  40112d:       89 05 41 2f 00 00       mov    %eax,0x2f41(%rip)        # 404074 <d>
  401133:       31 c0                   xor    %eax,%eax
  401135:       c3                      ret

DWARF info at -Os:

0x000000de:       DW_TAG_variable
                    DW_AT_location      (0x00000000: 
                       [0x000000000040110d, 0x000000000040112d): DW_OP_reg0 RAX)
                    DW_AT_name  ("l_47")
                    DW_AT_decl_file     ("/tmp/a.c")
                    DW_AT_decl_line     (6)
                    DW_AT_type  (0x00000091 "int *")

The location range for l_47 does not include the instruction at 401108, which the line number table associates with line 9, while line 10 is associated with the instruction at 401127. The instruction at 401108 loads the address of variable a, which is used both to initialize l_47 at line 6 and to assign b[e] at line 9. Perhaps line 6 should be associated with 401108 and line 9 with a later instruction?
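The mismatch can be checked mechanically. Below is a minimal sketch that tests each stop address against the half-open location range; the addresses and line numbers are transcribed by hand from the dumps in this report, and the helper names are hypothetical.

```python
# Values copied from the objdump, line table, and DWARF dumps in this report.
LOCATION_RANGE = (0x40110D, 0x40112D)     # [low, high): DW_OP_reg0 RAX
LINE_TABLE = {0x401108: 9, 0x401127: 10}  # stop address -> source line


def is_covered(address, location_range):
    """True if `address` lies in the half-open range [low, high)."""
    low, high = location_range
    return low <= address < high


def availability(line_table, location_range):
    """Map each source line to whether the variable has a location there."""
    return {line: is_covered(addr, location_range)
            for addr, line in line_table.items()}


if __name__ == "__main__":
    for line, ok in availability(LINE_TABLE, LOCATION_RANGE).items():
        print(f"line {line}: {'available' if ok else 'not available'}")
```

This matches the LLDB trace: the breakpoint at 0x401108 (line 9) is 5 bytes before the start of the range, while 0x401127 (line 10) falls inside it.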

DWARF before SROA at -Os:

0x000000d1:       DW_TAG_variable
                    DW_AT_location      (DW_OP_fbreg -8)
                    DW_AT_name  ("l_47")
                    DW_AT_decl_file     ("/tmp/a.c")
                    DW_AT_decl_line     (6)
                    DW_AT_type  (0x00000091 "int *")

DWARF after SROA at -Os:

0x000000e1:       DW_TAG_variable
                    DW_AT_name  ("l_47")
                    DW_AT_decl_file     ("/tmp/a.c")
                    DW_AT_decl_line     (6)
                    DW_AT_type  (0x00000091 "int *")
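To make the effect of SROA on the DIE explicit, the two dumps can be compared attribute by attribute. This is a minimal sketch with the attribute values transcribed by hand from the dumps above rather than parsed from llvm-dwarfdump output; `dropped_attributes` is a hypothetical helper.

```python
# Attributes of the l_47 DIE, copied from the dumps before and after SROA.
BEFORE_SROA = {
    "DW_AT_location": "DW_OP_fbreg -8",
    "DW_AT_name": "l_47",
    "DW_AT_decl_file": "/tmp/a.c",
    "DW_AT_decl_line": 6,
    "DW_AT_type": "int *",
}
AFTER_SROA = {
    "DW_AT_name": "l_47",
    "DW_AT_decl_file": "/tmp/a.c",
    "DW_AT_decl_line": 6,
    "DW_AT_type": "int *",
}


def dropped_attributes(before, after):
    """Return the attribute names present in `before` but missing in `after`."""
    return sorted(set(before) - set(after))


if __name__ == "__main__":
    print(dropped_attributes(BEFORE_SROA, AFTER_SROA))
```

The only attribute that disappears is DW_AT_location; the name, declaration coordinates, and type are unchanged.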

SimplifyCFG eventually adds range information.

DWARF info at -Os before SimplifyCFG:

0x000000e1:       DW_TAG_variable
                    DW_AT_name  ("l_47")
                    DW_AT_decl_file     ("/tmp/a.c")
                    DW_AT_decl_line     (6)
                    DW_AT_type  (0x00000091 "int *")

After SimplifyCFG, the DWARF is identical to the one reported above for -Os.

dwblaikie commented 2 years ago

I'd guess an early transformation removes the read for l_47 entirely, since it's dead - then later on the read for b[e] is emitted.

I don't think it's practical/likely that l_47 will have a location in this code if it's optimized at all - even the most basic optimization would remove the read of the global.

dwblaikie commented 2 years ago

"becomes available with its correct value at line 10." - oh, that is interesting. Sorry, missed that - I'm surprised it's preserved at all. Maybe there's some chance this could be made to work, then.