riscvarchive / riscv-binutils-gdb

RISC-V backports for binutils-gdb. Development is done upstream at the FSF.
GNU General Public License v2.0

Static linked executable will segment fault on qemu after refine linker script #52

Closed kito-cheng closed 7 years ago

kito-cheng commented 7 years ago

Arch: RV32 and RV64

After this commit https://github.com/riscv/riscv-binutils-gdb/commit/6ec11ab97ab47ec4a22118e5b1c77df567796002, I can no longer run statically linked programs with qemu. Reverting that commit makes them work again.

GCC: https://github.com/riscv/riscv-gcc/commit/cd5c51b0e8cabe5cb723dee35c020122d7920eb0
GLIBC: https://github.com/riscv/riscv-glibc/commit/e84d3a58c42e29cc162efa0446bb0a1e3554dde4
QEMU: https://github.com/riscv/riscv-qemu/commit/21e3a7cd78edceb4345fb6bd11e53ded3cba8517
riscv-gnu-toolchain: https://github.com/riscv/riscv-gnu-toolchain/commit/914224e0913c9ceab49ad9531a7fedc231f65c15

How to reproduce:

$ cat hello.c
#include <stdio.h>

int main(int argc, const char *argv[]){
  printf("Hello world\n");
  return 0;
}
$ riscv64-unknown-linux-gnu-gcc hello.c -o hello.rv64 -static
$ qemu-riscv64 hello.rv64
Segmentation fault (core dumped)

aswaterman commented 7 years ago

I can replicate this, but the same binary works fine on spike with Linux and spike with pk. Seems like a qemu bug.

Unfortunately, I don't know how to debug qemu.

palmer-dabbelt commented 7 years ago

I had a look at this, but I can't figure it out. I turned on some QEMU debugging options to try and figure out where the SEGV is coming from:

$ qemu-riscv64 -d cpu,exec hello.rv64 |& tail -n20
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
Linking TBs 0x7f122972ac90 [000000000004bca0] index 0 -> 0x7f122972afb0 [000000000004bcb0]
Trace 0x7f122972afb0 [000000000004bcb0] brk
pc=0x000000000004bcb0
 zero 0000000000000000 ra   0000000000027014 sp   00000040008006d0 gp   00000000000950f0
 tp   0000000000000000 t0   0000000000000000 t1   000000000000002e t2   0000000000000000
 s0   0000000000095000 s1   0000000000000f60 a0   0000000000095000 a1   0000000000000010
 a2   0000000000000007 a3   0000000000000007 a4   0000000000010158 a5   0000000000095f60
 a6   0000000000000009 a7   00000000000000d6 s2   0000000000000057 s3   0000000000000020
 s4   0000000000092f50 s5   0000000000000010 s6   0000000000000000 s7   0000000000000008
 s8   0000000000000050 s9   0000000000000008 s10  0000000000000000 s11  0000000000000000
 t3   0000000000000003 t4   0000000000093000 t5   0000000000000001 t6   0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000

I think this is a SEGV that is being reflected from the RISC-V guest code (i.e., not an internal fault inside the QEMU host code), since GDB shows QEMU delivering the pending guest signal and then aborting:

$ gdb --args qemu-riscv64 hello.rv64
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from qemu-riscv64...done.
(gdb) r
Starting program: /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/install/riscv-qemu/bin/qemu-riscv64 hello.rv64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6da2700 (LWP 89947)]
[New Thread 0x7ffff65a1700 (LWP 89950)]

Program received signal SIGSEGV, Segmentation fault.
0x00005555559691fd in static_code_gen_buffer ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7017fb2 in do_sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
31      ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7017fb2 in do_sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1  __GI___sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2  0x00005555555e1c45 in dump_core_and_abort (target_sig=11) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:602
#3  0x00005555555e2e0e in handle_pending_signal (cpu_env=0x5555579bd9f0, sig=11, k=0x5555579bf4e0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:6147
#4  0x00005555555e30de in process_pending_signals (cpu_env=0x5555579bd9f0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:6229
#5  0x00005555555bf359 in cpu_loop (env=0x5555579bd9f0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:3873
#6  0x00005555555c0b95 in main (argc=2, argv=0x7fffffffe808, envp=0x7fffffffe820) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:4918

I went ahead and tried to figure out where the problem was coming from, and it appears to be from a store to __libc_errno:

$ gdb --batch --ex r --ex bt --ex 'x/i $pc' --ex 'print $rbp' --args qemu-riscv64 -singlestep -d in_asm,out_asm,exec,cpu hello.rv64 |& tail -n60
 tp   0000000000000000 t0   0000000000000000 t1   000000000000002e t2   0000000000000000
 s0   0000000000095000 s1   0000000000000f60 a0   0000000000095000 a1   0000000000000010
 a2   0000000000000007 a3   0000000000000007 a4   0000000000010158 a5   0000000000095f60
 a6   0000000000000009 a7   00000000000000d6 s2   0000000000000057 s3   0000000000000020
 s4   0000000000092f50 s5   0000000000000010 s6   0000000000000000 s7   0000000000000008
 s8   0000000000000050 s9   0000000000000008 s10  0000000000000000 s11  0000000000000000
 t3   0000000000000003 t4   0000000000093000 t5   0000000000000001 t6   0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
IN:
0x000000000004bcb4: DASM(0x02e22023)
OUT: [size=60]
0x55555596d050:  mov    -0x8(%r14),%ebp
0x55555596d054:  test   %ebp,%ebp
0x55555596d056:  jne    0x55555596d07d
0x55555596d05c:  mov    0x20(%r14),%rbp
0x55555596d060:  add    $0x20,%rbp
0x55555596d064:  mov    0x70(%r14),%rbx
0x55555596d068:  mov    %ebx,0x0(%rbp)
0x55555596d06b:  movq   $0x4bcb8,0x200(%r14)
0x55555596d076:  xor    %eax,%eax
0x55555596d078:  jmpq   0x5555559660f6
0x55555596d07d:  mov    $0x7ffff4412eeb,%rax
0x55555596d087:  jmpq   0x5555559660f6

Trace 0x55555596d050 [000000000004bcb4] brk
pc=0x000000000004bcb4
 zero 0000000000000000 ra   0000000000027014 sp   0000004000800760 gp   00000000000950f0
 tp   0000000000000000 t0   0000000000000000 t1   000000000000002e t2   0000000000000000
 s0   0000000000095000 s1   0000000000000f60 a0   0000000000095000 a1   0000000000000010
 a2   0000000000000007 a3   0000000000000007 a4   000000000000000c a5   0000000000095f60
 a6   0000000000000009 a7   00000000000000d6 s2   0000000000000057 s3   0000000000000020
 s4   0000000000092f50 s5   0000000000000010 s6   0000000000000000 s7   0000000000000008
 s8   0000000000000050 s9   0000000000000008 s10  0000000000000000 s11  0000000000000000
 t3   0000000000000003 t4   0000000000093000 t5   0000000000000001 t6   0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000

Program received signal SIGSEGV, Segmentation fault.
0x000055555596d068 in static_code_gen_buffer ()
#0  0x000055555596d068 in static_code_gen_buffer ()
#1  0x000055555558cbe0 in cpu_tb_exec (cpu=0x5555579b5ba0, itb=0x7ffff4412ee8) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:167
#2  0x000055555558d5ec in cpu_loop_exec_tb (cpu=0x5555579b5ba0, tb=0x7ffff4412ee8, last_tb=0x7fffffffdef8, tb_exit=0x7fffffffdef4, sc=0x7fffffffdf10) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:518
#3  0x000055555558d809 in cpu_exec (cpu=0x5555579b5ba0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:613
#4  0x00005555555bef21 in cpu_loop (env=0x5555579bde20) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:3781
#5  0x00005555555c0b95 in main (argc=5, argv=0x7fffffffe7c8, envp=0x7fffffffe7f8) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:4918
=> 0x55555596d068 <static_code_gen_buffer+28552>:       mov    %ebx,0x0(%rbp)
$1 = (void *) 0x20
$ riscv64-unknown-linux-gnu-objdump -d hello.rv64 | grep 4bcb4
   4bcb4:       02e22023                sw      a4,32(tp) # 20 <__libc_errno>

but the store looks like it's going to the intended location (or at least, the GDB trace matches what objdump says: tp is 0, so the target address is 0x20). Looking at the page map that QEMU prints for the guest, though, nothing is mapped at this address:

$ qemu-riscv64 -d page hello.rv64                                
host mmap_min_addr=0x10000
Reserved 0x87000 bytes of guest address space
Relocating guest address space from 0x0000000000010000 to 0x10000
guest_base  0x0
start            end              size             prot
0000000000010000-0000000000092000 0000000000082000 r-x
0000000000092000-0000000000097000 0000000000005000 rw-
0000004000000000-0000004000001000 0000000000001000 ---
0000004000001000-0000004000801000 0000000000800000 rw-
start_brk   0x0000000000000000
end_code    0x0000000000091386
start_code  0x0000000000010000
start_data  0x0000000000092f50
end_data    0x0000000000094b58
start_stack 0x0000004000800890
brk         0x0000000000094b58
entry       0x00000000000102f4
Segmentation fault (core dumped)

@aswaterman Is tp supposed to be zero here, or is something supposed to have initialized it earlier? Maybe QEMU isn't initializing tp in user mode but it wasn't noticed before because all our pages were just RWX?
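
The reasoning above can be checked by hand. The following is a minimal sketch, not part of the original thread: it decodes the faulting instruction word from the QEMU trace (0x02e22023) as an S-type store, computes the effective address with tp = 0 as shown in the register dump, and looks that address up in the page map printed by `-d page`. The names `decode_s_type`, `find_region`, `ABI_NAMES` and `REGIONS` are illustrative only.

```python
# Sketch: decode the faulting store and check its target against the page map.
# All constants are copied from the logs above; helper names are hypothetical.

# RISC-V integer register ABI names, indexed by register number x0..x31.
ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
    + [f"a{i}" for i in range(8)]
    + [f"s{i}" for i in range(2, 12)]
    + [f"t{i}" for i in range(3, 7)]
)

# Guest mappings from `qemu-riscv64 -d page hello.rv64`: (start, end, prot).
REGIONS = [
    (0x0000000000010000, 0x0000000000092000, "r-x"),
    (0x0000000000092000, 0x0000000000097000, "rw-"),
    (0x0000004000000000, 0x0000004000001000, "---"),
    (0x0000004000001000, 0x0000004000801000, "rw-"),
]


def decode_s_type(insn):
    """Split a 32-bit S-type instruction into (opcode, funct3, rs1, rs2, imm)."""
    opcode = insn & 0x7F
    funct3 = (insn >> 12) & 0x7
    rs1 = (insn >> 15) & 0x1F
    rs2 = (insn >> 20) & 0x1F
    # imm[11:5] lives in bits 31:25, imm[4:0] in bits 11:7.
    imm = (((insn >> 25) & 0x7F) << 5) | ((insn >> 7) & 0x1F)
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    return opcode, funct3, rs1, rs2, imm


def find_region(addr):
    """Return the (start, end, prot) region containing addr, or None."""
    for start, end, prot in REGIONS:
        if start <= addr < end:
            return (start, end, prot)
    return None


opcode, funct3, rs1, rs2, imm = decode_s_type(0x02E22023)
tp = 0x0  # value of tp in the register dump at the fault
target = tp + imm
print(f"store: {ABI_NAMES[rs2]} -> {imm}({ABI_NAMES[rs1]}), target {target:#x}")
print("mapped region:", find_region(target))
```

The decoded fields agree with the objdump line (`sw a4,32(tp)`), and with tp still zero the target 0x20 falls in none of the four listed regions, which is consistent with the hypothesis that tp was never initialized.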

aswaterman commented 7 years ago

Your hypothesis sounds spot-on @palmer-dabbelt.

In pk, we had to do a bunch of stuff to get glibc to do the right thing for statically linked programs. Basically, they need to be told where their ELF headers are mapped, so they can figure out how to set up static TLS (which is what causes tp to be initialized). See https://github.com/riscv/riscv-pk/blob/master/pk/pk.c#L92

Maybe @sorear or @sagark knows how to fix this in qemu?
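
As a rough illustration of why the location of the ELF headers matters: a statically linked program's startup code has to find the PT_TLS program header before it can size and initialize the static TLS block and point tp at it. The sketch below is an assumption-laden stand-in, not glibc's or pk's actual code. It parses a little-endian ELF64 image in Python and returns the PT_TLS entry; the function name `find_pt_tls` is hypothetical.

```python
import struct

PT_TLS = 7  # program header type for the thread-local storage template


def find_pt_tls(image):
    """Return (p_vaddr, p_filesz, p_memsz, p_align) of the PT_TLS segment
    in a little-endian ELF64 image, or None if there is no such segment."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")

    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)

    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        #             p_filesz, p_memsz, p_align
        (p_type, _flags, _offset, p_vaddr, _paddr,
         p_filesz, p_memsz, p_align) = struct.unpack_from("<IIQQQQQQ", image, off)
        if p_type == PT_TLS:
            return p_vaddr, p_filesz, p_memsz, p_align
    return None


# Example use on a file (path taken from the reproduction steps above):
#   with open("hello.rv64", "rb") as f:
#       print(find_pt_tls(f.read()))
```

At run time the program does not reread its own file; it relies on the loader to report where the program headers were mapped. If the loader reports that incorrectly or not at all, the TLS setup can be skipped and tp stays at its initial value.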

aswaterman commented 7 years ago

@sorear figured out that it is a bug in riscv-qemu: https://github.com/riscv/riscv-qemu/pull/48

I'm re-running the test suite, but I did verify that static hello world works, so I'm going to tentatively close this issue.