Closed kito-cheng closed 7 years ago
I can replicate this, but the same binary works fine on spike with Linux and spike with pk. Seems like a qemu bug.
Unfortunately, I don't know how to debug qemu.
I had a look at this, but I can't figure it out. I turned on some QEMU debugging options to try and figure out where the SEGV is coming from:
$ qemu-riscv64 -d cpu,exec hello.rv64 |& tail -n20
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
Linking TBs 0x7f122972ac90 [000000000004bca0] index 0 -> 0x7f122972afb0 [000000000004bcb0]
Trace 0x7f122972afb0 [000000000004bcb0] brk
pc=0x000000000004bcb0
zero 0000000000000000 ra 0000000000027014 sp 00000040008006d0 gp 00000000000950f0
tp 0000000000000000 t0 0000000000000000 t1 000000000000002e t2 0000000000000000
s0 0000000000095000 s1 0000000000000f60 a0 0000000000095000 a1 0000000000000010
a2 0000000000000007 a3 0000000000000007 a4 0000000000010158 a5 0000000000095f60
a6 0000000000000009 a7 00000000000000d6 s2 0000000000000057 s3 0000000000000020
s4 0000000000092f50 s5 0000000000000010 s6 0000000000000000 s7 0000000000000008
s8 0000000000000050 s9 0000000000000008 s10 0000000000000000 s11 0000000000000000
t3 0000000000000003 t4 0000000000093000 t5 0000000000000001 t6 0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
I think this is a SEGV that is being reflected from the RISC-V code (i.e., not an internal one inside the QEMU host code), since I see this:
$ gdb --args qemu-riscv64 hello.rv64
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from qemu-riscv64...done.
(gdb) r
Starting program: /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/install/riscv-qemu/bin/qemu-riscv64 hello.rv64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6da2700 (LWP 89947)]
[New Thread 0x7ffff65a1700 (LWP 89950)]
Program received signal SIGSEGV, Segmentation fault.
0x00005555559691fd in static_code_gen_buffer ()
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7017fb2 in do_sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
31 ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
(gdb) bt
#0 0x00007ffff7017fb2 in do_sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:31
#1 __GI___sigsuspend (set=0x7fffffffdd18) at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#2 0x00005555555e1c45 in dump_core_and_abort (target_sig=11) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:602
#3 0x00005555555e2e0e in handle_pending_signal (cpu_env=0x5555579bd9f0, sig=11, k=0x5555579bf4e0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:6147
#4 0x00005555555e30de in process_pending_signals (cpu_env=0x5555579bd9f0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/signal.c:6229
#5 0x00005555555bf359 in cpu_loop (env=0x5555579bd9f0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:3873
#6 0x00005555555c0b95 in main (argc=2, argv=0x7fffffffe808, envp=0x7fffffffe820) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:4918
I went ahead and tried to figure out where the problem was coming from, and it appears to be from a store to __libc_errno:
$ gdb --batch --ex r --ex bt --ex 'x/i $pc' --ex 'print $rbp' --args qemu-riscv64 -singlestep -d in_asm,out_asm,exec,cpu hello.rv64 |& tail -n60
tp 0000000000000000 t0 0000000000000000 t1 000000000000002e t2 0000000000000000
s0 0000000000095000 s1 0000000000000f60 a0 0000000000095000 a1 0000000000000010
a2 0000000000000007 a3 0000000000000007 a4 0000000000010158 a5 0000000000095f60
a6 0000000000000009 a7 00000000000000d6 s2 0000000000000057 s3 0000000000000020
s4 0000000000092f50 s5 0000000000000010 s6 0000000000000000 s7 0000000000000008
s8 0000000000000050 s9 0000000000000008 s10 0000000000000000 s11 0000000000000000
t3 0000000000000003 t4 0000000000093000 t5 0000000000000001 t6 0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
IN:
0x000000000004bcb4: DASM(0x02e22023)
OUT: [size=60]
0x55555596d050: mov -0x8(%r14),%ebp
0x55555596d054: test %ebp,%ebp
0x55555596d056: jne 0x55555596d07d
0x55555596d05c: mov 0x20(%r14),%rbp
0x55555596d060: add $0x20,%rbp
0x55555596d064: mov 0x70(%r14),%rbx
0x55555596d068: mov %ebx,0x0(%rbp)
0x55555596d06b: movq $0x4bcb8,0x200(%r14)
0x55555596d076: xor %eax,%eax
0x55555596d078: jmpq 0x5555559660f6
0x55555596d07d: mov $0x7ffff4412eeb,%rax
0x55555596d087: jmpq 0x5555559660f6
Trace 0x55555596d050 [000000000004bcb4] brk
pc=0x000000000004bcb4
zero 0000000000000000 ra 0000000000027014 sp 0000004000800760 gp 00000000000950f0
tp 0000000000000000 t0 0000000000000000 t1 000000000000002e t2 0000000000000000
s0 0000000000095000 s1 0000000000000f60 a0 0000000000095000 a1 0000000000000010
a2 0000000000000007 a3 0000000000000007 a4 000000000000000c a5 0000000000095f60
a6 0000000000000009 a7 00000000000000d6 s2 0000000000000057 s3 0000000000000020
s4 0000000000092f50 s5 0000000000000010 s6 0000000000000000 s7 0000000000000008
s8 0000000000000050 s9 0000000000000008 s10 0000000000000000 s11 0000000000000000
t3 0000000000000003 t4 0000000000093000 t5 0000000000000001 t6 0000000000096000
FPR00: ft0 0000000000000000 ft1 0000000000000000 ft2 0000000000000000 ft3 0000000000000000
FPR04: ft4 0000000000000000 ft5 0000000000000000 ft6 0000000000000000 ft7 0000000000000000
FPR08: fs0 0000000000000000 fs1 0000000000000000 fa0 0000000000000000 fa1 0000000000000000
FPR12: fa2 0000000000000000 fa3 0000000000000000 fa4 0000000000000000 fa5 0000000000000000
FPR16: fa6 0000000000000000 fa7 0000000000000000 fs2 0000000000000000 fs3 0000000000000000
FPR20: fs4 0000000000000000 fs5 0000000000000000 fs6 0000000000000000 fs7 0000000000000000
FPR24: fs8 0000000000000000 fs9 0000000000000000 fs10 0000000000000000 fs11 0000000000000000
FPR28: ft8 0000000000000000 ft9 0000000000000000 ft10 0000000000000000 ft11 0000000000000000
Program received signal SIGSEGV, Segmentation fault.
0x000055555596d068 in static_code_gen_buffer ()
#0 0x000055555596d068 in static_code_gen_buffer ()
#1 0x000055555558cbe0 in cpu_tb_exec (cpu=0x5555579b5ba0, itb=0x7ffff4412ee8) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:167
#2 0x000055555558d5ec in cpu_loop_exec_tb (cpu=0x5555579b5ba0, tb=0x7ffff4412ee8, last_tb=0x7fffffffdef8, tb_exit=0x7fffffffdef4, sc=0x7fffffffdf10) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:518
#3 0x000055555558d809 in cpu_exec (cpu=0x5555579b5ba0) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/cpu-exec.c:613
#4 0x00005555555bef21 in cpu_loop (env=0x5555579bde20) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:3781
#5 0x00005555555c0b95 in main (argc=5, argv=0x7fffffffe7c8, envp=0x7fffffffe7f8) at /scratch/palmer.dabbelt/work/upstream/riscv-gnu-toolchain-regressions/riscv-qemu/linux-user/main.c:4918
=> 0x55555596d068 <static_code_gen_buffer+28552>: mov %ebx,0x0(%rbp)
$1 = (void *) 0x20
$ riscv64-unknown-linux-gnu-objdump -d hello.rv64 | grep 4bcb4
4bcb4: 02e22023 sw a4,32(tp) # 20 <__libc_errno>
but the store looks like it's going to the correct location (or at least, the GDB trace matches what objdump says). Looking at the guest page map that QEMU prints with -d page, nothing is mapped at this address:
$ qemu-riscv64 -d page hello.rv64
host mmap_min_addr=0x10000
Reserved 0x87000 bytes of guest address space
Relocating guest address space from 0x0000000000010000 to 0x10000
guest_base 0x0
start end size prot
0000000000010000-0000000000092000 0000000000082000 r-x
0000000000092000-0000000000097000 0000000000005000 rw-
0000004000000000-0000004000001000 0000000000001000 ---
0000004000001000-0000004000801000 0000000000800000 rw-
start_brk 0x0000000000000000
end_code 0x0000000000091386
start_code 0x0000000000010000
start_data 0x0000000000092f50
end_data 0x0000000000094b58
start_stack 0x0000004000800890
brk 0x0000000000094b58
entry 0x00000000000102f4
Segmentation fault (core dumped)
@aswaterman Is tp supposed to be zero here, or is something supposed to have initialized it earlier? Maybe QEMU isn't initializing tp in user mode but it wasn't noticed before because all our pages were just RWX?
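As a cross-check of the reasoning above, here is a minimal Python sketch that decodes the faulting word 0x02e22023 as an S-type store and looks up the resulting address in the page map printed by -d page. The instruction word, the tp value, and the map ranges are copied from the logs in this thread; the helper names `decode_store` and `find_mapping` are made up for illustration and are not part of QEMU or binutils.

```python
# ABI names for x0..x31, per the RISC-V calling convention.
ABI = (["zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2", "s0", "s1"]
       + [f"a{i}" for i in range(8)]
       + [f"s{i}" for i in range(2, 12)]
       + [f"t{i}" for i in range(3, 7)])

STORE_MNEMONICS = {0: "sb", 1: "sh", 2: "sw", 3: "sd"}


def decode_store(insn):
    """Decode a 32-bit RISC-V S-type store (hypothetical helper)."""
    if insn & 0x7F != 0x23:
        raise ValueError("not a STORE opcode")
    funct3 = (insn >> 12) & 0x7
    rs1 = (insn >> 15) & 0x1F          # base register
    rs2 = (insn >> 20) & 0x1F          # source register
    imm = ((insn >> 25) << 5) | ((insn >> 7) & 0x1F)
    if imm & 0x800:                    # sign-extend the 12-bit immediate
        imm -= 0x1000
    return STORE_MNEMONICS[funct3], ABI[rs2], imm, ABI[rs1]


# Guest mappings copied from the "-d page" output: (start, end, prot).
MAPS = [
    (0x0000000000010000, 0x0000000000092000, "r-x"),
    (0x0000000000092000, 0x0000000000097000, "rw-"),
    (0x0000004000000000, 0x0000004000001000, "---"),
    (0x0000004000001000, 0x0000004000801000, "rw-"),
]


def find_mapping(addr):
    """Return the (start, end, prot) entry covering addr, or None."""
    for start, end, prot in MAPS:
        if start <= addr < end:
            return (start, end, prot)
    return None


mnemonic, src, offset, base = decode_store(0x02E22023)
tp = 0x0                               # tp value from the register dump
target = tp + offset
print(f"{mnemonic} {src},{offset}({base}) -> {target:#x}, "
      f"mapping: {find_mapping(target)}")
```

With tp left at zero, the store targets 0x20, which falls below the first mapped page at 0x10000, so a fault is the expected outcome regardless of whether the translated host code is correct.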
Your hypothesis sounds spot-on @palmer-dabbelt.
In pk, we had to do a bunch of stuff to get glibc to do the right thing for statically linked programs. Basically, they need to be told where their ELF headers are mapped, so they can figure out how to set up static TLS (which is what causes tp to be initialized). See https://github.com/riscv/riscv-pk/blob/master/pk/pk.c#L92
Maybe @sorear or @sagark knows how to fix this in qemu?
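To illustrate the point about ELF headers: a statically linked glibc program learns where its program headers are from the auxiliary vector (the AT_PHDR, AT_PHENT and AT_PHNUM entries), and uses them to find the TLS segment. The sketch below parses an auxv-style buffer and reports which of those entries are missing. The AT_* numbers are the standard Linux values; the sample addresses and the helper names are hypothetical, and this is not a description of what the eventual riscv-qemu fix changed.

```python
import struct

# Standard Linux auxiliary vector tags.
AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM = 0, 3, 4, 5
REQUIRED = {AT_PHDR: "AT_PHDR", AT_PHENT: "AT_PHENT", AT_PHNUM: "AT_PHNUM"}


def parse_auxv(raw):
    """Parse little-endian 64-bit (type, value) pairs up to AT_NULL."""
    entries = {}
    for a_type, a_val in struct.iter_unpack("<QQ", raw):
        if a_type == AT_NULL:
            break
        entries[a_type] = a_val
    return entries


def missing_tls_inputs(entries):
    """Names of the program-header entries a static binary would lack."""
    return [name for tag, name in REQUIRED.items() if tag not in entries]


# Hypothetical auxv for a 64-bit static binary: the values are made up.
sample = struct.pack("<8Q",
                     AT_PHDR, 0x10040,   # address of the program headers
                     AT_PHENT, 56,       # size of one ELF64 program header
                     AT_PHNUM, 5,        # number of program headers
                     AT_NULL, 0)

print(missing_tls_inputs(parse_auxv(sample)))                    # all present
print(missing_tls_inputs(parse_auxv(struct.pack("<QQ", 0, 0))))  # none present
```

On a Linux host, the same parser could be pointed at the bytes of /proc/self/auxv to inspect what a real loader provides.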
@sorear figured out that it is a bug in riscv-qemu: https://github.com/riscv/riscv-qemu/pull/48
I'm re-running the test suite, but I did verify that static hello world works, so I'm going to tentatively close this issue.
Arch: RV32 and RV64
After this commit https://github.com/riscv/riscv-binutils-gdb/commit/6ec11ab97ab47ec4a22118e5b1c77df567796002, I can't run statically linked programs with QEMU; they work again if I revert it.
GCC: https://github.com/riscv/riscv-gcc/commit/cd5c51b0e8cabe5cb723dee35c020122d7920eb0
GLIBC: https://github.com/riscv/riscv-glibc/commit/e84d3a58c42e29cc162efa0446bb0a1e3554dde4
QEMU: https://github.com/riscv/riscv-qemu/commit/21e3a7cd78edceb4345fb6bd11e53ded3cba8517
riscv-gnu-toolchain: https://github.com/riscv/riscv-gnu-toolchain/commit/914224e0913c9ceab49ad9531a7fedc231f65c15
How to reproduce: