riscvarchive / riscv-binutils-gdb

RISC-V backports for binutils-gdb. Development is done upstream at the FSF.
GNU General Public License v2.0

Unable to unwind the stack upon signals #223

Closed lewurm closed 4 years ago

lewurm commented 4 years ago

Full reproducer: https://gist.github.com/lewurm/befb9ddf5894bad9628b1df77258598b

Consider the following program:

#include <stdio.h>
#include <stdlib.h>

#define NOINLINE __attribute__ ((noinline))

void NOINLINE abort_me(void) {
    abort(); /* trigger SIGABRT */
}

void NOINLINE level1(void) {
    abort_me();
}

void NOINLINE level2(void) {
    level1();
}

void NOINLINE level3(void) {
    level2();
}

void NOINLINE level4(void) {
    level3();
}

int main(void) {
    level4();
    return 0;
}

Compiling and running it via:

$ riscv64-linux-gnu-gcc -march=rv64imafdc -O0 -g c.c
$ qemu-riscv64 -g 31337 ./c &
$ riscv64-unknown-linux-gnu-gdb -q -ex 'target remote localhost:31337' -ex 'b abort_me' -ex c -ex bt ./c
Reading symbols from c...
Remote debugging using localhost:31337
Reading symbols from /home/lewurm/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1...
0x0000004000804f30 in _start () from /home/lewurm/riscv/sysroot/lib/ld-linux-riscv64-lp64d.so.1
Breakpoint 1 at 0x4000000632: file c.c, line 7.
Continuing.

Breakpoint 1, abort_me () at c.c:7
7               abort(); /* trigger SIGABRT */
#0  abort_me () at c.c:7
#1  0x0000004000000642 in level1 () at c.c:11
#2  0x0000004000000658 in level2 () at c.c:15
#3  0x000000400000066e in level3 () at c.c:19
#4  0x0000004000000684 in level4 () at c.c:23
#5  0x000000400000069a in main () at c.c:27

I get a proper backtrace, as expected.

If I let the signal trigger, however, gdb is not able to unwind the stack:

(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
0x0000004000858074 in ?? ()
(gdb) bt
#0  0x0000004000858074 in ?? ()

I get the same behaviour for SIGSEGV and SIGILL (I didn't try others).

Is this a known issue or is something wrong with my setup?

Versions

$ qemu-riscv64 --version
qemu-riscv64 version 4.2.0 (Debian 1:4.2-3ubuntu6.3)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (Ubuntu 9.3.0-10ubuntu1) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ riscv64-unknown-linux-gnu-gdb --version
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

GDB was built from:

$ git remote -v
origin  git@github.com:riscv/riscv-gnu-toolchain.git (fetch)
origin  git@github.com:riscv/riscv-gnu-toolchain.git (push)

$ git submodule status
 57dfc2c4d51e770ed3f617e5d1456d1e2bacf3f0 qemu (v4.0.0-1854-g57dfc2c4d5)
 d7f734bc7e9e5fb6c33b433973b57e1eed3a7e9f riscv-binutils (heads/riscv-binutils-2.34)
 4ea498a8e1fafeb568530d84db1880066478c86b riscv-dejagnu (heads/riscv-dejagnu-1.6)
 22b1bd36b05772863fd55d4056dbc739ff591942 riscv-gcc (remotes/origin/master-883-g22b1bd36b05)
 fec47beb8a1f0a6c4a6b0c548cded5711d0c27da riscv-gdb (remotes/origin/fsf-gdb-9.1-with-sim)
 7395b0964db9cc4dd544926414960e9a16842180 riscv-glibc (heads/riscv-glibc-2.29)
 f289cef6be67da67b2d97a47d6576fa7e6b4c858 riscv-newlib (newlib-3.2.0-1-gf289cef6b)

jim-wilson commented 4 years ago

It works on a hifive unleashed running Fedora rawhide.

(gdb) run
Starting program: /home/jimw/tmp/a.out
glibc-2.30.9000-31.fc32.riscv64
Missing separate debuginfos, use: dnf debuginfo-install
Program received signal SIGABRT, Aborted.
0x000000200006ca0e in raise () from /lib64/lp64d/libc.so.6
(gdb) where
#0  0x000000200006ca0e in raise () from /lib64/lp64d/libc.so.6
#1  0x000000200005cfe8 in abort () from /lib64/lp64d/libc.so.6
#2  0x000000000001048e in abort_me () at tmp.c:6
#3  0x000000000001049a in level1 () at tmp.c:8
#4  0x00000000000104b0 in level2 () at tmp.c:10
#5  0x00000000000104c6 in level3 () at tmp.c:12
#6  0x00000000000104dc in level4 () at tmp.c:14
#7  0x00000000000104f2 in main () at tmp.c:17
(gdb)

This is probably a qemu user bug. The ucontext_t structure changed a few times before it was frozen, and qemu is probably using the wrong definition for it. This is probably the same issue reported on sw-dev in Oct 2019. https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/BUyJ_00Vvn0/m/rDNS7gAbDAAJ

I see you are using qemu-4.2 from last year. A lot of bugs in the RISC-V support have been fixed since then. You could try building your own qemu from top of tree. It is possible this has been fixed already. If not, then you could try using qemu system instead of qemu user. And/or report the bug to the qemu folks and hope someone fixes it.

lewurm commented 4 years ago

Thank you for your reply @jim-wilson!

I checked out qemu HEAD (see below for precise version), but I get the same behaviour.

$ ~/qemu/build/riscv64-linux-user/qemu-riscv64 --version
qemu-riscv64 version 5.0.91 (v5.1.0-rc1-122-g0c4fa5bc1a-dirty)

I also tried applying the missing alignment attribute for target_sigcontext (apparently that part of the patch never made it upstream), but the behaviour is still the same.

Since it works on real hardware, I agree with your observation that this is most likely a problem with qemu instead of gdb, thus I'll report it there.

Thanks again!

lewurm commented 4 years ago

Qemu bug entry: https://bugs.launchpad.net/qemu/+bug/1889411