gdb crashes when receiving and passing SIGINT

richlowe commented 4 months ago

I don't have many details here, unfortunately.

I have gdb in one terminal attached to qemu, with qemu running in another (with a disabled breakpoint). If I hit ^C in gdb to interrupt qemu, gdb segfaults, having blown its stack recursively handling SIGINT:

 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 fffffc7fed627a71 call_user_handler (2, 0, fffffc7fff49ab30) + 1d1
 fffffc7fed627d86 sigacthandler (2, 0, fffffc7fff49ab20) + f6
 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 fffffc7fed627a71 call_user_handler (2, 0, fffffc7fff49b050) + 1d1
 fffffc7fed627d86 sigacthandler (2, 0, fffffc7fff49b040) + f6
 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 fffffc7fed627a71 call_user_handler (2, 0, fffffc7fff49b570) + 1d1
 fffffc7fed627d86 sigacthandler (2, 0, fffffc7fff49b560) + f6
 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 fffffc7fed627a71 call_user_handler (2, 0, fffffc7fff49ba90) + 1d1
 fffffc7fed627d86 sigacthandler (2, 0, fffffc7fff49ba80) + f6
 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 fffffc7fed627a71 call_user_handler (2, 0, fffffc7fff49bfb0) + 1d1
 fffffc7fed627d86 sigacthandler (2, 0, fffffc7fff49bfa0) + f6
 --- called from signal handler with signal 2 (SIGINT) ---

The bottom of the stack is:

 --- called from signal handler with signal 2 (SIGINT) ---
 ffffffffffffffff ???????? ()
 0000000000a48a90 _ZL18proc_wait_for_stopP8procinfo () + 40
 0000000000a4b108 _ZN13procfs_target4waitE6ptid_tP17target_waitstatus10enum_flagsI16target_wait_flagE () + 158
 0000000000afb26d _ZN17sol_thread_target4waitE6ptid_tP17target_waitstatus10enum_flagsI16target_wait_flagE () + cd
 0000000000b824d0 _Z11target_wait6ptid_tP17target_waitstatus10enum_flagsI16target_wait_flagE () + c0
 000000000099bdac _ZL16do_target_wait_1P8inferior6ptid_tP17target_waitstatus10enum_flagsI16target_wait_flagE () + cc
 00000000009b071c _Z20fetch_inferior_eventv () + 2bc
 000000000078fef0 _Z26check_async_event_handlersv () + 40
 0000000000d29f45 _Z16gdb_do_one_eventi () + f5
 00000000009e1302 _ZL21captured_command_loopv () + 32
 00000000009e3dc5 _Z8gdb_mainP18captured_main_args () + 15
 000000000072b857 main () + 47
 000000000072b747 _start_crt () + 87
 000000000072b6a8 _start () + 18

That jump to -1 can't be good, and leaves me worried that me having added types to SIG_ERR managed to break something. It implies the flow here is that we have done:

  foo = signal(SIGINT, bar);
  ...
  signal(SIGINT, foo);

without ever having checked foo != SIG_ERR, or something like that.
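
For illustration only, a minimal sketch of the save/restore pattern with the missing check added. The names `swap_handler` and `restore_handler` are hypothetical and are not taken from gdb; this only shows how a SIG_ERR return could be kept from being reinstalled as a handler, not where (or whether) gdb actually does this.

```c
#include <signal.h>
#include <stddef.h>

typedef void (*sig_handler_t)(int);

/*
 * Hypothetical helper: install `handler` for `signo` and store the
 * previous disposition in *old. Returns 0 on success, or -1 if
 * signal() reported failure, in which case *old is left untouched.
 */
static int swap_handler(int signo, sig_handler_t handler, sig_handler_t *old)
{
    sig_handler_t prev = signal(signo, handler);

    if (prev == SIG_ERR)
        return -1;

    *old = prev;
    return 0;
}

/*
 * Hypothetical helper: reinstall a previously saved disposition,
 * refusing to install SIG_ERR as if it were a handler.
 */
static int restore_handler(int signo, sig_handler_t old)
{
    if (old == SIG_ERR)
        return -1;

    return signal(signo, old) == SIG_ERR ? -1 : 0;
}
```

With helpers like these, the `foo`/`bar` sequence above becomes a `swap_handler(SIGINT, bar, &foo)` call followed later by `restore_handler(SIGINT, foo)`, and a failed first call can never feed -1 back into `signal()`.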

richlowe commented 4 months ago

I traced this, and nobody is noticeably attaching a 0xffff...ff handler to anything. The SIGINT handler is valid as far as psig can tell just before we crash, but I know gdb does weird things changing handlers in handlers and all of that fun.
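
As a rough in-process counterpart to checking with psig, the current disposition can also be queried with `sigaction()` without changing it. This is only a sketch: `handler_kind` and `noop_handler` are hypothetical names, and nothing here is taken from gdb.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical do-nothing handler, used only to have a real function to install. */
static void noop_handler(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper: classify the current disposition of `signo`.
 * Returns 1 if a function handler is installed, 0 for SIG_DFL or
 * SIG_IGN, and -1 if the query fails or the handler is SIG_ERR.
 */
static int handler_kind(int signo)
{
    struct sigaction sa;

    /* A NULL new action makes this a pure query. */
    if (sigaction(signo, NULL, &sa) != 0)
        return -1;

    /* With SA_SIGINFO set, the handler lives in sa_sigaction. */
    if (sa.sa_flags & SA_SIGINFO)
        return 1;

    if (sa.sa_handler == SIG_ERR)
        return -1;
    if (sa.sa_handler == SIG_DFL || sa.sa_handler == SIG_IGN)
        return 0;

    return 1;
}
```

A result of -1 from `handler_kind(SIGINT)` would point at a bogus handler value; the psig check described above suggests that is not what happens here.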

I'm going to assume the 0xffff...ff in the trace is an artifact of the signal frames (which I thought got elided from traces completely, but I guess they don't?) and that something deeper is wrong.

Unfortunately it's just an assumption, because if I trace this it doesn't crash.

richlowe commented 4 months ago

This is actually general to gdb and signals, at least SIGINT.

You can test this quickly, at least in a Bourne-like shell, by doing:

gdb -p $$
...
0xfffffc7feedcbada in __read () from /usr/lib/amd64/libc.so.1
=> 0xfffffc7feedcbada <__read+10>:      73 0a                   jae    0xfffffc7feedcbae6 <__read+22>
(gdb) cont
Continuing.

If you now hit ^C, gdb will segfault, having taken SIGINT in a loop until the stack is blown.

richlowe commented 4 months ago

Ok, apparently that might (often?) require the victim and gdb processes to be on different ttys. So in one terminal:

$ bash
$ echo $$
31415
$

and in another:

gdb -p 31415
cont
^C

omniosorg / omnios-extra

gdb crashes when receiving and passing SIGINT #1476