rr-debugger / rr

Record and Replay Framework
http://rr-project.org/
Other
9.15k stars 584 forks source link

Assertion 'stopped_or_unexpected_exit()' failed to hold. #3741

Open tysmith opened 5 months ago

tysmith commented 5 months ago

Found with Firefox debug build m-c 20240428-d1651fe33156

https://firefox-ci-tc.services.mozilla.com/tasks/index/gecko.v2.mozilla-central.revision.d1651fe33156a2aab5884ab1eaaaa8c5dd64631d.firefox/linux64-debug

This was triggered during recording using rr commit 2bb38b94.

[FATAL src/Task.cc:1117:regs()] 
 (task 10643 (rec:10643) at time 122921)
 -> Assertion `stopped_or_unexpected_exit()' failed to hold. 
Tail of trace dump:
{
  real_time:11973.908743 global_time:122901, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:10570, ticks:6372253
rax:0xffffffffffffffda rbx:0x70d20e1fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x70d21b1292c4 rbp:0x70d21b129298 rsp:0x70d20e1ffdf0 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x3f r15:0x70d21b1292c4 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x70d215924640 gs_base:0x0
}
{
  real_time:11973.908746 global_time:122902, event:`SYSCALLBUF_RESET' tid:10570, ticks:6372253
}
{
  real_time:11973.908766 global_time:122903, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:10320, ticks:1026185
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x2 rsi:0x80 rdi:0x70d245d00020 rbp:0x70d2123f97a0 rsp:0x70d215220df0 r8:0x0 r9:0x59 r10:0x0 r11:0x246 r12:0x70d245d001c0 r13:0xf0 r14:0x70d245d00000 r15:0x50 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909351 global_time:122904, event:`SYSCALLBUF_FLUSH' tid:10320, ticks:1388141
  { syscall:'futex', ret:0x0, size:0x14 }
  { syscall:'read', ret:0xfe8, size:0xff8, desched:1 }
  { syscall:'read', ret:0xab3, size:0xac3, desched:1 }
  { syscall:'read', ret:0x0, size:0x10, desched:1 }
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
  { syscall:'madvise', ret:0x0, size:0x10 }
}
{
  real_time:11973.909401 global_time:122905, event:`PATCH_SYSCALL' tid:10320, ticks:1388141
rax:0x65 rbx:0x70d2123fa380 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d2123fa100 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0xffffffffffffffe5 r14:0x2991 r15:0x70d2123fa380 rip:0x70d245f1bfec eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d24653cdf4, length:0x5e }
  { tid:10320, addr:0x70d245f1bfec, length:0x8 }
}
{
  real_time:11973.909405 global_time:122906, event:`SYSCALLBUF_RESET' tid:10320, ticks:1388141
}
{
  real_time:11973.909479 global_time:122907, event:`SYSCALL: ptrace' (state:ENTERING_SYSCALL) tid:10320, ticks:1388151
rax:0xffffffffffffffda rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0xffffffffffffffe5 r14:0x2991 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909500 global_time:122908, event:`SYSCALL: ptrace' (state:EXITING_SYSCALL) tid:10320, ticks:1388151
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0xffffffffffffffe5 r14:0x2991 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909543 global_time:122909, event:`SIGNAL: SIGCHLD(async)' tid:10320, ticks:1388151
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0xffffffffffffffe5 r14:0x2991 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x403ade0b6b3a76400012 st7:0x403ade0b6b3a76400000 ymm0:0x1b000070d201d89400 ymm1:0x70d2180160000000000000000400 ymm2:0x70d203a97900000000000000030f ymm3:0x7000000029910000000000001000 ymm4:0x0 ymm5:0xd01dee63164c6d9a ymm6:0x0 ymm7:0xa000070d20387a269 ymm8:0xe5e5e5e5e5e5e5e5e5e5e5e5e5e5e5e5 ymm9:0x0 ymm10:0x0 ymm11:0x0 ymm12:0x0 ymm13:0xb32e8000 ymm14:0x3f800001 ymm15:0xb1000000
}
{
  real_time:11973.909577 global_time:122910, event:`SIGNAL_DELIVERY: SIGCHLD(async)' tid:10320, ticks:1388151
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0xffffffffffffffe5 r14:0x2991 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d215220df0, length:(nil) }
}
{
  real_time:11973.909621 global_time:122911, event:`SYSCALL: wait4' (state:ENTERING_SYSCALL) tid:10320, ticks:1388172
rax:0xffffffffffffffda rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x70d2123fa260 rdi:0x2991 rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0x0 r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0x70d23eb59e90 r14:0x2991 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909648 global_time:122912, event:`SYSCALL: wait4' (state:EXITING_SYSCALL) tid:10320, ticks:1388172
rax:0x2991 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x70d2123fa260 rdi:0x2991 rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0x0 r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89400 r13:0x70d23eb59e90 r14:0x2991 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d2123fa260, length:0x4 }
}
{
  real_time:11973.909682 global_time:122913, event:`SYSCALL: ptrace' (state:ENTERING_SYSCALL) tid:10320, ticks:1388196
rax:0xffffffffffffffda rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0xc rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0xb r9:0x70d218016000 r10:0x70d2123fa188 r11:0x246 r12:0x70d201d89400 r13:0x70d23eb59e90 r14:0x2991 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909703 global_time:122914, event:`SYSCALL: ptrace' (state:EXITING_SYSCALL) tid:10320, ticks:1388196
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2991 rdi:0xc rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0xb r9:0x70d218016000 r10:0x70d2123fa188 r11:0x246 r12:0x70d201d89400 r13:0x70d23eb59e90 r14:0x2991 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d2123fa188, length:0xd8 }
}
{
  real_time:11973.909756 global_time:122915, event:`SYSCALL: ptrace' (state:ENTERING_SYSCALL) tid:10320, ticks:1388222
rax:0xffffffffffffffda rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2993 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0xffffffffffffffe5 r14:0x2993 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909777 global_time:122916, event:`SYSCALL: ptrace' (state:EXITING_SYSCALL) tid:10320, ticks:1388222
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2993 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0xffffffffffffffe5 r14:0x2993 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909811 global_time:122917, event:`SIGNAL: SIGCHLD(async)' tid:10320, ticks:1388222
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2993 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0xffffffffffffffe5 r14:0x2993 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x403ade0b6b3a76400012 st7:0x403ade0b6b3a76400000 ymm0:0x70d210fe578f000070d210fe5790 ymm1:0x70d210f03701000070d210fe5830 ymm2:0x5 ymm3:0x680000000000000000000001 ymm4:0x0 ymm5:0x2b000070d210fe5780 ymm6:0x70d2089218900000000000000000 ymm7:0x1ffffffffffffffff ymm8:0xe5e5e5e5e5e5e5e5e5e5e5e5e5e5e5e5 ymm9:0x0 ymm10:0x0 ymm11:0x0 ymm12:0x0 ymm13:0xb32e8000 ymm14:0x3f800001 ymm15:0xb1000000
}
{
  real_time:11973.909838 global_time:122918, event:`SIGNAL_DELIVERY: SIGCHLD(async)' tid:10320, ticks:1388222
rax:0x0 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x2993 rdi:0x10 rbp:0x70d2123fa170 rsp:0x70d215220df0 r8:0xf r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0xffffffffffffffe5 r14:0x2993 r15:0x70d2123fa380 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x65 fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d215220df0, length:(nil) }
}
{
  real_time:11973.909874 global_time:122919, event:`SYSCALL: wait4' (state:ENTERING_SYSCALL) tid:10320, ticks:1388243
rax:0xffffffffffffffda rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x70d2123fa260 rdi:0x2993 rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0x0 r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0x70d23eb59e90 r14:0x2993 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x70d2123fc640 gs_base:0x0
}
{
  real_time:11973.909898 global_time:122920, event:`SYSCALL: wait4' (state:EXITING_SYSCALL) tid:10320, ticks:1388243
rax:0x2993 rbx:0x70d215220fa0 rcx:0xffffffffffffffff rdx:0x40000000 rsi:0x70d2123fa260 rdi:0x2993 rbp:0x70d2123fa370 rsp:0x70d215220df0 r8:0x0 r9:0x70d218016000 r10:0x0 r11:0x246 r12:0x70d201d89420 r13:0x70d23eb59e90 r14:0x2993 r15:0x70d2123fa260 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3d fs_base:0x70d2123fc640 gs_base:0x0
  { tid:10320, addr:0x70d2123fa260, length:0x4 }
}
[FATAL src/log.cc:469:start_emergency_debug()] (session doesn't look interactive, aborting emergency debugging)
tysmith commented 5 months ago

I can reproduce this fairly consistently with a Firefox debug build loading http://saleepoch.com. --chaos might be required. It only seems to happen after a browser crash/assertion is triggered.

rocallahan commented 5 months ago

Sorry for the delay Tyson... can you still reproduce this? Can you pass -F to rr record, forcing it to stop, and get an rr stack trace?

tysmith commented 5 months ago

Hey no problem :)

I am seeing this consistently when using our tools but when I try to reproduce it using just rr and Firefox I cannot. I've tried to isolate specific environment tweaks that trigger the issue but I've been unsuccessful.

I've logged #3748 to propose a way to help collect debugging issue in a manner more compatible with automation/non-interactive mode.

tysmith commented 4 months ago

I have reduced the STR further. I think the issue might be related to the code writing minidumps.

khuey commented 4 months ago

I can reproduce that, thanks.

rr stack

(gdb) bt
#0  0x000073389bf2745f in accept () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00005e0141f2c9bb in rr::GdbServerConnection::await_debugger (this=0x5e0143cd0fe0, listen_fd=...)
    at /home/khuey/dev/rr/src/GdbServerConnection.cc:120
#2  0x00005e0141f2c91e in rr::GdbServerConnection::await_connection (t=0x5e0143caf720, listen_fd=..., 
    features=...) at /home/khuey/dev/rr/src/GdbServerConnection.cc:115
#3  0x00005e0141f9b00f in rr::emergency_debug (t=0x5e0143caf720)
    at /home/khuey/dev/rr/src/launch_debugger.cc:400
#4  0x00005e0141f9eeee in rr::start_emergency_debug (t=0x5e0143caf720) at /home/khuey/dev/rr/src/log.cc:494
#5  0x00005e0141f9f1b3 in rr::EmergencyDebugOstream::~EmergencyDebugOstream (this=0x7ffef8ec60e0, 
    __in_chrg=<optimized out>) at /home/khuey/dev/rr/src/log.cc:516
#6  0x00005e0142154e80 in rr::Task::regs (this=0x5e0143caf720) at /home/khuey/dev/rr/src/Task.cc:1163
#7  0x00005e014205f27c in rr::prepare_ptrace_legacy<rr::X64Arch> (t=0x5e0143b29ce0, syscall_state=...)
    at /home/khuey/dev/rr/src/record_syscall.cc:2599
#8  0x00005e0142054035 in rr::prepare_ptrace<rr::X64Arch> (t=0x5e0143b29ce0, syscall_state=...)
    at /home/khuey/dev/rr/src/record_syscall.cc:3177
#9  0x00005e0142034cd8 in rr::rec_prepare_syscall_arch<rr::X64Arch> (t=0x5e0143b29ce0, syscall_state=..., 
    regs=...) at /home/khuey/dev/rr/src/record_syscall.cc:5041
#10 0x00005e0142020cba in operator() (__closure=0x7ffef8ec7090, regs=...)
    at /home/khuey/dev/rr/src/record_syscall.cc:5401
#11 0x00005e014203c5f2 in rr::with_converted_registers<rr::Switchable, rr::rec_prepare_syscall_internal(rr::RecordTask*, rr::TaskSyscallState&)::<lambda(const rr::Registers&)> >(const rr::Registers &, rr::SupportedArch, struct {...}) (regs=..., arch=rr::x86_64, f=...) at /home/khuey/dev/rr/src/Registers.h:630
#12 0x00005e0142020d99 in rr::rec_prepare_syscall_internal (t=0x5e0143b29ce0, syscall_state=...)
    at /home/khuey/dev/rr/src/record_syscall.cc:5399
#13 0x00005e0142020e3e in rr::rec_prepare_syscall (t=0x5e0143b29ce0)
    at /home/khuey/dev/rr/src/record_syscall.cc:5411
#14 0x00005e0142000aae in rr::RecordSession::syscall_state_changed (this=0x5e0143ab1f10, t=0x5e0143b29ce0, 
    step_state=0x7ffef8ec755c) at /home/khuey/dev/rr/src/RecordSession.cc:1177
#15 0x00005e01420081e6 in rr::RecordSession::record_step (this=0x5e0143ab1f10)
    at /home/khuey/dev/rr/src/RecordSession.cc:2678
#16 0x00005e0141ff9c87 in rr::record (args=..., flags=...) at /home/khuey/dev/rr/src/RecordCommand.cc:711
#17 0x00005e0141ffaae2 in rr::RecordCommand::run (this=0x5e01423de1a0 <rr::RecordCommand::singleton>, 
    args=...) at /home/khuey/dev/rr/src/RecordCommand.cc:874
#18 0x00005e0141fb2d58 in main (argc=5, argv=0x7ffef8ec79a8) at /home/khuey/dev/rr/src/main.cc:278
(gdb) 
khuey commented 4 months ago

The rough sequence of events here is

  1. Task tid is ptraced.
  2. Task tid's process takes a group stop SIGSTOP which sets emulated_stop_pending
  3. Task tid is waited upon which clears emulated_stop_pending but does not set isstopped
  4. The ptracer calls PTRACE_GETREGS, which dies because Task tid is not isstopped (or dying).
khuey commented 4 months ago

I tried writing a test case for this but I actually triggered a different rr bug :/

tysmith commented 3 months ago

Hey @khuey I'm getting blocked by this again. If you could please have a look when you have some free time I'd really appreciate it. I double checked the above STR still work for me using commit 77f88f4.