Recording Firefox with Gecko Profiler causes crash

hotsphink commented 7 years ago

For https://bugzilla.mozilla.org/show_bug.cgi?id=1322559 I was trying to record a --disable-profiling build of Firefox with the (new) Gecko Profiler enabled ( https://raw.githubusercontent.com/mstange/Gecko-Profiler-Addon/master/gecko_profiler.xpi ). I was seeing a crash in GeckoSampler::doNativeBacktrace, which is actually what I wanted to see and debug, but it appears to behave differently when rr is recording, so it isn't the crash I was looking for. (To be clear, this is not a problem of divergence between record and replay; this is the recording affecting the initial run.)

What appears to be happening is that when a SIGPROF signal handler gets invoked, the stack pointer stored in its context argument is the stack pointer for rr's syscall hooking code, which is in a completely different stack from the actual executing program:

(rr) bt 14
#0  0x00007f9ea85830f5 in GeckoSampler::doNativeBacktrace(ThreadProfile&, TickSample*) (this=this@entry=0x7f9e6fca3ca0, aProfile=..., aSample=aSample@entry=0x681ff6d8) at /home/sfink/src/mozilla/tools/profiler/core/GeckoSampler.cpp:1139
#1  0x00007f9ea8583694 in GeckoSampler::InplaceTick(TickSample*) (this=0x7f9e6fca3ca0, sample=0x681ff6d8)
    at /home/sfink/src/mozilla/tools/profiler/core/GeckoSampler.cpp:1219
#2  0x00007f9ea857f209 in (anonymous namespace)::ProfilerSignalHandler(int, siginfo_t*, void*) (signal=<optimized out>, info=<optimized out>, context=0x681ff740) at /home/sfink/src/mozilla/tools/profiler/core/platform-linux.cc:252
#3  0x00007f9eb6c09c30 in <signal handler called> () at /lib64/libpthread.so.0
#4  0x000000007000000e in ?? ()
#5  0x00007f9eb700e233 in _raw_syscall () at /home/sfink/src/rr/src/preload/raw_syscall.S:120
#6  0x00007f9eb700b082 in untraced_syscall_base (syscallno=syscallno@entry=7, a0=a0@entry=140319650562601, a1=a1@entry=1, a2=a2@entry=-1, a3=a3@entry=0, a4=a4@entry=0, a5=0, syscall_instruction=0x7000000c) at /home/sfink/src/rr/src/preload/preload.c:334
#7  0x00007f9eb700c362 in syscall_hook (call=0x681fffa0) at /home/sfink/src/rr/src/preload/preload.c:1753
#8  0x00007f9eb700c362 in syscall_hook (call=0x681fffa0) at /home/sfink/src/rr/src/preload/preload.c:2402
#9  0x00007f9eb700c362 in syscall_hook (call=0x681fffa0) at /home/sfink/src/rr/src/preload/preload.c:2456
#10 0x00007f9eb700e26a in _syscall_hook_trampoline () at /home/sfink/src/rr/src/preload/syscall_hook.S:216
#11 0x00007f9eb700e293 in __morestack () at /home/sfink/src/rr/src/preload/syscall_hook.S:348
#12 0x00007f9eb700e2cd in _syscall_hook_trampoline_48_8b_3c_24 () at /home/sfink/src/rr/src/preload/syscall_hook.S:364
#13 0x00007f9eb5e82571 in poll () at /lib64/libc.so.6
#14 0x00007f9eaff93f80 in _xcb_conn_wait () at /lib64/libxcb.so.1

(rr) f 4
#4  0x000000007000000e in ?? ()
(rr) p/x $rsp
$14 = 0x681ffdf0
(rr) f 12
#12 0x00007f9eb700e2cd in _syscall_hook_trampoline_48_8b_3c_24 () at /home/sfink/src/rr/src/preload/syscall_hook.S:364
(rr) p/x $rsp
$17 = 0x681ffff0
(rr) f 13
#13 0x00007f9eb5e82571 in poll () from /lib64/libc.so.6
(rr) p/x $rsp
$18 = 0x7ffe68e53a90

doNativeBacktrace grabs a chunk of the stack to memcpy, and ends up biting off more than it can chew -- I mean, access. I guess what I'd like rr to do is give the signal handler the register state as of the "call" to _syscall_hook_trampoline?

Keno commented 7 years ago

Try running with syscallbuf disabled (rr record -n)? Futzing with register state for the signal handler may be possible, but I'm not entirely convinced it's a good idea.