Open joneschrisg opened 10 years ago
I'm sort of interested in looking at this. What is the current equivalent of rec_process_event.cc?
record_syscall.cc
I'm working on this. I modified the function process_syscall_results
based on my need like:
if (should_emulate_result) {
Registers r = t->regs();
r.set_syscall_result(emulated_result);
t->set_regs(r);
}
+ Registers r = t->regs();
+ r.set_syscall_result(random_values);
+ t->set_regs(r);
for (auto& action : after_syscall_actions) {
action(t);
}
It works for recording, but when I tried to replay, I sometimes got
[ERROR src/Registers.cc:299:maybe_print_reg_mismatch()] r12 0x1 != 0xfffffffffffffff7 (replaying vs. recorded)
[FATAL src/Registers.cc:409:compare_register_files()]
(task 138097 (rec:46669) at time 11761)
-> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:2391/2384)
Any suggestions for this, where should I modify correspondingly @rocallahan ?
Many thanks!
Which syscalls are you messing with? I could see this causing problems for syscalls that have special handling during replay.
several file/network related:
case __NR_read:
case __NR_write:
case __NR_readv:
case __NR_preadv:
case __NR_preadv2:
case __NR_writev:
case __NR_pwritev:
case __NR_pwritev2:
case __NR_recvfrom:
case __NR_sendto:
case __NR_recvmsg:
case __NR_sendmsg:
case __NR_recvmmsg:
case __NR_sendmmsg:
Push the actual branch somewhere so we can see the details?
Thanks for the reply! The code is inside a research project that will only be made opensource after acceptance, so sorry for that.
But the code logic is like: In record_syscall.cc:
if (should_emulate_result) {
Registers r = t->regs();
r.set_syscall_result(emulated_result);
t->set_regs(r);
}
+ call_other_func(t);
for (auto& action : after_syscall_actions) {
action(t);
}
In the file of call_other_func
:
switch (t->regs().original_syscallno()) {
case __NR_write:
case __NR_readv:
case __NR_preadv:
case __NR_preadv2:
case __NR_writev:
case __NR_pwritev:
case __NR_pwritev2:
case __NR_recvfrom:
case __NR_sendto:
case __NR_recvmsg:
case __NR_sendmsg:
case __NR_recvmmsg:
case __NR_sendmmsg:
if (condition_value) {
ret_val = error_code;
r.set_syscall_result(-ret_val);
t->set_regs(r);
}
break;
}
I also try to know which syscall that causes the mismatch and print the syscallno in the enter_syscall in replay session, it seems it is related to the recvfrom
syscall:
enter syscall: 14 original syscall no:-1
enter syscall: -9 original syscall no:45
[ERROR src/Registers.cc:299:maybe_print_reg_mismatch()] r12 0x1 != 0xfffffffffffffff7 (replaying vs. recorded)
[FATAL src/Registers.cc:417:compare_register_files()]
(task 2576894 (rec:46669) at time 11761)
-> Assertion `!bail_error || match' failed to hold. reg1:-38 reg2:-38
Fatal register mismatch (ticks/rec:2391/2384)
Tail of trace dump:
{
real_time:16138237.420949 global_time:11741, event:`SYSCALL: sysinfo' (state:ENTERING_SYSCALL) tid:46589, ticks:106526269
rax:0xffffffffffffffda rbx:0x7ffff459afa0 rcx:0xffffffffffffffff rdx:0x1d0909b38 rsi:0x7ffff469aa40 rdi:0x7ffff469aa90 rbp:0x7ffff459afa0 rsp:0x7ffff459ad40 r8:0x7ffff5e0ffa8 r9:0x7ffff0064680 r10:0x31 r11:0x246 r12:0x7ffff0063800 r13:0x7ffff469ad40 r14:0x7ffff009db20 r15:0x7ffff5e0ffa8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x63 fs_base:0x7ffff469b700 gs_base:0x0
}
{
real_time:16138237.420961 global_time:11742, event:`SYSCALLBUF_RESET' tid:46589, ticks:106526269
}
{
real_time:16138237.421032 global_time:11743, event:`SYSCALL: sysinfo' (state:EXITING_SYSCALL) tid:46589, ticks:106526269
rax:0x0 rbx:0x7ffff459afa0 rcx:0xffffffffffffffff rdx:0x1d0909b38 rsi:0x7ffff469aa40 rdi:0x7ffff469aa90 rbp:0x7ffff459afa0 rsp:0x7ffff459ad40 r8:0x7ffff5e0ffa8 r9:0x7ffff0064680 r10:0x31 r11:0x246 r12:0x7ffff0063800 r13:0x7ffff469ad40 r14:0x7ffff009db20 r15:0x7ffff5e0ffa8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x63 fs_base:0x7ffff469b700 gs_base:0x0
{ tid:46589, addr:0x7ffff469aa90, length:0x70 }
}
{
real_time:16138237.421243 global_time:11744, event:`SYSCALLBUF_FLUSH' tid:46589, ticks:106526432
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
}
{
real_time:16138237.421253 global_time:11745, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:46589, ticks:106526432
rax:0xffffffffffffffda rbx:0x7ffff459afa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x89 rdi:0x7ffff0064c78 rbp:0x7ffff459afa0 rsp:0x7ffff459ad40 r8:0x0 r9:0xffffffff r10:0x7ffff469ab90 r11:0x246 r12:0x7ffff0064c74 r13:0x7ffff0064c28 r14:0x7ffff0064c78 r15:0x7ffff469ab00 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7ffff469b700 gs_base:0x0
}
{
real_time:16138237.421265 global_time:11746, event:`SYSCALLBUF_RESET' tid:46589, ticks:106526432
}
{
real_time:16138237.421371 global_time:11747, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:46584, ticks:110259083
rax:0x0 rbx:0x7ffff489bfa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x89 rdi:0x7ffff0062b78 rbp:0x7ffff489bfa0 rsp:0x7ffff489bd40 r8:0x0 r9:0xffffffff r10:0x7ffff499bb90 r11:0x246 r12:0x7ffff0062b74 r13:0x7ffff0062b28 r14:0x7ffff0062b78 r15:0x7ffff499bb00 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7ffff499c700 gs_base:0x0
}
{
real_time:16138237.421566 global_time:11748, event:`SYSCALLBUF_FLUSH' tid:46584, ticks:110259238
{ syscall:'futex', ret:0x0, size:0x14 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
}
{
real_time:16138237.421577 global_time:11749, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:46584, ticks:110259238
rax:0xffffffffffffffda rbx:0x7ffff489bfa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x89 rdi:0x7ffff0062b7c rbp:0x7ffff489bfa0 rsp:0x7ffff489bd40 r8:0x0 r9:0xffffffff r10:0x7ffff499bb90 r11:0x246 r12:0x7ffff0062b74 r13:0x7ffff0062b28 r14:0x7ffff0062b7c r15:0x7ffff499bb00 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7ffff499c700 gs_base:0x0
}
{
real_time:16138237.421588 global_time:11750, event:`SYSCALLBUF_RESET' tid:46584, ticks:110259238
}
{
real_time:16138237.421680 global_time:11751, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:46596, ticks:44243
rax:0xffffffffffffff92 rbx:0x7fffd22dcfa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x89 rdi:0x7ffff00a427c rbp:0x7fffd22dcfa0 rsp:0x7fffd22dcd40 r8:0x0 r9:0xffffffff r10:0x7fffd23dccb0 r11:0x246 r12:0x7ffff00a4274 r13:0x7ffff00a4228 r14:0x7ffff00a427c r15:0x7fffd23dcc20 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7fffd23dd700 gs_base:0x0
}
{
real_time:16138237.421876 global_time:11752, event:`SYSCALLBUF_FLUSH' tid:46596, ticks:44508
{ syscall:'futex', ret:0x0, size:0x14 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
}
{
real_time:16138237.421886 global_time:11753, event:`SYSCALL: futex' (state:ENTERING_SYSCALL) tid:46596, ticks:44508
rax:0xffffffffffffffda rbx:0x7fffd22dcfa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x89 rdi:0x7ffff00a427c rbp:0x7fffd22dcfa0 rsp:0x7fffd22dcd40 r8:0x0 r9:0xffffffff r10:0x7fffd23dccb0 r11:0x246 r12:0x7ffff00a4274 r13:0x7ffff00a4228 r14:0x7ffff00a427c r15:0x7fffd23dcc20 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7fffd23dd700 gs_base:0x0
}
{
real_time:16138237.421898 global_time:11754, event:`SYSCALLBUF_RESET' tid:46596, ticks:44508
}
{
real_time:16138237.428152 global_time:11755, event:`SYSCALL: poll' (state:EXITING_SYSCALL) tid:46652, ticks:17337850
rax:0x1 rbx:0x7fffd0ab8fa0 rcx:0xffffffffffffffff rdx:0x190 rsi:0x1 rdi:0x7fffd0ba8258 rbp:0x7fffd0ab8fa0 rsp:0x7fffd0ab8d40 r8:0x0 r9:0x89aea888 r10:0x12e r11:0x246 r12:0x0 r13:0x7fffd0ba8258 r14:0x712040de r15:0x712040e6 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x7 fs_base:0x7fffd0bb9700 gs_base:0x0
{ tid:46652, addr:0x7fffd0ba8258, length:0x8 }
}
{
real_time:16138237.428493 global_time:11756, event:`SYSCALLBUF_FLUSH' tid:46652, ticks:17347329
{ syscall:'recvfrom', ret:0x14, size:0x24, desched:1 }
{ syscall:'sendto', ret:0x14, size:0x10, desched:1 }
{ syscall:'clock_gettime', ret:0x0, size:0x20 }
{ syscall:'poll', ret:0x0, size:0x18 }
}
{
real_time:16138237.428526 global_time:11757, event:`SYSCALL: poll' (state:ENTERING_SYSCALL) tid:46652, ticks:17347329
rax:0xffffffffffffffda rbx:0x7fffd0ab8fa0 rcx:0xffffffffffffffff rdx:0x190 rsi:0x1 rdi:0x7fffd0ba8278 rbp:0x7fffd0ab8fa0 rsp:0x7fffd0ab8d40 r8:0x0 r9:0x89aea888 r10:0x134 r11:0x246 r12:0x0 r13:0x7fffd0ba8278 r14:0x71204086 r15:0x7120408e rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x7 fs_base:0x7fffd0bb9700 gs_base:0x0
}
{
real_time:16138237.428539 global_time:11758, event:`SYSCALLBUF_RESET' tid:46652, ticks:17347329
}
{
real_time:16138237.450134 global_time:11759, event:`SYSCALL: recvfrom' (state:EXITING_SYSCALL) tid:46669, ticks:2377
rax:0xfffffffffffffff7 rbx:0x7170404e rcx:0xffffffffffffffff rdx:0x2000 rsi:0x7170405e rdi:0x60 rbp:0x0 rsp:0x7fffb3afed30 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0x0 r13:0x7170405e r14:0x0 r15:0x2000 rip:0x7000000e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x2d fs_base:0x7fffb3bff700 gs_base:0x0
{ tid:46669, addr:0x7170404e, length:0x8 }
{ tid:46669, addr:0x7170405e, length:0x2000 }
}
{
real_time:16138237.450155 global_time:11760, event:`SYSCALLBUF_ABORT_COMMIT' tid:46669, ticks:2377
}
{
real_time:16138237.450609 global_time:11761, event:`SYSCALL: rrcall_notify_syscall_hook_exit' (state:ENTERING_SYSCALL) tid:46669, ticks:2384
rax:0xffffffffffffffda rbx:0x7fffb3afefa0 rcx:0xffffffffffffffff rdx:0x2000 rsi:0x7fffb3bee5b0 rdi:0x60 rbp:0x7fffb3afefa0 rsp:0x7fffb3afee20 r8:0x0 r9:0x0 r10:0x0 r11:0x246 r12:0xfffffffffffffff7 r13:0x7fffb3bee530 r14:0x7fffb3bff680 r15:0x7ffff403e210 rip:0x70000005 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3ea fs_base:0x7fffb3bff700 gs_base:0x0
}
{
real_time:16138237.450634 global_time:11762, event:`SYSCALLBUF_RESET' tid:46669, ticks:2384
}
=== Start rr backtrace:
../../build/rr/bin/rr(_ZN2rr13dump_rr_stackEv+0x5d)[0x555555b348e5]
../../build/rr/bin/rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0xf8)[0x55555591d9a6]
../../build/rr/bin/rr(+0x402ec4)[0x555555956ec4]
../../build/rr/bin/rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x66)[0x55555595714c]
../../build/rr/bin/rr(_ZN2rr9Registers22compare_register_filesEPNS_10ReplayTaskEPKcRKS0_S4_S6_NS_16MismatchBehaviorE+0x1cf)[0x555555a578d9]
../../build/rr/bin/rr(_ZN2rr10ReplayTask13validate_regsEj+0x219)[0x555555a806b1]
../../build/rr/bin/rr(+0x51005e)[0x555555a6405e]
../../build/rr/bin/rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x55b)[0x555555a645d9]
../../build/rr/bin/rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x127)[0x555555a69949]
../../build/rr/bin/rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x1d7)[0x555555a6b1a9]
../../build/rr/bin/rr(_ZN2rr13ReplaySession11replay_stepENS_10RunCommandE+0x4f)[0x555555a60227]
../../build/rr/bin/rr(+0x50a847)[0x555555a5e847]
../../build/rr/bin/rr(+0x50b036)[0x555555a5f036]
../../build/rr/bin/rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x422)[0x555555a5fe88]
../../build/rr/bin/rr(main+0x289)[0x55555596a960]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7ffff792f083]
../../build/rr/bin/rr(_start+0x2e)[0x55555585e7ee]
=== End rr backtrace
Launch gdb with
gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:20990' /usr/lib/jvm/java-11-openjdk-amd64/bin/java
Does the problem go away with rr record -n
? You probably need to disable syscallbuf because record_syscall.cc
won't see buffered syscalls and therefore won't be able to inject faults into them.
Otherwise, I don't know, but you should be able to debug it yourself.
Also, FWIW, instead of using __NR_write
you should use Arch::write
etc, otherwise your code won't work correctly when x86-64 rr records a 32-bit tracee. I assume this doesn't actually matter to you in practice.
It works with rr record -n
! Thank you so much @rocallahan !
I'll also follow your advice to write more compatible code.
Complementary to @rocallahan's gecko chaos mode. There are bazillion other tools that do this, but implementing this in rr is particularly compelling because
This could be done with relatively little pain by adding a fault injection hook to rec_process_event.cc, and then recording pseudo-events when faults are injected so that they can be replayed.