Closed rocallahan closed 4 years ago
I've written a testcase that just creates 100 do-nothing threads and then joins them all. When I run 32 copies of that test in parallel with basic_test.run, a few of them usually fail.
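For readers who want to see the shape of that workload, here is a minimal Python sketch of "create 100 do-nothing threads, then join them all". It is only an illustration: the real testcase lives in rr's test suite, and the names `NUM_THREADS`, `do_nothing` and `run_join_threads` are made up here.

```python
import threading

NUM_THREADS = 100  # matches the thread count described above


def do_nothing():
    """Thread body that returns immediately."""
    pass


def run_join_threads(n=NUM_THREADS):
    """Start n threads, join them all, and return how many have finished."""
    threads = [threading.Thread(target=do_nothing) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # After join() every thread should be dead.
    return sum(1 for t in threads if not t.is_alive())


finished = run_join_threads()
print(finished)
```

The point of such a test is that it does almost no user-space work, so nearly everything that happens is thread creation and teardown.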
Some stats for 8 runs of 32 parallel tests each:
mprotect
Here's a different run of 16 x 32 parallel tests:
mprotect, 4 leading up to clone, 3 leading up to munmap
FWIW the executed syscall counts for one of those tests:
prctl 1
exit_group 1
arch_prctl 1
getrlimit 1
set_tid_address 1
rrcall_init_preload 1
execve 1
rt_sigprocmask 1
rt_sigaction 2
write 2
brk 3
geteuid 4
read 4
close 5
open 5
access 5
fstat 5
futex 82
munmap 97
exit 100
clone 100
madvise 100
set_robust_list 101
mprotect 110
mmap 114
And just for reference, the syscalls before the overcount was detected are 14 mmap, 4 mprotect and 3 futex.
I tried writing a test that does a lot of mmap/mprotect/munmap in a loop and couldn't get it to fail much. When I put that loop in 10 parallel threads I'd still mostly get failures around thread creation.
Turns out has_kvm_in_txcp_bug was being set to true; setting IN_TXCP on AMD lets you start a counter, but it always returns 0, triggering count < NUM_BRANCHES. If I also require that count > 0, that avoids triggering the bug workarounds (namely the always_recreate_counters workaround) and then .... tada, 16 x 32 parallel tests, zero failures!!!
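The detection condition described above can be sketched like this. It is a Python illustration, not the actual rr code: the value of `NUM_BRANCHES` and the function name are assumptions made for the example.

```python
NUM_BRANCHES = 500  # assumed size of the probe loop, for illustration only


def looks_like_kvm_in_txcp_bug(count, require_nonzero=True):
    """Decide whether a probe reading should enable the KVM workarounds.

    count: what the IN_TXCP counter reported after running the probe loop.
    require_nonzero: apply the extra `count > 0` condition discussed above.
    """
    if require_nonzero and count == 0:
        # A counter that always reads 0 (as on AMD) is not the KVM bug.
        return False
    return count < NUM_BRANCHES


# AMD-style reading of 0: the original check fires, the stricter one does not.
print(looks_like_kvm_in_txcp_bug(0, require_nonzero=False))
print(looks_like_kvm_in_txcp_bug(0, require_nonzero=True))
```

A genuine undercount (some nonzero value below `NUM_BRANCHES`) still enables the workarounds under both variants.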
Now a lot of the tests are timing out. Which is weird. I'm also seeing some other issues I didn't see before.
Ah OK. When has_kvm_in_txcp_bug is false, we create a counter using IN_TXCP and use that to measure, and on AMD that just always returns 0. Which fixes overcounting just fine but has other issues :-).
I checked whether, if we create a counter with no interrupt set and one with an interrupt set, they always agree. They do, even when they overcount.
I tried creating three counters, one counting user-only events (U), one counting kernel events (K), and one counting both (A). You'd expect U + K = A, and that holds on Intel apparently, but on AMD it never does! A almost always has extra events, usually 8, sometimes 16, sometimes 1, once in a while a lot more...
FWIW those changes are not correlated with overcounts. So that's a dead end probably.
Also, that implies the problem is probably not an issue of kernel-mode events being incorrectly counted.
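The U + K = A check described above amounts to the following arithmetic. This is a small Python sketch; the readings in it are invented purely to show the calculation and are not measurements from this thread.

```python
def combined_excess(user_only, kernel_only, combined):
    """Return how many events the combined counter (A) reports beyond U + K.

    0 means the three counters agree; a positive value means A saw extra events.
    """
    return combined - (user_only + kernel_only)


# Made-up readings, for illustration only.
consistent = combined_excess(user_only=1000, kernel_only=250, combined=1250)
inconsistent = combined_excess(user_only=1000, kernel_only=250, combined=1258)
print(consistent, inconsistent)
```

With real data you would compute this excess per run and compare it against whether that run overcounted, which is the correlation check mentioned above.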
Interesting: there are a lot more failures running the 32-bit tests: 117 vs 39 (out of 8 x 32) just now.
By spraying rdtscs around syscall entry points to create trace events, I can see that the overcounts are typically detected at the rdtsc before mprotect. That suggests they're occurring during user-mode execution or during any entry to the kernel, not specifically related to system calls. Maybe it's a hardware issue that happens to be easier to trigger when doing thread creation.
I'm out of ideas. It appears that the Ryzen PMU just isn't quite accurate enough :-(. rr might work OK for some kinds of usage but I wouldn't recommend it.
I'll land the patches I have with a warning for Ryzen users that things won't be reliable.
FYI AMD has posted an errata for its Ryzen CPUs and it includes multiple issues with performance counters, namely:
None of them involves PMCx0D1 directly (which I believe is what rr uses). Either way none of them has a planned fix or suggested workaround.
Of those errata:
The patches in PR #2255 might work on Ryzen. It would be great if someone could test. You'll have to change PerfCounters.cc to assign the same configuration to AMDRyzen as AMDF15R30.
Here is the patch to test with: https://github.com/rocallahan/rr/commit/cdf4e2751918d096f9624e96a3678e48deef94ed Building and testing instructions here: https://github.com/mozilla/rr/wiki/Building-And-Installing
I just checked the Bios and Kernel Developers' Guides on the AMD page, and all recent AMD CPUs appear to have PMCs 0xc4 and 0xc6; of course that doesn't mean they're reliable, but it might be worth checking this on all those we can still find users of.
I get 57 test failures on Ryzen (AMD EPYC 7401P 24-Core Processor), but the alarm tests seem to work so there's hope.
Looks like Ryzen still doesn't work. I landed a join_threads test on master that starts 100 threads and then joins them. Running
(for i in `seq 1 200`; do bash basic_test.run join_threads & done; wait)>& /tmp/output
I get 10+ ticks mismatch errors every run.
@pipcet it would be great if you can run that test yourself to make sure you don't get any errors on your Bulldozer machine.
That is run from the rr test directory of course.
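To tally the failures from a run like the one above, something along these lines should do. It is a sketch that only assumes each error line contains the phrase "ticks mismatch"; the sample text below is invented and is not real rr output.

```python
def count_ticks_mismatches(output_text):
    """Count lines in the captured test output that mention a ticks mismatch."""
    return sum(1 for line in output_text.splitlines() if "ticks mismatch" in line)


# Invented sample standing in for the contents of /tmp/output.
sample_output = """\
Test 'join_threads' PASSED
some error: ticks mismatch for thread
Test 'join_threads' PASSED
another error: ticks mismatch for thread
"""

mismatches = count_ticks_mismatches(sample_output)
print(mismatches)
```

For a real run, pass in the contents of the file the loop wrote, for example `count_ticks_mismatches(open("/tmp/output").read())`.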
The symptoms are similar to the issues I saw with the conditional branches approach before, so it's possible that at some point after Bulldozer AMD introduced a bug that destabilized multiple types of counters.
The join_threads test you posted passes all 200 runs here.
Some ideas:
There's also section 2.1.11.2 in https://developer.amd.com/wp-content/resources/56255_3_03.PDF, which I don't think applies here:
An option is provided for merging a pair of even/odd performance monitors to acquire an accurate count.
However, that would probably require kernel hacking...
The join_threads test you posted passes all 200 runs here.
Great!
Those are good ideas, but I took down my packet.net Ryzen machine so I'll have to spin it up again later to try them.
BTW, does this work for Zen+ as well? (cpu_type 0x00f80, ext_family 8).
Edit: I ran make test, and it looks like the majority of the tests pass.
I don't think it works for either Zen or Zen+ at this point :-(
Do you have reasonable access to a Zen/Zen+ machine? You might be able to try a few ideas...
@pipcet I have a Zen+ machine. What can I do to help?
Awesome. Here's a first test you could run, if you have the time:
Can you try compiling this with gcc -nostdlib foo.s -o foo:
.text
.globl _start
_start:
xor %eax,%eax
0: dec %eax
jmp 1f
1: jne 0b
jmp *(%eax)
And then running
for i in $(seq 1 10); do perf stat -e 'cpu/event=0xc2/u,cpu/event=0xc4/u,cpu/event=0xc6/u' ./foo; done
That should produce (after about a minute) a number of performance counter readings; none of them are likely to be reliable by themselves, but the differences between c2 and c6 or c4 and c6 might be. (The program terminates with a segmentation fault; that's normal, since we don't want to go to the trouble of making a syscall).
It would also be interesting if, as root, you did the following
for a in /sys/devices/system/cpu/cpu[0-9]*; do echo 0 > $a/online; done
cat /sys/devices/system/cpu/online
(which takes all CPU cores offline except for one), then reran the perf test. To take the CPUs back online, use
for a in /sys/devices/system/cpu/cpu[0-9]*; do echo 1 > $a/online; done
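When checking the result of those loops, note that /sys/devices/system/cpu/online reports CPUs as a comma-separated list of numbers and ranges (for example `0` or `0-3,5`). A small Python sketch for turning that into explicit CPU numbers, assuming that list format; `parse_cpu_list` is a made-up helper name:

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0-3,5' into [0, 1, 2, 3, 5]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty file or trailing comma
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


print(parse_cpu_list("0-3,5\n"))
print(parse_cpu_list("0\n"))
```

After taking the other cores offline you would expect the parsed list to contain a single CPU, and the full range again once they are back online.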
@pipcet Can confirm c2-c4 is consistently 4096. c6 is a really small value < 1000
@yshui How about c2-c6 and c4-c6? 4096 is strange, though. Are the values themselves in the range of 8.5 billion?
@pipcet c2-c6 seems to always be 8589942657. c2 and c4 are both ~8.5 billion
Huh. Strange. It should be 0x200000000 precisely, rather than 0x200001f81. Can you try repeating the loop a few times, like this:
.text
.globl _start
_start:
xor %eax,%eax
0: dec %eax
jmp 1f
1: jne 0b
xor %eax,%eax
0: dec %eax
jmp 1f
1: jne 0b
xor %eax,%eax
0: dec %eax
jmp 1f
1: jne 0b
jmp *(%eax)
@pipcet c2-c4 is 4098. c2-c6 is 0x600001F81
4098? That's strange. The 0x1f81 we can deal with. Which tests are failing?
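For anyone following the numbers: each pass of the loop retires one jmp and one jne, and %eax runs through 2^32 values before the jne falls through, so one loop should contribute 2 * 2^32 branches. A quick Python check of the reported c2-c6 values against that nominal count (it ignores the single final indirect jmp, and the function names are just for this sketch):

```python
ITERATIONS = 2**32          # dec %eax starting from 0 wraps through all 32-bit values
BRANCHES_PER_ITERATION = 2  # one jmp plus one jne per pass


def nominal_branches(loops):
    """Branches expected from `loops` copies of the counting loop."""
    return loops * ITERATIONS * BRANCHES_PER_ITERATION


def excess(observed, loops):
    """How far an observed c2-c6 difference is above the nominal count."""
    return observed - nominal_branches(loops)


one_loop = excess(8589942657, 1)    # value reported for the single-loop program
three_loops = excess(0x600001F81, 3)  # value reported for the three-loop program
print(hex(nominal_branches(1)), hex(one_loop))
print(hex(nominal_branches(3)), hex(three_loops))
```

Both runs come out 0x1f81 above nominal, i.e. the excess did not grow when the loop was repeated three times.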
On my 2700x using c4/c6:
102 - exec_from_other_thread (Failed)
103 - exec_from_other_thread-no-syscallbuf (Failed)
164 - grandchild_threads_main_running (Failed)
165 - grandchild_threads_main_running-no-syscallbuf (Timeout)
166 - grandchild_threads_thread_running (Failed)
167 - grandchild_threads_thread_running-no-syscallbuf (Failed)
316 - no_mask_timeslice (Failed)
317 - no_mask_timeslice-no-syscallbuf (Failed)
370 - ptrace_attach_running (Failed)
371 - ptrace_attach_running-no-syscallbuf (Failed)
376 - ptrace_attach_thread_running (Failed)
377 - ptrace_attach_thread_running-no-syscallbuf (Failed)
690 - async_kill_with_threads_main_running (Timeout)
691 - async_kill_with_threads_main_running-no-syscallbuf (Timeout)
692 - async_kill_with_threads_thread_running (Timeout)
693 - async_kill_with_threads_thread_running-no-syscallbuf (Timeout)
742 - conditional_breakpoint_offload (Failed)
743 - conditional_breakpoint_offload-no-syscallbuf (Failed)
828 - overflow_branch_counter (Failed)
829 - overflow_branch_counter-no-syscallbuf (Failed)
852 - reverse_many_breakpoints (Failed)
908 - thread_open_race (Failed)
1292 - grandchild_threads_main_running-32 (Timeout)
1293 - grandchild_threads_main_running-32-no-syscallbuf (Timeout)
1294 - grandchild_threads_thread_running-32 (Timeout)
1295 - grandchild_threads_thread_running-32-no-syscallbuf (Failed)
1444 - no_mask_timeslice-32 (Failed)
1445 - no_mask_timeslice-32-no-syscallbuf (Failed)
1498 - ptrace_attach_running-32 (Failed)
1499 - ptrace_attach_running-32-no-syscallbuf (Failed)
1504 - ptrace_attach_thread_running-32 (Failed)
1505 - ptrace_attach_thread_running-32-no-syscallbuf (Timeout)
1818 - async_kill_with_threads_main_running-32 (Timeout)
1819 - async_kill_with_threads_main_running-32-no-syscallbuf (Failed)
1820 - async_kill_with_threads_thread_running-32 (Timeout)
1821 - async_kill_with_threads_thread_running-32-no-syscallbuf (Timeout)
1840 - block_intr_sigchld-32 (Failed)
1865 - clone_interruption-32-no-syscallbuf (Failed)
1870 - conditional_breakpoint_offload-32 (Failed)
1871 - conditional_breakpoint_offload-32-no-syscallbuf (Failed)
1956 - overflow_branch_counter-32 (Failed)
1957 - overflow_branch_counter-32-no-syscallbuf (Failed)
1980 - reverse_many_breakpoints-32 (Failed)
1981 - reverse_many_breakpoints-32-no-syscallbuf (Failed)
2036 - thread_open_race-32 (Failed)
I'm trying it again with c2/c6 but there are still some failing tests. I will report back when I get back from work.
Hmm. I really don't know what those counters are counting, but it doesn't appear to match the documentation, at least not precisely. 0xc6 sounds like it might actually do what it says in the documentation, but at this point I'd recommend actually trying 0xc0-0xc5 with the code I posted above to figure out which differences are constant.
c2/c6 results follow. Unfortunately, some of the same tests fail.
12 - alarm (Failed)
103 - exec_from_other_thread-no-syscallbuf (Failed)
164 - grandchild_threads_main_running (Failed)
165 - grandchild_threads_main_running-no-syscallbuf (Timeout)
166 - grandchild_threads_thread_running (Timeout)
167 - grandchild_threads_thread_running-no-syscallbuf (Failed)
316 - no_mask_timeslice (Failed)
317 - no_mask_timeslice-no-syscallbuf (Failed)
370 - ptrace_attach_running (Failed)
371 - ptrace_attach_running-no-syscallbuf (Failed)
376 - ptrace_attach_thread_running (Timeout)
377 - ptrace_attach_thread_running-no-syscallbuf (Failed)
436 - record_replay_subject (Failed)
642 - timer (Failed)
643 - timer-no-syscallbuf (Failed)
690 - async_kill_with_threads_main_running (Failed)
691 - async_kill_with_threads_main_running-no-syscallbuf (Failed)
692 - async_kill_with_threads_thread_running (Timeout)
693 - async_kill_with_threads_thread_running-no-syscallbuf (Timeout)
742 - conditional_breakpoint_offload (Failed)
743 - conditional_breakpoint_offload-no-syscallbuf (Failed)
768 - execve_loop (Failed)
828 - overflow_branch_counter (Failed)
829 - overflow_branch_counter-no-syscallbuf (Failed)
852 - reverse_many_breakpoints (Failed)
853 - reverse_many_breakpoints-no-syscallbuf (Failed)
908 - thread_open_race (Failed)
991 - checkpoint_prctl_name-no-syscallbuf (Failed)
1044 - record_replay (Failed)
1045 - record_replay-no-syscallbuf (Failed)
1056 - reverse_alarm (Failed)
1292 - grandchild_threads_main_running-32 (Failed)
1293 - grandchild_threads_main_running-32-no-syscallbuf (Failed)
1294 - grandchild_threads_thread_running-32 (Failed)
1295 - grandchild_threads_thread_running-32-no-syscallbuf (Failed)
1334 - join_threads-32 (Failed)
1444 - no_mask_timeslice-32 (Failed)
1445 - no_mask_timeslice-32-no-syscallbuf (Failed)
1498 - ptrace_attach_running-32 (Failed)
1499 - ptrace_attach_running-32-no-syscallbuf (Failed)
1504 - ptrace_attach_thread_running-32 (Failed)
1505 - ptrace_attach_thread_running-32-no-syscallbuf (Failed)
1544 - ptracer_death_multithread_peer-32 (Failed)
1564 - record_replay_subject-32 (Failed)
1565 - record_replay_subject-32-no-syscallbuf (Failed)
1770 - timer-32 (Failed)
1818 - async_kill_with_threads_main_running-32 (Failed)
1819 - async_kill_with_threads_main_running-32-no-syscallbuf (Timeout)
1820 - async_kill_with_threads_thread_running-32 (Timeout)
1821 - async_kill_with_threads_thread_running-32-no-syscallbuf (Timeout)
1864 - clone_interruption-32 (Failed)
1870 - conditional_breakpoint_offload-32 (Failed)
1871 - conditional_breakpoint_offload-32-no-syscallbuf (Failed)
1956 - overflow_branch_counter-32 (Failed)
1957 - overflow_branch_counter-32-no-syscallbuf (Failed)
1980 - reverse_many_breakpoints-32 (Failed)
1981 - reverse_many_breakpoints-32-no-syscallbuf (Failed)
2038 - thread_stress-32 (Failed)
2172 - record_replay-32 (Failed)
2173 - record_replay-32-no-syscallbuf (Failed)
2184 - reverse_alarm-32 (Failed)
I'll analyze c0-c5 as you suggest shortly.
Sorry, I meant looking at PMCs c0, c1, c3, c5 separately or in conjunction with PMC c6, not looking at the difference between PMC c0 and PMC c5, which is very unlikely to work.
These are the data I got: Ran your three-iteration program with one core and all cores, but I'm not sure if any of it is useful. I ran in somewhat controlled conditions (runlevel 3 with Precision Boost Overdrive off) to see if I could get more consistent data.
Looking at the documentation now, it looks like d1, d2, and 1d0 are also performance counters, although I don't know enough about all this to know if they're of any use. I can measure them later if you think it would be interesting.
I tried to use rr on Ryzen 1800X but got the following error:
[FATAL /build/rr-zJA6OY/rr-4.4.0/src/PerfCounters.cc:138:get_cpu_microarch() errno: EOPNOTSUPP] CPU 0xf10 unknown
Oops, my bad! I ran Debian's rr (4.4.0), which produced this error. But when I tried rr compiled from GitHub, I did not see the message "You have a Ryzen CPU. The Ryzen retired-conditional-branches hardware performance counter is not accurate enough; rr will be unreliable."
@JedTheKrampus Which documentation are you looking at? I can't find anything about 0xd2 in the "Open Source Register Reference", so that might be worth a try. The rest doesn't look at all reproducible, which is strangely different from the other test results. Are you sure you ran perf correctly, with /u and the program specified on the same command line?
I'm looking at the PPPR from 2017, 2.1.13.3.5. I haven't found any documentation for what the register does except for its name, Retired Conditional Branch Instructions Mispredicted, which doesn't sound very promising.
I'm quite sure I ran perf as you asked. There must be some reason why I'm seeing so much volatility from the PMCs. I'm going to do some more investigation and see if I can figure anything out. Maybe there's some system configuration I can change that happens to make them more stable. This stuff is pretty removed from my day job so I don't know if I will actually find anything, ehehe...
I found a bit in the documentation that suggests the processor clock on these CPUs can't be assumed to be constant. Maybe there's some way we need to take APERF and/or MPERF into account to get a reliable result.
Has anyone succeeded in using rr with Ryzen? Any update is really appreciated. I need rr mainly for debugging Firefox.
@mehdisadeghi I've used it a couple of times on the clang code base on a Ryzen 1800X for basic reverse debugging. I'm not sure entirely what functionality doesn't work correctly, or if it will work for your specific use case.
That test
(for i in `seq 1 200`; do bash basic_test.run join_threads & done; wait)>& /tmp/output
fails too on Zen 2 (either as AMDRyzen or AMDF15R30).
Here are the test results on an AMD Ryzen Threadripper 1950X 16-Core CPU, running rr 5.2.0 from git on Manjaro Linux; of course with kernel.perf_event_paranoid = 1:
93% tests passed, 170 tests failed out of 2295
Total Test time (real) = 21541.35 sec
The following tests FAILED:
174 - grandchild_threads_thread_running (Failed)
650 - sysfs (Failed)
651 - sysfs-no-syscallbuf (Failed)
714 - async_signal_syscalls2 (Failed)
736 - breakpoint (Failed)
737 - breakpoint-no-syscallbuf (Failed)
742 - call_function (Failed)
743 - call_function-no-syscallbuf (Failed)
794 - explicit_checkpoints (Failed)
795 - explicit_checkpoints-no-syscallbuf (Failed)
804 - goto_event (Failed)
805 - goto_event-no-syscallbuf (Failed)
822 - invalid_jump (Failed)
823 - invalid_jump-no-syscallbuf (Failed)
860 - read_big_struct (Failed)
861 - read_big_struct-no-syscallbuf (Failed)
878 - search (Failed)
879 - search-no-syscallbuf (Failed)
896 - step_thread (Failed)
897 - step_thread-no-syscallbuf (Failed)
904 - string_instructions_multiwatch (Failed)
905 - string_instructions_multiwatch-no-syscallbuf (Failed)
908 - string_instructions_watch (Failed)
909 - string_instructions_watch-no-syscallbuf (Failed)
914 - target_fork (Failed)
915 - target_fork-no-syscallbuf (Failed)
916 - target_process (Failed)
917 - target_process-no-syscallbuf (Failed)
928 - thread_open_race (Failed)
931 - thread_stress-no-syscallbuf (Failed)
944 - vdso_gettimeofday_stack (Failed)
946 - vdso_clock_gettime_stack (Failed)
948 - vdso_time_stack (Failed)
976 - break_block (Failed)
977 - break_block-no-syscallbuf (Failed)
978 - break_clock (Failed)
979 - break_clock-no-syscallbuf (Failed)
980 - break_clone (Failed)
981 - break_clone-no-syscallbuf (Failed)
982 - break_exec (Failed)
983 - break_exec-no-syscallbuf (Failed)
986 - break_mmap_private (Failed)
987 - break_mmap_private-no-syscallbuf (Failed)
988 - break_msg (Failed)
989 - break_msg-no-syscallbuf (Failed)
990 - break_rdtsc (Failed)
991 - break_rdtsc-no-syscallbuf (Failed)
992 - break_sigreturn (Failed)
993 - break_sigreturn-no-syscallbuf (Failed)
994 - break_sync_signal (Failed)
995 - break_sync_signal-no-syscallbuf (Failed)
996 - break_thread (Failed)
997 - break_thread-no-syscallbuf (Failed)
998 - break_time_slice (Failed)
999 - break_time_slice-no-syscallbuf (Failed)
1000 - breakpoint_consistent (Failed)
1001 - breakpoint_consistent-no-syscallbuf (Failed)
1002 - call_exit (Failed)
1003 - call_exit-no-syscallbuf (Failed)
1020 - dead_thread_target (Failed)
1021 - dead_thread_target-no-syscallbuf (Failed)
1034 - explicit_checkpoint_clone (Failed)
1035 - explicit_checkpoint_clone-no-syscallbuf (Failed)
1042 - fork_exec_info_thr (Failed)
1043 - fork_exec_info_thr-no-syscallbuf (Failed)
1046 - get_thread_list (Failed)
1047 - get_thread_list-no-syscallbuf (Failed)
1062 - read_bad_mem (Failed)
1063 - read_bad_mem-no-syscallbuf (Failed)
1102 - shm_checkpoint (Failed)
1103 - shm_checkpoint-no-syscallbuf (Failed)
1108 - signal_stop (Failed)
1109 - signal_stop-no-syscallbuf (Failed)
1110 - signal_checkpoint (Failed)
1111 - signal_checkpoint-no-syscallbuf (Failed)
1120 - step1 (Failed)
1121 - step1-no-syscallbuf (Failed)
1122 - step_rdtsc (Failed)
1123 - step_rdtsc-no-syscallbuf (Failed)
1124 - step_signal (Failed)
1125 - step_signal-no-syscallbuf (Failed)
1374 - legacy_ugid-32 (Failed)
1375 - legacy_ugid-32-no-syscallbuf (Failed)
1562 - ptrace_sysemu-32 (Failed)
1563 - ptrace_sysemu-32-no-syscallbuf (Failed)
1574 - ptracer_death_multithread-32 (Failed)
1732 - simple_threads_stress-32 (Failed)
1798 - sysfs-32 (Failed)
1799 - sysfs-32-no-syscallbuf (Failed)
1860 - async_signal_syscalls2-32 (Failed)
1882 - breakpoint-32 (Failed)
1883 - breakpoint-32-no-syscallbuf (Failed)
1888 - call_function-32 (Failed)
1889 - call_function-32-no-syscallbuf (Failed)
1898 - clone_interruption-32 (Failed)
1906 - condvar_stress-32 (Failed)
1907 - condvar_stress-32-no-syscallbuf (Failed)
1940 - explicit_checkpoints-32 (Failed)
1941 - explicit_checkpoints-32-no-syscallbuf (Failed)
1950 - goto_event-32 (Failed)
1951 - goto_event-32-no-syscallbuf (Failed)
1968 - invalid_jump-32 (Failed)
1969 - invalid_jump-32-no-syscallbuf (Failed)
2006 - read_big_struct-32 (Failed)
2007 - read_big_struct-32-no-syscallbuf (Failed)
2024 - search-32 (Failed)
2025 - search-32-no-syscallbuf (Failed)
2042 - step_thread-32 (Failed)
2043 - step_thread-32-no-syscallbuf (Failed)
2050 - string_instructions_multiwatch-32 (Failed)
2051 - string_instructions_multiwatch-32-no-syscallbuf (Failed)
2054 - string_instructions_watch-32 (Failed)
2055 - string_instructions_watch-32-no-syscallbuf (Failed)
2060 - target_fork-32 (Failed)
2061 - target_fork-32-no-syscallbuf (Failed)
2062 - target_process-32 (Failed)
2063 - target_process-32-no-syscallbuf (Failed)
2074 - thread_open_race-32 (Failed)
2090 - vdso_gettimeofday_stack-32 (Failed)
2092 - vdso_clock_gettime_stack-32 (Failed)
2094 - vdso_time_stack-32 (Failed)
2122 - break_block-32 (Failed)
2123 - break_block-32-no-syscallbuf (Failed)
2124 - break_clock-32 (Failed)
2125 - break_clock-32-no-syscallbuf (Failed)
2126 - break_clone-32 (Failed)
2127 - break_clone-32-no-syscallbuf (Failed)
2128 - break_exec-32 (Failed)
2129 - break_exec-32-no-syscallbuf (Failed)
2132 - break_mmap_private-32 (Failed)
2133 - break_mmap_private-32-no-syscallbuf (Failed)
2134 - break_msg-32 (Failed)
2135 - break_msg-32-no-syscallbuf (Failed)
2136 - break_rdtsc-32 (Failed)
2137 - break_rdtsc-32-no-syscallbuf (Failed)
2138 - break_sigreturn-32 (Failed)
2139 - break_sigreturn-32-no-syscallbuf (Failed)
2140 - break_sync_signal-32 (Failed)
2141 - break_sync_signal-32-no-syscallbuf (Failed)
2142 - break_thread-32 (Failed)
2143 - break_thread-32-no-syscallbuf (Failed)
2144 - break_time_slice-32 (Failed)
2145 - break_time_slice-32-no-syscallbuf (Failed)
2146 - breakpoint_consistent-32 (Failed)
2147 - breakpoint_consistent-32-no-syscallbuf (Failed)
2148 - call_exit-32 (Failed)
2149 - call_exit-32-no-syscallbuf (Failed)
2166 - dead_thread_target-32 (Failed)
2167 - dead_thread_target-32-no-syscallbuf (Failed)
2180 - explicit_checkpoint_clone-32 (Failed)
2181 - explicit_checkpoint_clone-32-no-syscallbuf (Failed)
2188 - fork_exec_info_thr-32 (Failed)
2189 - fork_exec_info_thr-32-no-syscallbuf (Failed)
2192 - get_thread_list-32 (Failed)
2193 - get_thread_list-32-no-syscallbuf (Failed)
2208 - read_bad_mem-32 (Failed)
2209 - read_bad_mem-32-no-syscallbuf (Failed)
2236 - reverse_step_threads2-32 (Failed)
2248 - shm_checkpoint-32 (Failed)
2249 - shm_checkpoint-32-no-syscallbuf (Failed)
2254 - signal_stop-32 (Failed)
2255 - signal_stop-32-no-syscallbuf (Failed)
2256 - signal_checkpoint-32 (Failed)
2257 - signal_checkpoint-32-no-syscallbuf (Failed)
2266 - step1-32 (Failed)
2267 - step1-32-no-syscallbuf (Failed)
2268 - step_rdtsc-32 (Failed)
2269 - step_rdtsc-32-no-syscallbuf (Failed)
2270 - step_signal-32 (Failed)
2271 - step_signal-32-no-syscallbuf (Failed)
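As a sanity check on the summary line of that report, the pass rate follows directly from the failed and total counts. A small Python sketch; the helper name is made up, and rounding to the nearest whole percent is my assumption about how the figure was produced:

```python
def pass_percent(failed, total):
    """Percentage of tests that passed, rounded to the nearest whole percent."""
    passed = total - failed
    return round(100 * passed / total)


threadripper_rate = pass_percent(failed=170, total=2295)
print(threadripper_rate)
```

That gives 2125 passing tests, about 92.6%, which matches the 93% in the summary.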
Ryzen has a conditional branch counter. I have patches to use it here: https://github.com/mozilla/rr/tree/ryzen
To make it work reliably I had to increase the skid counter to 1000. That's pretty high, but OK. The patches make the skid size configurable per-architecture so we don't take that hit on Intel.
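To illustrate what a per-architecture skid size means in practice, here is a rough Python sketch. It assumes the usual approach of programming the counter interrupt to fire early by the skid amount and covering the remaining distance by slower, precise stepping; that is my reading of the idea, not code from the patches. The Intel value and all names are hypothetical; only the 1000 for Ryzen comes from the comment above.

```python
# Skid sizes in ticks. Only the Ryzen value comes from this thread;
# the Intel value is a placeholder for illustration.
SKID_SIZE = {
    "intel": 100,       # hypothetical
    "amd_ryzen": 1000,  # from the comment above
}


def interrupt_period(ticks_to_target, arch):
    """Ticks to run before the counter interrupt fires.

    Returns 0 when the target is already within the skid window, meaning the
    whole remaining distance has to be covered by precise stepping.
    """
    skid = SKID_SIZE[arch]
    return max(ticks_to_target - skid, 0)


print(interrupt_period(5000, "amd_ryzen"))
print(interrupt_period(5000, "intel"))
print(interrupt_period(500, "amd_ryzen"))
```

The larger the skid, the more of each approach to a target has to be done the slow way, which is why keeping the big value off Intel matters.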
With these patches, most tests pass and the rest seem to be intermittent. In one run I get 10 failures out of 2068:
It appears that all these failures are due to intermittent overcounting. In most of them, during recording we seem to have overcounted a few conditional branches in the leadup to some syscall. In the rest, we seem to have overcounted during replay.
One interesting thing is that most of the syscalls where we detect the overcount are an mprotect (or a syscall following a syscall-buffered mprotect) that followed an mmap. There are two exceptions, one a read syscall and one a write syscall. I need to think about what this might mean.