Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'INSTRUCTION_TRAP'; on AMD CPU

seelabs commented 3 years ago

I'm running rr build from this commit: e9ec388d95ae05bc2eceb0361aea39510f08e17e on an "AMD Ryzen Threadripper 3970X 32-Core Processor". I run a script that does the following:

sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.perf_cpu_time_max_percent=0
sudo ~/apps/rr-latest/scripts/zen_workaround.py

When I playback a recording, rr complains with the following message (I get a similar error leaving off perf_cpu_time_max_percent in the script, but thought I'd mention I tried that):

Please let me know if there's anything I can do on my end to help resolve this. I'll note that I am able to record and replay the same program on my intel based system.

[FATAL ../../src/ReplaySession.cc:1096:check_ticks_consistency()] 
 (task 270069 (rec:269162) at time 978)
 -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'INSTRUCTION_TRAP'; expected 2889497, got 2889499
Tail of trace dump:
{
  real_time:24389.278872 global_time:958, event:`SYSCALL: brk' (state:EXITING_SYSCALL) tid:269162, ticks:2218622
rax:0xd932000 rbx:0xc rcx:0xffffffffffffffff rdx:0x10040 rsi:0x26000 rdi:0xd932000 rbp:0x681fffa0 rsp:0x681ffd60 r8:0xd901980 r9:0x7f5118719c50 r10:0xd90c000 r11:0x246 r12:0xd90c000 r13:0x26000 r14:0xd901980 r15:0x10040 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xc fs_base:0x7f511852d040 gs_base:0x0
  { map_file:"<ZERO>", addr:0xd90c000, length:0x26000, prot_flags:"rw-p", file_offset:0x0, device:0, inode:0, data_file:"", data_offset:0x0, file_size:0x26000 }
}
{
  real_time:24389.280839 global_time:959, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2761786
}
{
  real_time:24389.280841 global_time:960, event:`INSTRUCTION_TRAP' tid:269162, ticks:2761786
rax:0x7de3bc2f rbx:0x7ffe718b1da0 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1da0 rdi:0xd706bc0 rbp:0xd706bc0 rsp:0x7ffe718b1d28 r8:0xd70b340 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0x7ffe718b1e00 r14:0x0 r15:0x0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.280843 global_time:961, event:`SYSCALLBUF_RESET' tid:269162, ticks:2761786
}
{
  real_time:24389.280878 global_time:962, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2764932
}
{
  real_time:24389.280880 global_time:963, event:`INSTRUCTION_TRAP' tid:269162, ticks:2764932
rax:0x7de5f940 rbx:0x7ffe718b1da0 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1da0 rdi:0xd706bc0 rbp:0xd706bc0 rsp:0x7ffe718b1d28 r8:0xd70b340 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0x7ffe718b1e00 r14:0x0 r15:0x0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.280882 global_time:964, event:`SYSCALLBUF_RESET' tid:269162, ticks:2764932
}
{
  real_time:24389.280915 global_time:965, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2766409
}
{
  real_time:24389.280918 global_time:966, event:`INSTRUCTION_TRAP' tid:269162, ticks:2766409
rax:0x7de809f2 rbx:0x7ffe718b1da0 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1da0 rdi:0xd706bc0 rbp:0xd706bc0 rsp:0x7ffe718b1d28 r8:0xd70b340 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0x7ffe718b1e00 r14:0x0 r15:0x0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.280919 global_time:967, event:`SYSCALLBUF_RESET' tid:269162, ticks:2766409
}
{
  real_time:24389.280952 global_time:968, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2767886
}
{
  real_time:24389.280954 global_time:969, event:`INSTRUCTION_TRAP' tid:269162, ticks:2767886
rax:0x7dea22bc rbx:0x7ffe718b1da0 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1da0 rdi:0xd706bc0 rbp:0xd706bc0 rsp:0x7ffe718b1d28 r8:0xd70b340 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0x7ffe718b1e00 r14:0x0 r15:0x0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.280955 global_time:970, event:`SYSCALLBUF_RESET' tid:269162, ticks:2767886
}
{
  real_time:24389.281345 global_time:971, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2884875
}
{
  real_time:24389.281348 global_time:972, event:`INSTRUCTION_TRAP' tid:269162, ticks:2884875
rax:0x7e005b0d rbx:0x7ffe718b1b90 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1b90 rdi:0xd706bc0 rbp:0xd706bc0 rsp:0x7ffe718b1b18 r8:0xd70b340 r9:0x0 r10:0xd667010 r11:0x27 r12:0x0 r13:0xd8de490 r14:0x7ffe718b1fbc r15:0xd820af0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.281350 global_time:973, event:`SYSCALLBUF_RESET' tid:269162, ticks:2884875
}
{
  real_time:24389.281382 global_time:974, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2886335
}
{
  real_time:24389.281385 global_time:975, event:`INSTRUCTION_TRAP' tid:269162, ticks:2886335
rax:0x7e0275dd rbx:0x7ffe718b1b90 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1b90 rdi:0xd707430 rbp:0xd707430 rsp:0x7ffe718b1b18 r8:0xd70b610 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0xd8ddbd0 r14:0x7ffe718b1fbc r15:0xd820af0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.281386 global_time:976, event:`SYSCALLBUF_RESET' tid:269162, ticks:2886335
}
{
  real_time:24389.281420 global_time:977, event:`SYSCALLBUF_FLUSH' tid:269162, ticks:2889497
}
{
  real_time:24389.281422 global_time:978, event:`INSTRUCTION_TRAP' tid:269162, ticks:2889497
rax:0x7e04980c rbx:0x7ffe718b1b90 rcx:0x0 rdx:0x522d rsi:0x7ffe718b1b90 rdi:0xd707430 rbp:0xd707430 rsp:0x7ffe718b1b18 r8:0xd70b610 r9:0x0 r10:0x0 r11:0x27 r12:0x0 r13:0xd8ddbf0 r14:0x7ffe718b1fbc r15:0xd820af0 rip:0xb8b83e2 eflags:0x10206 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7f511852d040 gs_base:0x0
}
{
  real_time:24389.281424 global_time:979, event:`SYSCALLBUF_RESET' tid:269162, ticks:2889497
}
[FATAL ../../src/log.cc:402:emergency_debug()] (session doesn't look interactive, aborting emergency debugging)
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5a)[0x5636c46a9a3a]
rr(_ZN2rr15notifying_abortEv+0x50)[0x5636c46ae700]
rr(+0x4c364)[0x5636c469f364]
rr(+0x15b676)[0x5636c47ae676]
rr(+0x15b82f)[0x5636c47ae82f]
rr(_ZN2rr13ReplaySession23check_ticks_consistencyEPNS_10ReplayTaskERKNS_5EventE+0x1e4)[0x5636c4729c64]
rr(_ZN2rr13ReplaySession28emulate_deterministic_signalEPNS_10ReplayTaskEiRKNS0_15StepConstraintsE+0x1f3)[0x5636c472db53]
rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x1a4)[0x5636c47238b4]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x10e)[0x5636c472754e]
rr(_ZN2rr14ReplayTimeline19replay_step_forwardENS_10RunCommandEl+0xbf)[0x5636c470585f]
rr(_ZN2rr9GdbServer14debug_one_stepERNS_10GdbRequestE+0x378)[0x5636c47c49a8]
rr(_ZN2rr9GdbServer12serve_replayERKNS0_15ConnectionFlagsE+0x64b)[0x5636c47c5cfb]
rr(+0xdee8a)[0x5636c4731e8a]
rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x56a)[0x5636c4732a6a]
rr(main+0x31a)[0x5636c469823a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f2e076d50b3]
rr(_start+0x2e)[0x5636c469838e]
=== End rr backtrace

khuey commented 3 years ago

Is the instruction at 0xb8b83e2 an RDRAND by chance?

seelabs commented 3 years ago

@khuey It appears to be a shift left from code in openssl. Although the instruction before it is rdtsc which is "read timestamp counter". Maybe that's the issue. Here's the code around that instruction if it helps:

0xb8b83c0 <OPENSSL_atomic_add>                  mov    (%rdi),%eax                                                                                                                                                                                                                                │
│   0xb8b83c2 <OPENSSL_atomic_add+2>                lea    (%rsi,%rax,1),%r8                                                                                                                                                                                                                          │
│   0xb8b83c6 <OPENSSL_atomic_add+6>                lock cmpxchg %r8d,(%rdi)                                                                                                                                                                                                                          │
│   0xb8b83cb <OPENSSL_atomic_add+11>               jne    0xb8b83c2 <OPENSSL_atomic_add+2>                                                                                                                                                                                                           │
│   0xb8b83cd <OPENSSL_atomic_add+13>               mov    %r8d,%eax                                                                                                                                                                                                                                  │
│   0xb8b83d0 <OPENSSL_atomic_add+16>               cltq                                                                                                                                                                                                                                              │
│   0xb8b83d2 <OPENSSL_atomic_add+18>               repz ret                                                                                                                                                                                                                                          │
│   0xb8b83d4                                           data16 nopw %cs:0x0(%rax,%rax,1)                                                                                                                                                                                                              │
│   0xb8b83df                                           nop                                                                                                                                                                                                                                           │
│   0xb8b83e0 <OPENSSL_rdtsc>                       rdtsc                                                                                                                                                                                                                                             │
│B+>0xb8b83e2 <OPENSSL_rdtsc+2>                     shl    $0x20,%rdx                                                                                                                                                                                                                                 │
│   0xb8b83e6 <OPENSSL_rdtsc+6>                     or     %rdx,%rax                                                                                                                                                                                                                                  │
│   0xb8b83e9 <OPENSSL_rdtsc+9>                     repz ret                                                                                                                                                                                                                                          │
│   0xb8b83eb                                           nopl   0x0(%rax,%rax,1)                                                                                                                                                                                                                       │
│   0xb8b83f0 <OPENSSL_ia32_cpuid>                  mov    %rbx,%r8                                                                                                                                                                                                                                   │
│   0xb8b83f3 <OPENSSL_ia32_cpuid+3>                xor    %eax,%eax                                                                                                                                                                                                                                  │
│   0xb8b83f5 <OPENSSL_ia32_cpuid+5>                mov    %rax,0x8(%rdi)                                                                                                                                                                                                                             │
│   0xb8b83f9 <OPENSSL_ia32_cpuid+9>                cpuid                                                                                                                                                                                                                                             │
│   0xb8b83fb <OPENSSL_ia32_cpuid+11>               mov    %eax,%r11d                                                                                                                                                                                                                                 │
│   0xb8b83fe <OPENSSL_ia32_cpuid+14>               xor    %eax,%eax                                                                                                                                                                                                                                  │
│   0xb8b8400 <OPENSSL_ia32_cpuid+16>               cmp    $0x756e6547,%ebx

khuey commented 3 years ago

Ah, ok, those instruction trap events are from the rdtsc then. That should be fine.

Unfortunately debugging these kinds of issues is pretty tough. If you move a recording made on the Intel machine to the AMD machine does it replay? How about the reverse?

seelabs commented 3 years ago

@khuey No, I am not able to replay a recording made on the Intel machine and moved to the AMD machine. I get the error:

[FATAL ../../src/ReplaySession.cc:187:ReplaySession()] Trace was recorded with CPUID faulting enabled, but this
system does not support CPUID faulting.

I am also unable to replay a recoding made on the AMD machine on the Intel machine, but this is due to different software being installed. I get the error:

[FATAL ../src/TraceStream.cc:1030:read_mapped_region() errno: ENOENT] Failed to stat /usr/lib/x86_64-linux-gnu/ld-2.31.so: replay is impossible

khuey commented 3 years ago

Is the software you're recording here something you can share a trace of? Or is it proprietary/private/whatever?

seelabs commented 3 years ago

@khuey I'm working on the xrp ledger (https://github.com/ripple/rippled). I'm happy to share traces. What would you like? Just the data and events files or the whole directory with those files? Unfortunately, that directory is almost a gig, but I'm happy to upload it if it'll help.

I don't expect you to build it yourself, but if you'd like to reproduce it yourself: I was on the develop branch. The build instructions are here: https://xrpl.org/build-run-rippled-ubuntu.html. I'm more than happy to help you build it if you run into issues. Recording and replaying the PayStrand unit test should reproduce the issue (rippled --unittest=PayStrand).

khuey commented 3 years ago

Given that the problem is specific to a CPU that I don't have, I'm not going to be able to reproduce it myself.

If you could produce a failing trace with rr record , then rr pack it, and upload it somewhere or email it to me that would be helpful.

seelabs commented 3 years ago

@khuey I ran rr pack tar'd up the directory and uploaded it to this repo: https://github.com/seelabs/rr_traces. I had to split the files to get around github's file limit, so you need to recombine them with cat (see the README file).

khuey commented 3 years ago

Thanks. Unfortunately I can't run that trace locally because it uses the new fancy SHA256 instructions and my CPU is too old for that (CPUs with this capability also seem to be new enough that they haven't made it into the inventory of cloud vendors like AWS).

I was, however, able to examine the binaries. Since this is AMD specific and your program does cryptography I strongly suspect the problem here is an rdrand instruction is being executed. On Intel CPUs we can intercept the CPUID instruction and mask off the bits for rdrand, There's no equivalent capability on AMD hardware, and we've run into this problem before (e.g. #2766). I see two functions in the rippled binary that use rdrand. One is OPENSSL_ia32_rdrand_bytes, the other is std::(anonymous namespace)::__x86_rdrand(void*). We've had a hack to disable OpenSSL's rdrand use for a long time (https://github.com/rr-debugger/rr/blob/d3b38fc768a45fc119f635b099b322960d5c449a/src/RecordSession.cc#L2214) so I suspect the standard library is the culprit here. If you can replay this trace with a breakpoint on __x86_rdrand and hit that breakpoint, then that is the problem.

rippled does make use of std::random_device. You might try changing the constructors in rippled to pass "/dev/urandom" as an argument (instead of relying on the default constructor) which will make libstdc++ use /dev/urandom instead of rdrand, which will be fine for rr.

Unfortunately there's not a ton we can do here from the rr side. Even if we got a patch into libstdc++ to allow disabling the use of rdrand via an environment variable or something it would take a long time for it to reach users.

seelabs commented 3 years ago

I set a breakpoint on __x86_rdrand and reran. It hits the assertion before it hits the breakpoint. Still, your diagnosis of rdrand being the culprit is a good theory. With that hint I'll spend some time on my side seeing if I can resolve this. Thanks for all the effort you put into this. Much appreciated!

khuey commented 3 years ago

Hrm, ok. If you could generate a trace without the SHA256 instructions somehow I can take another look.

seelabs commented 3 years ago

@khuey Yeah, I should be able to do that.

seelabs commented 3 years ago

Hmm, I'm not building with any -march flags. I wonder if a library (maybe OpenSSL) is dynamically querying for the cpu type and using SHA256 from that. Let me see what I can find about that.

Edit: or more likely the OpenSSL library was compiled with -march. I'll see about re-compiling that library.

seelabs commented 3 years ago

So far I'm not sure how to get it to compile without the SHA256 asm instructions. I tried adding a global -DOPENSSL_NO_ASM but when I look as the disassembled executable, the instructions are still there. I'll try again later.

On interesting data point: when I re-compiled with clang, rr ran no problems. When I compiled with gcc again, I hit the assertion.

Artem-B commented 1 year ago

I've just hit it rr 5.6.0-2 on Debian, also running on AMD CPU.

Is recompiling from sources w/ clang still the only fix or workaround?

khuey commented 1 year ago

You hit it on which CPU?

Artem-B commented 1 year ago

AMD Threadripper Pro 3995WX. Recompiling rr from sources w/ clang appears to work.

GitMensch commented 10 months ago

@seelabs Am I right that your workaround was to recompile openssl with clang, which presumably did not use the sHA256 instruction [and may do now with an updated version]?

[...] I am not able to replay a recording made on the Intel machine and moved to the AMD machine. I get the error:
[FATAL ../../src/ReplaySession.cc:187:ReplaySession()] Trace was recorded with CPUID faulting enabled, but this
system does not support CPUID faulting.
I am also unable to replay a recoding made on the AMD machine on the Intel machine, but this is due to different software being installed. I get the error:
[FATAL ../src/TraceStream.cc:1030:read_mapped_region() errno: ENOENT] Failed to stat /usr/lib/x86_64-linux-gnu/ld-2.31.so: replay is impossible

Both directions may work if you do, with a current rr:

# on your recording machine
$> rr record --disable-cpuid-faulting yourprog
$> rr pack

# then transfer the recording and on the target machine do
$> rr replay --disable-cpuid-faulting yourtracedir

In any case: Is the original issue solved "by just updating rr" or is this still happening (when using system-default OpenSSL)?

rr-debugger / rr

Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'INSTRUCTION_TRAP'; on AMD CPU #2862