rr-debugger / rr

Record and Replay Framework
http://rr-project.org/
Other
9.04k stars 573 forks source link

Trace divergence on recording of Firefox (Zen4?) #3778

Open mgaudet opened 1 month ago

mgaudet commented 1 month ago

rr built from 77f88f4535c5c99a15c6b392fd70209babbcb066.

Trace was happily humming away before failing with Assertion 'ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SYSCALL: epoll_wait'; expected 77360501, got 77360395

Tail of trace dump ``` [FATAL src/ReplaySession.cc:1239:check_ticks_consistency()] (task 995006 (rec:983756) at time 1906187) -> Assertion `ticks_now == trace_ticks' failed to hold. ticks mismatch for 'SYSCALL: epoll_wait'; expected 77360501, got 77360395 Tail of trace dump: { real_time:41871.222892 global_time:1906167, event:`SYSCALLBUF_RESET' tid:985150, ticks:813092284 } { real_time:41871.222909 global_time:1906168, event:`SYSCALL: futex' (state:EXITING_SYSCALL) tid:983743, ticks:20576026562 rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x7676a0958644 rbp:0x7fffe89abe70 rsp:0x681ffe00 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x7676a0958500 r15:0x7676a0958644 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x7676b74fe0c0 gs_base:0x0 } { real_time:41871.223927 global_time:1906169, event:`SYSCALL: restart_syscall' (state:EXITING_SYSCALL) tid:983769, ticks:1129484 rax:0xfffffffffffffe00 rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x76769fd4ea30 rbp:0x76769e480760 rsp:0x7676993ffe00 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x76769e4816c0 gs_base:0x0 } { real_time:41871.224018 global_time:1906170, event:`SIGNAL: SIGCHLD(async)' tid:983769, ticks:1129484 rax:0xfffffffffffffe00 rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x76769fd4ea30 rbp:0x76769e480760 rsp:0x7676993ffe00 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x76769e4816c0 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x403ade0b6b3a76400012 st7:0x403ade0b6b3a76400000 ymm0:0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ymm1:0xffffffffffffffffffffffffffffffff ymm2:0x7676a093a16000007676a093a340 ymm3:0x10000000000007676a0908fe0 ymm4:0x7676a0908fe000007676a093a160 ymm5:0x7676a09e7d7000007676a0904b10 ymm6:0x7676a09049d00000000000000000 ymm7:0x7676a093a16000007676a0908fb0 ymm8:0x7474616d2f656d6f682f3a6e69622f70 ymm9:0xff000000ffff00000000000000 ymm10:0x0 ymm11:0x99999999999999999999999999999999 ymm12:0x20202020202020202020202020202020 ymm13:0x7676b2e3bbcb00007676b2e3bbf1 ymm14:0x7676b2e3bc1800007676b2e3bba8 ymm15:0x0 } { real_time:41871.224099 global_time:1906171, event:`SIGNAL_HANDLER: SIGCHLD(async)' tid:983769, ticks:1129484 rax:0x0 rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x7676993ff200 rsi:0x7676993ff330 rdi:0x11 rbp:0x76769e480760 rsp:0x7676993ff1f8 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x7676a7c53480 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x76769e4816c0 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x0 ymm1:0x0 ymm2:0x0 ymm3:0x0 ymm4:0x0 ymm5:0x0 ymm6:0x0 ymm7:0x0 ymm8:0x0 ymm9:0x0 ymm10:0x0 ymm11:0x0 ymm12:0x0 ymm13:0x0 ymm14:0x0 ymm15:0x0 { tid:983769, addr:0x7676993ff1f8, length:0xe88 } } { real_time:41871.224208 global_time:1906172, event:`SYSCALLBUF_FLUSH' tid:983769, ticks:1129522 { syscall:'write', ret:0x1, size:0x10, desched:1 } } { real_time:41871.224218 global_time:1906173, event:`SYSCALL: rt_sigreturn' (state:ENTERING_SYSCALL) tid:983769, ticks:1129522 rax:0xffffffffffffffda rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x1 rsi:0x7676993ff1d7 rdi:0x2 rbp:0x76769e480760 rsp:0x7676993ff200 r8:0x2 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x7676b6e45329 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xf fs_base:0x76769e4816c0 gs_base:0x0 } { real_time:41871.224227 global_time:1906174, event:`SYSCALLBUF_RESET' tid:983769, ticks:1129522 } { real_time:41871.224259 global_time:1906175, event:`SYSCALL: rt_sigreturn' (state:EXITING_SYSCALL) tid:983769, ticks:1129522 rax:0xca rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x76769fd4ea30 rbp:0x76769e480760 rsp:0x7676993ffe00 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x70000000 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x76769e4816c0 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x403ade0b6b3a76400012 st7:0x403ade0b6b3a76400000 ymm0:0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ymm1:0xffffffffffffffffffffffffffffffff ymm2:0x7676a093a16000007676a093a340 ymm3:0x10000000000007676a0908fe0 ymm4:0x7676a0908fe000007676a093a160 ymm5:0x7676a09e7d7000007676a0904b10 ymm6:0x7676a09049d00000000000000000 ymm7:0x7676a093a16000007676a0908fb0 ymm8:0x7474616d2f656d6f682f3a6e69622f70 ymm9:0xff000000ffff00000000000000 ymm10:0x0 ymm11:0x99999999999999999999999999999999 ymm12:0x20202020202020202020202020202020 ymm13:0x7676b2e3bbcb00007676b2e3bbf1 ymm14:0x7676b2e3bc1800007676b2e3bba8 ymm15:0x0 } { real_time:41871.224328 global_time:1906176, event:`SYSCALL: restart_syscall' (state:ENTERING_SYSCALL) tid:983769, ticks:1129522 rax:0xffffffffffffffda rbx:0x7676993fffa0 rcx:0xffffffffffffffff rdx:0x0 rsi:0x189 rdi:0x76769fd4ea30 rbp:0x76769e480760 rsp:0x7676993ffe00 r8:0x0 r9:0xffffffff r10:0x0 r11:0x246 r12:0x0 r13:0x0 r14:0x76769fd4e9a0 r15:0x76769fd4ea30 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xca fs_base:0x76769e4816c0 gs_base:0x0 } { real_time:41871.224446 global_time:1906177, event:`SYSCALL: epoll_wait' (state:EXITING_SYSCALL) tid:983756, ticks:77359468 rax:0x1 rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x20 rsi:0x7676a1584200 rdi:0x15 rbp:0x0 rsp:0x76769c5ffe00 r8:0x7676a08f49f0 r9:0x7676a08f49f0 r10:0xffffffff r11:0x246 r12:0xffffffff r13:0x70904098 r14:0x709040a8 r15:0x70904228 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7676a08f56c0 gs_base:0x0 { tid:983756, addr:0x7676a1584200, length:0xc } } { real_time:41871.224558 global_time:1906178, event:`SYSCALLBUF_FLUSH' tid:983756, ticks:77359854 { syscall:'clock_gettime', ret:0x0, size:0x20 } { syscall:'read', ret:0x1, size:0x11, desched:1 } { syscall:'read', ret:0xfffffffffffffff5, size:0x10, desched:1 } } { real_time:41871.224578 global_time:1906179, event:`SYSCALL: waitid' (state:ENTERING_SYSCALL) tid:983756, ticks:77359854 rax:0xffffffffffffffda rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x7676a08f4780 rsi:0xf0386 rdi:0x1 rbp:0x7676a08f46a0 rsp:0x76769c5ffe00 r8:0x0 r9:0x0 r10:0x1000005 r11:0x246 r12:0x7676b6db4e30 r13:0x1 r14:0x1000005 r15:0x7676a08f4780 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xf7 fs_base:0x7676a08f56c0 gs_base:0x0 } { real_time:41871.224587 global_time:1906180, event:`SYSCALLBUF_RESET' tid:983756, ticks:77359854 } { real_time:41871.224633 global_time:1906181, event:`SYSCALL: waitid' (state:EXITING_SYSCALL) tid:983756, ticks:77359854 rax:0x0 rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x7676a08f4780 rsi:0xf0386 rdi:0x1 rbp:0x7676a08f46a0 rsp:0x76769c5ffe00 r8:0x0 r9:0x0 r10:0x1000005 r11:0x246 r12:0x7676b6db4e30 r13:0x1 r14:0x1000005 r15:0x7676a08f4780 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xf7 fs_base:0x7676a08f56c0 gs_base:0x0 { tid:983756, addr:0x7676a08f4780, length:0x80 } } { real_time:41871.224734 global_time:1906182, event:`SYSCALL: waitid' (state:ENTERING_SYSCALL) tid:983756, ticks:77359880 rax:0xffffffffffffffda rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x7676a08f4780 rsi:0xf0386 rdi:0x1 rbp:0x7676a08f46a0 rsp:0x76769c5ffe00 r8:0x0 r9:0x0 r10:0x5 r11:0x246 r12:0x7676b6db4e30 r13:0x1 r14:0x7676a08f4780 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xf7 fs_base:0x7676a08f56c0 gs_base:0x0 } { real_time:41871.224965 global_time:1906183, event:`SYSCALL: waitid' (state:EXITING_SYSCALL) tid:983756, ticks:77359880 rax:0x0 rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x7676a08f4780 rsi:0xf0386 rdi:0x1 rbp:0x7676a08f46a0 rsp:0x76769c5ffe00 r8:0x0 r9:0x0 r10:0x5 r11:0x246 r12:0x7676b6db4e30 r13:0x1 r14:0x7676a08f4780 r15:0x1 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xf7 fs_base:0x7676a08f56c0 gs_base:0x0 { tid:983756, addr:0x7676a08f4780, length:0x80 } } { real_time:41871.225075 global_time:1906184, event:`SYSCALL: rt_sigaction' (state:ENTERING_SYSCALL) tid:983756, ticks:77359923 rax:0xffffffffffffffda rbx:0x11 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7676a08f4680 rdi:0x11 rbp:0x7676a08f47c0 rsp:0x7676a08f4680 r8:0x0 r9:0x0 r10:0x8 r11:0x246 r12:0x0 r13:0x1 r14:0x76766d72c980 r15:0x0 rip:0x7676b6e4540a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7676a08f56c0 gs_base:0x0 } { real_time:41871.225115 global_time:1906185, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:983756, ticks:77359923 rax:0x0 rbx:0x11 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7676a08f4680 rdi:0x11 rbp:0x7676a08f47c0 rsp:0x7676a08f4680 r8:0x0 r9:0x0 r10:0x8 r11:0x246 r12:0x0 r13:0x1 r14:0x76766d72c980 r15:0x0 rip:0x7676b6e4540a eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x7676a08f56c0 gs_base:0x0 } { real_time:41871.225215 global_time:1906186, event:`SYSCALLBUF_FLUSH' tid:983756, ticks:77360501 { syscall:'epoll_wait', ret:0x0, size:0x10 } } { real_time:41871.225226 global_time:1906187, event:`SYSCALL: epoll_wait' (state:ENTERING_SYSCALL) tid:983756, ticks:77360501 rax:0xffffffffffffffda rbx:0x76769c5fffa0 rcx:0xffffffffffffffff rdx:0x20 rsi:0x7676a1584200 rdi:0x15 rbp:0x0 rsp:0x76769c5ffe00 r8:0x2 r9:0x7676a08f49f0 r10:0xffffffff r11:0x246 r12:0xffffffff r13:0x70904020 r14:0x70904030 r15:0x709041b0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7676a08f56c0 gs_base:0x0 } { real_time:41871.225236 global_time:1906188, event:`SYSCALLBUF_RESET' tid:983756, ticks:77360501 } ```

Discovered after a Pernosco trace failed to process (mozilla/user/mgaudet@mozilla.com/1720541975/FIaAP0DAj3A)

Machine is a AMD Ryzen Threadripper PRO 7975WX 32-Cores. Zen workaround was set using the python script.

mgaudet commented 1 month ago

(Re-running rr replay -p 983862 -a also fails with same assert and same tick failures)

mgaudet commented 1 month ago

weird. I managed to complete an interactive replay tho.

khuey commented 1 month ago

Yeah we see occasional divergences on Pernosco with traces uploaded from Zen 4 users. There's some unexplained issue here.

mgaudet commented 1 month ago

Yeah, I wasn't sure this would be a helpful report -- but thought I should have it on file. Let me know if you want anything from me!