rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

Ryzen 3900X test failures #2677

Open Manishearth opened 4 years ago

Manishearth commented 4 years ago

Testing rr on Ryzen 3900X (after running the ryzen workaround) and I get the following failures:

The following tests FAILED:
    565 - setuid-no-syscallbuf (Failed)
    1102 - checksum_sanity_noclone (Failed)

The following tests FAILED:
    1153 - record_replay-no-syscallbuf (Failed)
    2394 - record_replay-32 (Failed)

I can reproduce setuid-no-syscallbuf with ctest -R sometimes, I cannot reproduce the other failures.
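
To put a rough number on "sometimes", the test can be rerun in a loop while counting non-zero exits. This is only a sketch: the helper name `count_failures` is hypothetical and not part of rr, and it assumes `ctest` is invoked from the rr build directory.

```python
import subprocess


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


# Example (assumes the current directory is the rr build directory;
# ctest -R selects tests whose names match the given regex):
#   failed = count_failures(["ctest", "-R", "setuid-no-syscallbuf"], 20)
#   print(f"{failed} of 20 runs failed")
```

The run count of 20 is arbitrary; a failure that shows up only rarely would need more repetitions to surface.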

I'm not sure whether the 3900X should be added to the list of supported Ryzen CPUs. Are these known bits of flakiness?

cc @glandium

glandium commented 4 years ago

Can you check what the .err files in /tmp/rr-test-* say?
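
A minimal sketch for collecting those files, assuming the `.err` files sit directly inside each `/tmp/rr-test-*` directory (as with the `replay.err` path shown later in this thread). The function names are hypothetical.

```python
import glob
import os


def nonempty_err_files(pattern="/tmp/rr-test-*"):
    """Return sorted paths of non-empty .err files in matching directories."""
    found = []
    for directory in glob.glob(pattern):
        for path in glob.glob(os.path.join(directory, "*.err")):
            if os.path.getsize(path) > 0:
                found.append(path)
    return sorted(found)


def print_err_files(pattern="/tmp/rr-test-*"):
    """Print each non-empty .err file preceded by a header with its path."""
    for path in nonempty_err_files(pattern):
        print(f"==> {path} <==")
        with open(path, errors="replace") as handle:
            print(handle.read())
```

Skipping empty files keeps the output short when most of the test directories contain nothing of interest.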

rocallahan commented 4 years ago

I can reproduce setuid-no-syscallbuf with ctest -R sometimes, I cannot reproduce the other failures.

You mean the setuid-no-syscallbuf test fails often, but the other tests fail very rarely?

v-lopez commented 4 years ago

Ryzen 3700X: after following the setup instructions, I get 12 failures out of 2487 tests, which is still great compared to before.

Summary

     82 - clone_vfork_pidfd (Failed)
     83 - clone_vfork_pidfd-no-syscallbuf (Failed)
    920 - nested_detach_wait (Failed)
    921 - nested_detach_wait-no-syscallbuf (Failed)
    1140 - nested_detach (Failed)
    1141 - nested_detach-no-syscallbuf (Failed)
    1326 - clone_vfork_pidfd-32 (Failed)
    1327 - clone_vfork_pidfd-32-no-syscallbuf (Failed)
    2162 - nested_detach_wait-32 (Failed)
    2163 - nested_detach_wait-32-no-syscallbuf (Failed)
    2382 - nested_detach-32 (Failed)
    2383 - nested_detach-32-no-syscallbuf (Failed)

full output.txt

These are the .err files of all the tests. I can provide the rest of the files, but the tarball would be too big to provide all at once. rr-tests.tar.gz

glandium commented 4 years ago

@v-lopez: Please file a separate issue for those. The clone_vfork_pidfd failure has a similar problem to what was fixed in 17aa8239c0a9ffd0e66623fc3627f664b384bf1e, and nested_detach has a different kind of assertion.

glandium commented 4 years ago

@Manishearth did setuid fail with something like the following?

[FATAL .../rr/src/Registers.cc:405:compare_register_files()] 
 (task 911147 (rec:857972) at time 365)
 -> Assertion `!bail_error || match' failed to hold. Fatal register mismatch (ticks/rec:128273/128273)

pnkfelix commented 4 years ago

On my end, with a 3990X, the setuid-no-syscallbuf test is failing (and that's the main one that I think I've seen fail with some repeated runs, though sometimes it doesn't fail), and the record.err says this:

[ERROR /home/pnkfelix/Dev/Mozilla/rr.git/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)
process 317197 sent SIGURG

Full log:

```
% cat /tmp/rr-test-setuid-TgOCW9j3p/replay.err
[ERROR /home/pnkfelix/Dev/Mozilla/rr.git/src/Registers.cc:295:maybe_print_reg_mismatch()] r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)
process 317197 sent SIGURG
====== /proc/317197/status
Name: rr
Umask: 0002
State: S (sleeping)
Tgid: 317197
Ngid: 0
Pid: 317197
PPid: 317196
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 64
Groups: 4 27 1000
NStgid: 317197
NSpid: 317197
NSpgid: 317197
NSsid: 4178
VmPeak: 16508 kB
VmSize: 15468 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 10088 kB
VmRSS: 9680 kB
RssAnon: 1116 kB
RssFile: 8564 kB
RssShmem: 0 kB
VmData: 1156 kB
VmStk: 136 kB
VmExe: 5728 kB
VmLib: 1320 kB
VmPTE: 68 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 1/1030150
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000180002000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 00000100,00000000,00000000,00000000
Cpus_allowed_list: 104
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2
nonvoluntary_ctxt_switches: 335
====== /proc/317197/stack
====== /proc/317198/status
Name: rr:setuid-TgOCW
Umask: 0002
State: t (tracing stop)
Tgid: 317198
Ngid: 0
Pid: 317198
PPid: 317197
TracerPid: 317197
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
FDSize: 1024
Groups: 4 27 1000
NStgid: 317198
NSpid: 317198
NSpgid: 317198
NSsid: 317198
VmPeak: 5212 kB
VmSize: 5088 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 2184 kB
VmRSS: 2184 kB
RssAnon: 380 kB
RssFile: 1804 kB
RssShmem: 0 kB
VmData: 2460 kB
VmStk: 0 kB
VmExe: 8 kB
VmLib: 1920 kB
VmPTE: 56 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 1
SigQ: 1/1030150
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000010000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 1
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 00000100,00000000,00000000,00000000
Cpus_allowed_list: 104
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 332
nonvoluntary_ctxt_switches: 0
====== /proc/317198/stack
====== gdb -p 317197 -ex 'set confirm off' -ex 'set height 0' -ex 'thread apply all bt' -ex q &1
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 317197
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
```
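
As a small worked example, the mismatch line in the log above can be parsed to show how far apart the two register values are. This is a sketch only: the regular expression is written against the one line quoted in this thread, not against every message rr can emit, and `parse_mismatch` is a hypothetical helper.

```python
import re

# Matches e.g. "r10 0x55a993c2e95a != 0x55a993c2e958 (replaying vs. recorded)"
MISMATCH = re.compile(
    r"(?P<reg>\w+) (?P<replay>0x[0-9a-fA-F]+) != (?P<recorded>0x[0-9a-fA-F]+)"
    r" \(replaying vs\. recorded\)"
)


def parse_mismatch(line):
    """Return (register, replayed, recorded, delta), or None if no match."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    replayed = int(match.group("replay"), 16)
    recorded = int(match.group("recorded"), 16)
    return match.group("reg"), replayed, recorded, replayed - recorded
```

For the line above this yields register `r10` with a delta of 2: the replayed value is two higher than the recorded one.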

khuey commented 4 years ago

I would expect that to be a duplicate of #2694. If you can pack and upload a trace I can verify whether or not the tracee is using RDRAND.

pnkfelix commented 4 years ago

I assume the trace you want packed is the one in the same /tmp/rr-test-setuid-XXX directory; I've put a tarball of that whole directory below.

rr-test-setuid.tar.gz (this wasn't what you asked for; see below.)

pnkfelix commented 4 years ago

Oh, I'm sorry, you asked me to pack it, and I didn't realize that meant running rr pack on it as described in #2694. I'll do that now.
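
The two steps (pack the trace, then archive it) could be scripted roughly as follows. This is a sketch under assumptions: `rr` is on `PATH`, and the helper names `pack_trace` and `archive_trace` are hypothetical.

```python
import os
import subprocess
import tarfile


def pack_trace(trace_dir, rr="rr"):
    """Run `rr pack` on the trace directory so it no longer depends on
    files outside that directory."""
    subprocess.run([rr, "pack", trace_dir], check=True)


def archive_trace(trace_dir, archive_path):
    """Write trace_dir into a gzip-compressed tarball at archive_path."""
    name = os.path.basename(os.path.normpath(trace_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(trace_dir, arcname=name)
    return archive_path
```

Calling `pack_trace` before `archive_trace` matters here: archiving an unpacked trace is what produced the first tarball, which was not what was asked for.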

pnkfelix commented 4 years ago

Okay, this tarball has the packed version of the directory.

rr-test-setuid.tar.gz

khuey commented 4 years ago

Unsupported instruction at 0x7f534449603f (opcode rdrand)

Can you replay the trace, run hbreak *0x7f534449603f in gdb, continue, and get a backtrace at that instruction?
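
The gdb side of that request is just three commands. As a sketch, they can be generated for any address with a small hypothetical helper (`breakpoint_script` is not part of rr or gdb):

```python
def breakpoint_script(address):
    """Return gdb commands that stop at `address` and print a backtrace."""
    commands = [
        f"hbreak *{address:#x}",  # hardware-assisted breakpoint
        "continue",               # run until the breakpoint is hit
        "bt",                     # print the backtrace at that point
    ]
    return "\n".join(commands) + "\n"
```

The resulting text can be typed at the `(rr)` prompt after starting `rr replay` on the trace, or saved to a file and loaded with gdb's `source` command.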

pnkfelix commented 4 years ago

Backtrace:

```
% ./bin/rr replay /tmp/rr-test-setuid-TgOCW9j3p/latest-trace/
On Zen CPUs, rr will not work reliably unless you disable the hardware SpecLockMap optimization.
For instructions on how to do this, see https://github.com/mozilla/rr/wiki/Zen
GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see: .
Find the GDB manual and other documentation resources online at: .
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/rr-test-setuid-TgOCW9j3p/setuid-TgOCW9j3p-0/mmap_pack_5_setuid-TgOCW9j3p...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:50382
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007f534470d100 in ?? () from /lib64/ld-linux-x86-64.so.2
(rr) hbreak *0x7f534449603f
Hardware assisted breakpoint 1 at 0x7f534449603f
(rr) continue
Continuing.
Breakpoint 1, 0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
(rr) bt
#0 0x00007f534449603f in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#1 0x00007f5344496273 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#2 0x00007f5344496541 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#3 0x00007f5344484b11 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#4 0x00007f534448ab1e in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#5 0x00007f534448b251 in ?? () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#6 0x00007f5344498981 in _nss_systemd_getgrnam_r () from /lib/x86_64-linux-gnu/libnss_systemd.so.2
#7 0x00007f53445a967d in __getgrnam_r (name=name@entry=0x55a992626030 "nobody", resbuf=resbuf@entry=0x7f53446b5020 , buffer=0x55a993c23fc0 "", buflen=buflen@entry=1024, result=result@entry=0x7fff06e2c640) at ../nss/getXXbyYY_r.c:315
#8 0x00007f53445a892c in getgrnam (name=0x55a992626030 "nobody") at ../nss/getXXbyYY.c:134
#9 0x000055a99262557e in main (argc=1, argv=0x7fff06e2c7d8) at /home/pnkfelix/Dev/Mozilla/rr.git/src/test/setuid.c:15
(rr)
```

pnkfelix commented 4 years ago

So this confirms that my problem is a duplicate of issue #2694, since __getgrnam_r appears in the backtrace, right?

khuey commented 4 years ago

Yup, it's the same thing in systemd (which is fixed upstream at systemd/systemd#17115)
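
For anyone wanting a quick hint about whether some other library might contain RDRAND, a raw byte scan for its register-form encoding is one crude option. This sketch is not how rr detects the instruction; `find_rdrand` is a hypothetical helper, and because it scans bytes without disassembling them, it can report false positives inside data or in the middle of other instructions.

```python
def find_rdrand(code):
    """Return offsets of byte patterns matching register-form RDRAND.

    RDRAND with a register operand encodes as 0F C7 followed by a ModRM
    byte in the range F0..F7 (prefix bytes such as REX may precede it).
    Offsets point at the 0F byte. Treat every hit as a hint only.
    """
    offsets = []
    start = 0
    while True:
        index = code.find(b"\x0f\xc7", start)
        # Stop when there are no more candidates or no ModRM byte follows.
        if index == -1 or index + 2 >= len(code):
            break
        if 0xF0 <= code[index + 2] <= 0xF7:
            offsets.append(index)
        start = index + 1
    return offsets
```

It could be applied to the bytes of a file such as `/lib/x86_64-linux-gnu/libnss_systemd.so.2`. The offsets it returns are file offsets, not the runtime addresses seen in gdb.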

GitMensch commented 3 years ago

As this has been identified as both an upstream issue (systemd) and a duplicate, can we close this?