rr-debugger / rr

Record and Replay Framework
http://rr-project.org/

vfork_done_clone fails on A1 #3595

Closed: GitMensch closed this issue 1 year ago

GitMensch commented 1 year ago

environment:

$ gcc --version
gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4.3.0.1)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat /etc/oracle-release
Oracle Linux Server release 9.2
$ cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
$ uname -a
Linux instance-20230828-ampere-ora9 5.15.0-103.114.4.el9uek.aarch64 #2 SMP Mon Jun 26 10:21:19 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
$ cat /proc/cpuinfo
processor       : 0
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

processor       : 1
BogoMIPS        : 50.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0xd0c
CPU revision    : 1

Building worked fine (libcapnp needs a local build on this machine; for details on how to get this machine and how the build is set up, see #3436). Failed tests:

$ ctest -R 'vfork_done_clone' -VV
UpdateCTestConfiguration  from :/home/opc/rr/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :/home/opc/rr/build/DartConfiguration.tcl
Test project /home/opc/rr/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1350
    Start 1350: vfork_done_clone

1350: Test command: /usr/bin/bash "source_dir/src/test/vfork_done_clone.run" "vfork_done_clone" "" "bin_dir" "120"
1350: Test timeout computed to be: 10000000
1350: source_dir/src/test/util.sh: line 279: 460925 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT record.err $RR_EXE $GLOBAL_OPTIONS record $LIB_ARG $RECORD_ARGS "$exe" $exeargs > record.out 2> record.err
1350: Test 'vfork_done_clone' FAILED: : token 'EXIT-SUCCESS' not in record.out:
1350: --------------------------------------------------
1350: FAILED at src/test/vfork_done.c:93: !(ws >> 8 == (SIGTRAP | (PTRACE_EVENT_VFORK_DONE << 8))) errno:0 (Success)
1350: --------------------------------------------------
1350: Test vfork_done_clone failed, leaving behind /tmp/rr-test-vfork_done_clone-aqVg7RnRf and /home/opc/rr/build/rr-test-vfork_done_clone-HzH5KNNyZ
1350: To replay the failed test, run
1350:   _RR_TRACE_DIR=/tmp/rr-test-vfork_done_clone-aqVg7RnRf rr replay
1/2 Test #1350: vfork_done_clone .................***Failed  Error regular expression found in output. Regex=[FAILED]  1.30 sec
test 1351
    Start 1351: vfork_done_clone-no-syscallbuf

1351: Test command: /usr/bin/bash "source_dir/src/test/vfork_done_clone.run" "vfork_done_clone" "-n" "bin_dir" "120"
1351: Test timeout computed to be: 10000000
1351: source_dir/src/test/util.sh: line 279: 461073 Aborted                 (core dumped) _RR_TRACE_DIR="$workdir" test-monitor $TIMEOUT record.err $RR_EXE $GLOBAL_OPTIONS record $LIB_ARG $RECORD_ARGS "$exe" $exeargs > record.out 2> record.err
1351: Test 'vfork_done_clone' FAILED: : token 'EXIT-SUCCESS' not in record.out:
1351: --------------------------------------------------
1351: FAILED at src/test/vfork_done.c:93: !(ws >> 8 == (SIGTRAP | (PTRACE_EVENT_VFORK_DONE << 8))) errno:0 (Success)
1351: --------------------------------------------------
1351: Test vfork_done_clone failed, leaving behind /tmp/rr-test-vfork_done_clone-RwrDHUII4 and /home/opc/rr/build/rr-test-vfork_done_clone-QOIqXiqsJ
1351: To replay the failed test, run
1351:   _RR_TRACE_DIR=/tmp/rr-test-vfork_done_clone-RwrDHUII4 rr replay
2/2 Test #1351: vfork_done_clone-no-syscallbuf ...***Failed  Error regular expression found in output. Regex=[FAILED]  1.20 sec

0% tests passed, 2 tests failed out of 2

Total Test time (real) =   2.53 sec

The following tests FAILED:
        1350 - vfork_done_clone (Failed)
        1351 - vfork_done_clone-no-syscallbuf (Failed)
Errors while running CTest
Output from these tests are in: /home/opc/rr/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

$ _RR_TRACE_DIR=/tmp/rr-test-vfork_done_clone-RwrDHUII4 bin/rr replay
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-10.0.2.el9
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/rr-test-vfork_done_clone-RwrDHUII4/vfork_done-RwrDHUII4-0/mmap_clone_3_vfork_done-RwrDHUII4...
Really redefine built-in command "restart"? (y or n) [answered Y; input not from terminal]
Really redefine built-in command "jump"? (y or n) [answered Y; input not from terminal]
Remote debugging using 127.0.0.1:2461
Reading symbols from /lib/ld-linux-aarch64.so.1...
BFD: warning: system-supplied DSO at 0x6ffd0000 has a section extending past end of file
0x0000ffffb700dcc0 in _start () from /lib/ld-linux-aarch64.so.1
(rr) c
Continuing.
FAILED at src/test/vfork_done.c:93: !(ws >> 8 == (SIGTRAP | (PTRACE_EVENT_VFORK_DONE << 8))) errno:0 (Success)

Program received signal SIGABRT, Aborted.
0x0000ffffb6e81690 in __pthread_kill_implementation () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.34-60.0.2.el9.aarch64
(rr) bt
#0  0x0000ffffb6e81690 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x0000ffffb6e3c7fc in raise () from /lib64/libc.so.6
#2  0x0000000000400c70 in atomic_assert (cond=0, str=0x4018a8 "ws >> 8 == (SIGTRAP | (PTRACE_EVENT_VFORK_DONE << 8))", file=0x4015e0 "src/test/vfork_done.c",
    line=93) at /home/opc/rr/src/test/util.h:190
#3  0x0000000000401268 in main (argc=2, argv=0xffffc7c2ee78) at /home/opc/rr/src/test/vfork_done.c:93
(rr)

As this isn't mentioned in the linked issue, I think this failure did not occur back in February.

GitMensch commented 1 year ago

Interestingly, the test passes as soon as I increase the number of CPUs in that instance from 2 to 4 (the February setup also had 4 CPUs).

rocallahan commented 1 year ago

This might be fixed by commit 5817b78d83eb78b05f618813a95cdf54cff01b5b.

GitMensch commented 1 year ago

Yes, this issue is fixed with current master, likely by that change.

Maybe you can have a look at #3569, too?