Closed vt-alt closed 1 year ago
As 5.7.0 is out (since 11 hours before your post) - can you please redo the build based on that, then after build do bin/rr record true
and bin/rr replay
(to check it works in general), then run the testsuite again?
Note: Works fine here with the slightly older i7-1165G7.
If there is any failure please output uname -a && cat /etc/*release*
, too.
(Looks like it's released 1 hour after my post), but of course I will retest .
If you run ./obj/bin/rr record ./obj/bin/detach_sigkill
manually (edit paths as necessary) does rr produce a trace or does it just hang?
Looks like it's released 1 hour after my post
I've though 11 AM to be earlier than 10 PM, but you never know if those timestamps are all correct in the first place. Thanks for re-running the test.
And your issue already led me to recheck on a newer Intel (with Debian) and finding that it works nicely (and I can't reproduce the bug that bugs me most there, even with nearly identical kernel :-/ , so thank you for the issue in any case.
I've though 11 AM to be earlier than 10 PM, but you never know if those timestamps are all correct in the first place.
Ah you are right! I compared dates and in my time zone they both PM but 12 PM is actually 11 hours before 11 PM not 1 hour later.
Oct 3, 2023, 12:14 PM GMT+3
Oct 3, 2023, 11:20 PM GMT+3
@khuey @GitMensch
Thank you all for suggestions I will try new tests later this week.
If you run
./obj/bin/rr record ./obj/bin/detach_sigkill
manually (edit paths as necessary) does rr produce a trace or does it just hang?
It just hangs.
# /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
EXIT-SUCCESS
# rr /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
rr: Saving execution to trace directory `/root/.local/share/rr/detach_sigkill-0'.
Processes:
root 1722954 0.0 0.0 850836 36108 pts/10 Sl+ 17:00 0:00 | \_ rr /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
root 1722966 0.0 0.0 5884 3176 pts/10 S+ 17:00 0:00 | \_ /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
root 1722967 0.0 0.0 8956 7760 pts/10 t+ 17:00 0:00 | | \_ /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
root 1722968 0.0 0.0 8956 8808 pts/10 S+ 17:00 0:00 | \_ /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
# cat /proc/1722954/stack
[<0>] do_wait+0x160/0x300
[<0>] kernel_wait4+0x9e/0x140
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
# cat /proc/1722966/stack
[<0>] do_wait+0x160/0x300
[<0>] kernel_wait4+0x9e/0x140
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
# cat /proc/1722967/stack
[<0>] ptrace_stop.part.0+0x165/0x2d0
[<0>] ptrace_do_notify+0xa6/0xe0
[<0>] ptrace_notify+0x59/0x90
[<0>] do_exit+0x8c0/0xa80
[<0>] do_group_exit+0x2d/0x80
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
# cat /proc/1722968/stack
[<0>] __do_sys_pause+0x30/0x60
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
#
Now I'm going to try 5.7.0.
With 5.7.0 test detach_sigkill
succeeds but other tests still fail.
# rr --version
rr version 5.7.0
# rr /usr/lib64/rr/testsuite/obj/bin/detach_sigkill
rr: Saving execution to trace directory `/root/.local/share/rr/detach_sigkill-1'.
EXIT-SUCCESS
This stat is for 940/1473 tests: 925 Passed, 15 Failed.
(I stopped testing in the middle because I need to update the system before weekend and failing tests take so long to timeout. But I will rerun.)
Top failures:
894/1473 Test #894: alternate_thread_diversion .....................................***Failed Error regular expression found in output. Regex=[FAILED]129.11 sec
895/1473 Test #895: alternate_thread_diversion-no-syscallbuf .......................***Failed Error regular expression found in output. Regex=[FAILED]129.13 sec
920/1473 Test #920: block_clone_checkpoint .........................................***Failed Error regular expression found in output. Regex=[FAILED]129.11 sec
921/1473 Test #921: block_clone_checkpoint-no-syscallbuf ...........................***Failed Error regular expression found in output. Regex=[FAILED]129.14 sec
928/1473 Test #928: blocked_bad_ip .................................................***Failed Error regular expression found in output. Regex=[FAILED]129.24 sec
929/1473 Test #929: blocked_bad_ip-no-syscallbuf ...................................***Failed Error regular expression found in output. Regex=[FAILED]129.23 sec
934/1473 Test #934: breakpoint .....................................................***Failed Error regular expression found in output. Regex=[FAILED]129.11 sec
935/1473 Test #935: breakpoint-no-syscallbuf .......................................***Failed Error regular expression found in output. Regex=[FAILED]129.12 sec
936/1473 Test #936: breakpoint_conditions ..........................................***Failed Error regular expression found in output. Regex=[FAILED]129.12 sec
937/1473 Test #937: breakpoint_conditions-no-syscallbuf ............................***Failed Error regular expression found in output. Regex=[FAILED]129.12 sec
938/1473 Test #938: breakpoint_overlap .............................................***Failed Error regular expression found in output. Regex=[FAILED]129.14 sec
939/1473 Test #939: breakpoint_overlap-no-syscallbuf ...............................***Failed Error regular expression found in output. Regex=[FAILED]129.14 sec
940/1473 Test #940: call_function ..................................................***Failed Error regular expression found in output. Regex=[FAILED]129.12 sec
Strangely manual run of some tests succeeds:
# /usr/lib64/rr/testsuite/obj/bin/block_clone_checkpoint
EXIT-SUCCESS
# rr /usr/lib64/rr/testsuite/obj/bin/block_clone_checkpoint
rr: Saving execution to trace directory `/root/.local/share/rr/block_clone_checkpoint-0'.
EXIT-SUCCESS
# rr /usr/lib64/rr/testsuite/obj/bin/block_clone_checkpoint
rr: Saving execution to trace directory `/root/.local/share/rr/block_clone_checkpoint-1'.
EXIT-SUCCESS
# /usr/lib64/rr/testsuite/obj/bin/blocked_bad_ip
EXIT-SUCCESS
Segmentation fault (core dumped)
# rr /usr/lib64/rr/testsuite/obj/bin/blocked_bad_ip
rr: Saving execution to trace directory `/root/.local/share/rr/blocked_bad_ip-0'.
EXIT-SUCCESS
Segmentation fault
# /usr/lib64/rr/testsuite/obj/bin/breakpoint_conditions
EXIT-SUCCESS
# rr /usr/lib64/rr/testsuite/obj/bin/breakpoint_conditions
rr: Saving execution to trace directory `/root/.local/share/rr/breakpoint_conditions-0'.
EXIT-SUCCESS
I will rerun with --verbose
(also will upgrade kernel to 6.5.7).
Log from the last testing: LastTest-2023-10-14.log 1222 tests passed, 251 tests failed.
852/1473 Test #852: vfork_done .....................................................***Failed Error regular expression found in output. Regex=[FAILED]129.85 sec
894/1473 Test #894: alternate_thread_diversion .....................................***Failed Error regular expression found in output. Regex=[FAILED]128.09 sec
895/1473 Test #895: alternate_thread_diversion-no-syscallbuf .......................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
920/1473 Test #920: block_clone_checkpoint .........................................***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
921/1473 Test #921: block_clone_checkpoint-no-syscallbuf ...........................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
928/1473 Test #928: blocked_bad_ip .................................................***Failed Error regular expression found in output. Regex=[FAILED]128.16 sec
929/1473 Test #929: blocked_bad_ip-no-syscallbuf ...................................***Failed Error regular expression found in output. Regex=[FAILED]128.19 sec
...
Manual run of the first one (not sure if I am correctly running it):
/usr/lib64/rr/testsuite/obj# ./bin/vfork_done
EXIT-SUCCESS
/usr/lib64/rr/testsuite/obj# rr ./bin/vfork_done
rr: Saving execution to trace directory `/root/.local/share/rr/vfork_done-0'.
EXIT-SUCCESS
/usr/lib64/rr/testsuite/obj# /bin/bash "source_dir/src/test/basic_test.run" "vfork_done" "" "bin_dir" "120"
Test 'vfork_done' PASSED
/usr/lib64/rr/testsuite/obj# ctest -R vfork_done
Test project /usr/lib64/rr/testsuite/obj
Start 852: vfork_done
1/4 Test #852: vfork_done ....................... Passed 1.47 sec
Start 853: vfork_done-no-syscallbuf
2/4 Test #853: vfork_done-no-syscallbuf ......... Passed 1.47 sec
Start 1458: vfork_done_clone
3/4 Test #1458: vfork_done_clone ................. Passed 1.47 sec
Start 1459: vfork_done_clone-no-syscallbuf
4/4 Test #1459: vfork_done_clone-no-syscallbuf ... Passed 1.46 sec
100% tests passed, 0 tests failed out of 4
Total Test time (real) = 5.89 sec
So this fails only when run in full test suite?
So this fails only when run in full test suite?
Ok next test failed even on standalone run:
/usr/lib64/rr/testsuite/obj# ctest -R alternate_thread_diversion
Test project /usr/lib64/rr/testsuite/obj
Start 894: alternate_thread_diversion
1/2 Test #894: alternate_thread_diversion .................***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
Start 895: alternate_thread_diversion-no-syscallbuf
2/2 Test #895: alternate_thread_diversion-no-syscallbuf ...***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
0% tests passed, 2 tests failed out of 2
Total Test time (real) = 256.22 sec
The following tests FAILED:
894 - alternate_thread_diversion (Failed)
895 - alternate_thread_diversion-no-syscallbuf (Failed)
Errors while running CTest
Now is the time for... ctest -R alternate_thread_diversion -vv
I don't see the difference from ctest -R alternate_thread_diversion -VV
output, but here it is.
alternate_thread_diversion.log
The error show in the log is that it cannot find any debug symbols:
Reading symbols from /tmp/rr-test-alternate_thread_diversion-VoXtkWN6a/alternate_thread_diversion-VoXtkWN6a-0/mmap_hardlink_4_alternate_thread_diversion-VoXtkWN6a... Missing separate debuginfo for /tmp/rr-test-alternate_thread_diversion-VoXtkWN6a/alternate_thread_diversion-VoXtkWN6a-0/mmap_hardlink_4_alternate_thread_diversion-VoXtkWN6a. Try to install the hash file /usr/lib/debug/.build-id/21/2cb1258b216d5f5bc8b4bf6f64c2ffc6e8d371.debug (No debugging symbols found in /tmp/rr-test-alternate_thread_diversion-VoXtkWN6a/alternate_thread_diversion-VoXtkWN6a-0/mmap_hardlink_4_alternate_thread_diversion-VoXtkWN6a)
What does file say to this binary? Here's my output:
$ file bin/alternate_thread_diversion
bin/alternate_thread_diversion: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=4a08cbffd1e146bd76373cf27e869db3e615fd7e, for GNU/Linux 3.2.0, with debug_info, not stripped
$ gdb -q bin/alternate_thread_diversion
Reading symbols from bin/alternate_thread_diversion...
(gdb) quit
Thanks! After installing rr-testsuite-debuginfo
alternate_thread_diversion
tests passed:
/usr/lib64/rr/testsuite/obj# ctest -R alternate_thread_diversion -VV | tee log6
UpdateCTestConfiguration from :/usr/lib64/rr/testsuite/obj/DartConfiguration.tcl
UpdateCTestConfiguration from :/usr/lib64/rr/testsuite/obj/DartConfiguration.tcl
Test project /usr/lib64/rr/testsuite/obj
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 894
Start 894: alternate_thread_diversion
894: Test command: /bin/bash "source_dir/src/test/alternate_thread_diversion.run" "alternate_thread_diversion" "" "bin_dir" "120"
894: Working Directory: /usr/lib64/rr/testsuite/obj
894: Test timeout computed to be: 10000000
894: Test 'alternate_thread_diversion' PASSED
1/2 Test #894: alternate_thread_diversion ................. Passed 0.52 sec
test 895
Start 895: alternate_thread_diversion-no-syscallbuf
895: Test command: /bin/bash "source_dir/src/test/alternate_thread_diversion.run" "alternate_thread_diversion" "-n" "bin_dir" "120"
895: Working Directory: /usr/lib64/rr/testsuite/obj
895: Test timeout computed to be: 10000000
895: Test 'alternate_thread_diversion' PASSED
2/2 Test #895: alternate_thread_diversion-no-syscallbuf ... Passed 0.50 sec
The following tests passed:
alternate_thread_diversion
alternate_thread_diversion-no-syscallbuf
100% tests passed, 0 tests failed out of 2
Total Test time (real) = 1.04 sec
After debuginfo install:
/usr/lib64/rr/testsuite/obj# file bin/alternate_thread_diversion
bin/alternate_thread_diversion: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=212cb1258b216d5f5bc8b4bf6f64c2ffc6e8d371, for GNU/Linux 3.2.0, stripped
/usr/lib64/rr/testsuite/obj# gdb -q bin/alternate_thread_diversion
Reading symbols from bin/alternate_thread_diversion...
Reading symbols from /usr/lib/debug/usr/lib64/rr/testsuite/obj/bin/alternate_thread_diversion.debug...
(gdb) quit
I will re-run all tests again today.
Does having debug symbols important for these tests?
I was running tests after build in virtual machine but because some tests fail I split testsuite into separate package (like in OpenSUSE) but I didn't realize installing debuginfo is also important to run them.
The debuginfo for the tests that are built in this package are needed to run these tests... You normally would not "install" the test programs so you won't have their debuginfo either.
Thanks for clarification! I will not separate debuginfo from testsuite then.
But (with debuginfo installed) there are still 16 test failures:
1242/1473 Test #1242: watchpoint_at_sched ............................................***Failed Error regular expression found in output. Regex=[FAILED]128.09 sec
1243/1473 Test #1243: watchpoint_at_sched-no-syscallbuf ..............................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
1244/1473 Test #1244: watchpoint_before_signal .......................................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
1245/1473 Test #1245: watchpoint_before_signal-no-syscallbuf .........................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
1370/1473 Test #1370: replay_serve_files .............................................***Failed Error regular expression found in output. Regex=[FAILED]130.11 sec
1371/1473 Test #1371: replay_serve_files-no-syscallbuf ...............................***Failed Error regular expression found in output. Regex=[FAILED]130.13 sec
1392/1473 Test #1392: reverse_step_signal ............................................***Failed Error regular expression found in output. Regex=[FAILED]128.12 sec
1393/1473 Test #1393: reverse_step_signal-no-syscallbuf ..............................***Failed Error regular expression found in output. Regex=[FAILED]128.12 sec
1396/1473 Test #1396: reverse_watchpoint .............................................***Failed Error regular expression found in output. Regex=[FAILED]128.11 sec
1397/1473 Test #1397: reverse_watchpoint-no-syscallbuf ...............................***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
1442/1473 Test #1442: tick0 ..........................................................***Failed Error regular expression found in output. Regex=[FAILED] 0.26 sec
...
Full log LastTest.7.log
Manually poking the first one:
/usr/lib64/rr/testsuite/obj# ./bin/watchpoint_at_sched
EXIT-SUCCESS
/usr/lib64/rr/testsuite/obj# rr ./bin/watchpoint_at_sched
rr: Saving execution to trace directory `/root/.local/share/rr/watchpoint_at_sched-0'.
EXIT-SUCCESS
/usr/lib64/rr/testsuite/obj# ctest -R watchpoint_at_sched
Test project /usr/lib64/rr/testsuite/obj
Start 1242: watchpoint_at_sched
1/2 Test #1242: watchpoint_at_sched .................***Failed Error regular expression found in output. Regex=[FAILED]128.09 sec
Start 1243: watchpoint_at_sched-no-syscallbuf
2/2 Test #1243: watchpoint_at_sched-no-syscallbuf ...***Failed Error regular expression found in output. Regex=[FAILED]128.09 sec
0% tests passed, 2 tests failed out of 2
I also noticed that hardlink_mmapped_files
(which is Passed) and replay_serve_files
(which is Failed) have error cp: cannot stat '/usr/lib64/rr/testsuite/obj/lib/rr/libtest_lib.so': No such file or directory
while that library is installed into /usr/lib64/rr/libtest_lib.so
. Maybe this is not important since first test is Passed.
ctest --rerun-failed -vv
Will rerun only the failed tests and give all details for those.
/usr/lib64/rr/testsuite/obj# ctest --rerun-failed -vv |& tee log8
Test project /usr/lib64/rr/testsuite/obj
Start 1242: watchpoint_at_sched
1/2 Test #1242: watchpoint_at_sched .................***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
Start 1243: watchpoint_at_sched-no-syscallbuf
2/2 Test #1243: watchpoint_at_sched-no-syscallbuf ...***Failed Error regular expression found in output. Regex=[FAILED]128.10 sec
0% tests passed, 2 tests failed out of 2
Total Test time (real) = 256.24 sec
The following tests FAILED:
1242 - watchpoint_at_sched (Failed)
1243 - watchpoint_at_sched-no-syscallbuf (Failed)
Errors while running CTest
Output from these tests are in: /usr/lib64/rr/testsuite/obj/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
LastTest.log: LastTest.8.log
ps. If you need other tests I should rerun full tests again. But doesn't previous logs include that? Also, are you sure you need -vv
and not -VV
? ...Will rerun with both.
Oops - https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests so yes, I've meant -VV
.
Maybe it is time to close this issue with a note that this was mainly a problem of stripping debug info from test modules and create one issue for each "group" of failing tests that don't have an issue yet (using search function first), each with uname -a && cat/proc/cpuinfo
along with verbose logs for that test.
OK lets close this one. ~BTW, I don't see a difference between normal ctest
and ctest -VV
in LastTest.log
. They seems only differ on the stdout that I didn't attach because it's same or less than LastTest.log
.~
~ps. Oh yes formatting is different so I cannot really grep on -VV
log easily.~ Never mind this please.
JFYI In 5.8.0 in the same conditions these tests failed:
98% tests passed, 24 tests failed out of 1515
Total Test time (real) = 3939.12 sec
The following tests FAILED:
88 - clone_vfork_pidfd (Failed)
89 - clone_vfork_pidfd-no-syscallbuf (Failed)
768 - sioc (Failed)
769 - sioc-no-syscallbuf (Failed)
878 - vfork_done (Failed)
879 - vfork_done-no-syscallbuf (Failed)
1196 - step_into_lib (Failed)
1197 - step_into_lib-no-syscallbuf (Failed)
1274 - vsyscall_reverse_next (Failed)
1275 - vsyscall_reverse_next-no-syscallbuf (Failed)
1410 - replay_serve_files (Failed)
1411 - replay_serve_files-no-syscallbuf (Failed)
1432 - reverse_step_signal (Failed)
1433 - reverse_step_signal-no-syscallbuf (Failed)
1436 - reverse_watchpoint (Failed)
1437 - reverse_watchpoint-no-syscallbuf (Failed)
1470 - step_signal (Failed)
1471 - step_signal-no-syscallbuf (Failed)
1482 - tick0 (Failed)
1483 - tick0-no-syscallbuf (Failed)
1484 - tick0_less (Failed)
1485 - tick0_less-no-syscallbuf (Failed)
1512 - watchpoint_unaligned2 (Failed)
1513 - watchpoint_unaligned2-no-syscallbuf (Failed)
Errors while running CTest
On bare metal
Intel(R) Core(TM) i9-10900 CPU @ 2.80GHz
Linux 6.4.4 there are 249 test failures like this (first 10):And 1168 tests are passed. Tests are run from testsuite after install. Full log LastTest.log (15M).