ivmai / bdwgc

The Boehm-Demers-Weiser conservative C/C++ Garbage Collector (bdwgc, also known as bdw-gc, boehm-gc, libgc)
https://www.hboehm.info/gc/

gctest fail on ppc64le with SOFT_VDB #376

Closed ivmai closed 8 months ago

ivmai commented 3 years ago

Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/540924812
Source: master (commit 2e7c81e)
Host: Ubuntu/ppc64le
Compiler: gcc
Config: configure default
Occurrence: < 1/60th

gctest output:

Switched to incremental mode
Reading dirty bits from /proc
Lost a node at level 4 - collector is broken
Test failed
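Since the failure shows up in fewer than 1 in 60 runs, reproducing it locally generally means running the test in a loop until it fails. A minimal sketch of such a loop in C follows; the function name `run_until_failure` and the command string passed to it are hypothetical and not part of bdwgc:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run `cmd` through the shell up to `max_runs` times.
 * Returns the 1-based index of the first run that did not exit with
 * status 0 (including runs killed by a signal), or 0 if every run passed.
 * Hypothetical helper for illustration only.
 */
int run_until_failure(const char *cmd, int max_runs)
{
    assert(cmd != NULL);
    for (int i = 1; i <= max_runs; i++) {
        int status = system(cmd);
        if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return i; /* first failing run */
    }
    return 0; /* no failure observed */
}
```

For example, `run_until_failure("./gctest", 1000)` would stop at the first failing run, assuming `gctest` has been built in the current directory.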

ivmai commented 3 years ago

Probably caused by SOFT_VDB
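For context, the "Reading dirty bits from /proc" line refers to the Linux soft-dirty mechanism: each page has a 64-bit entry in /proc/self/pagemap whose bit 55 is set once the page has been written to, and writing "4" to /proc/self/clear_refs resets those bits. The sketch below only illustrates that kernel interface; it is not the collector's code, and all function names in it are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag (Linux kernel docs). */
#define SOFT_DIRTY_BIT 55

/* Return 1 if the given 64-bit pagemap entry has the soft-dirty bit set. */
int entry_is_soft_dirty(uint64_t entry)
{
    return (int)((entry >> SOFT_DIRTY_BIT) & 1u);
}

/* Byte offset inside /proc/self/pagemap of the entry describing `addr`. */
uint64_t pagemap_offset(uintptr_t addr, uint64_t page_size)
{
    assert(page_size != 0);
    return (uint64_t)(addr / page_size) * sizeof(uint64_t);
}

/* Return 1 if the page holding `p` is soft-dirty, 0 if clean, -1 on error. */
int page_is_soft_dirty(const void *p)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    int result = -1;
    int fd;

    if (page_size <= 0)
        return -1;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    off_t off = (off_t)pagemap_offset((uintptr_t)p, (uint64_t)page_size);
    if (lseek(fd, off, SEEK_SET) == off
        && read(fd, &entry, sizeof entry) == (ssize_t)sizeof entry)
        result = entry_is_soft_dirty(entry);
    close(fd);
    return result;
}

/* Clear the soft-dirty bits of the whole process; 0 on success, -1 on error. */
int clear_soft_dirty_bits(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    int ok;

    if (fd < 0)
        return -1;
    ok = write(fd, "4", 1) == 1;
    close(fd);
    return ok ? 0 : -1;
}
```

An incremental collector that relies on these bits has to treat every page reported dirty as possibly holding new pointers. If a write is missed for any reason, a reachable object can be left unmarked and reclaimed, which would match the symptoms reported in this issue.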

ivmai commented 3 years ago

Hello @sharkcz, if you run into this issue one day, please let me know. I will think about how to localize it.

ivmai commented 2 years ago

Latest build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/570835934
Source: master (107cfe0)

ivmai commented 2 years ago

Latest builds:

Source: master (faa3baf)

ivmai commented 2 years ago

Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/576835649
Source: release-8_2 (4919305)

ivmai commented 2 years ago

Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/586400606
Source: master (93381779c)
Host: Linux/ppc64le
Compiler: clang-12
Config: CFLAGS_EXTRA="-fsanitize=memory,undefined -fno-omit-frame-pointer" CONF_OPTIONS="--disable-shared"

Seems to be same issue. Output:

./gctest
Switched to incremental mode
Reading dirty bits from /proc
List reversal produced incorrect list - collector is broken
Test failed
jiegec commented 2 years ago

I got this error when building boehm-gc in nix on ppc64le:

# TOTAL: 17
# PASS:  16
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: gctest
============

Switched to incremental mode
Reading dirty bits from /proc
FAIL gctest (exit status: 139)

============================================================================
Testsuite summary for gc 8.2.2
============================================================================
# TOTAL: 17
# PASS:  16
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See ./test-suite.log
Please report to https://github.com/ivmai/bdwgc/issues
============================================================================
make[3]: *** [Makefile:2048: test-suite.log] Error 1
make[3]: Leaving directory '/build/gc-8.2.2'
make[2]: *** [Makefile:2156: check-TESTS] Error 2
jiegec commented 2 years ago

Backtrace on v8.2.2:

Lost a node at level 1 - collector is broken
Test failed

Thread 19 "gctest" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffef2cf170 (LWP 2171270)]
0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cda18) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
86      ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0  0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cda18) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2  0x00007ffff7c54850 in __GI_abort () at abort.c:79
#3  0x0000000100007b84 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1050
#4  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#5  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#6  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#7  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#8  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#9  0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#10 0x0000000100007a04 in chktree (t=<optimized out>, n=<optimized out>) at tests/test.c:1056
#11 0x0000000100007f54 in tree_test () at tests/test.c:1171
#12 tree_test () at tests/test.c:1148
#13 0x0000000100008cdc in run_one_test () at tests/test.c:1626
#14 0x0000000100009148 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2344
#15 0x00007ffff7f1dd6c in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#16 0x00007ffff7f0bbf0 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2173
#17 0x00007ffff7f0bc74 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:2183
#18 0x00007ffff7e78838 in start_thread (arg=0x7fffef2cf170) at pthread_create.c:477
#19 0x00007ffff7d7b884 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
jiegec commented 2 years ago

Strangely, if I enable ASan or valgrind, the error goes away:

Completed 6 tests
Allocated 11602730 collectable objects
Allocated 1224 uncollectable objects
Allocated 8220420 atomic objects
Reallocated 36 objects
Garbage collection after fork is tested too
Finalized 13223/13223 objects - finalization is probably OK
Total number of bytes allocated is 660239708
Total memory use by allocated blocks is 3870720 bytes
Final heap size is 10747904 bytes
Obtained 45285376 bytes from OS (of which 23134208 bytes unmapped)
Final number of reachable objects is 3976
Completed 444 collections in 25451 ms (using 16 marker threads)
Collector appears to work
jiegec commented 2 years ago

I finally captured an ASan error:

==2726945==Running thread 2726833 was not suspended. False leaks are possible.
==2726945==Running thread 2726834 was not suspended. False leaks are possible.
==2726945==Running thread 2726835 was not suspended. False leaks are possible.
==2726945==Running thread 2726836 was not suspended. False leaks are possible.
tests/test.c:518:9: runtime error: member access within misaligned address 0x000000000001 for type 'struct SEXPR', which requires 8 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
AddressSanitizer:DEADLYSIGNAL
=================================================================
==2726803==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x00010001339c bp 0x7dffe60b9d40 sp 0x7dffeb40cf80 T16)
==2726803==The signal is caused by a UNKNOWN memory access.
==2726803==Hint: address points to the zero page.
    #0 0x100013398 in check_ints tests/test.c:518
    #1 0x100015528 in reverse_test_inner tests/test.c:812
    #2 0x7ffff736c578 in GC_call_with_gc_active extra/../pthread_support.c:1788
    #3 0x1000159f0 in reverse_test_inner tests/test.c:727
    #4 0x7ffff73026e8 in GC_do_blocking_inner extra/../pthread_support.c:1597
    #5 0x7ffff7309af0 in GC_with_callee_saves_pushed extra/../mach_dep.c:421
    #6 0x7ffff733ede0 in GC_do_blocking extra/../misc.c:2305
    #7 0x10001aa70 in reverse_test tests/test.c:859
    #8 0x10001aa70 in run_one_test tests/test.c:1642
    #9 0x10001b124 in thr_run_one_test tests/test.c:2344
    #10 0x7ffff736f528 in GC_inner_start_routine /home/jiegec/bdwgc/pthread_start.c:57
    #11 0x7ffff733eb80 in GC_call_with_stack_base extra/../misc.c:2173
    #12 0x7ffff733eca0 in GC_start_routine extra/../pthread_support.c:2183
    #13 0x7ffff75ed2c4 in __asan::AsanThread::ThreadStart(unsigned long long, __sanitizer::atomic_uintptr_t*) ../../../../src/libsanitizer/asan/asan_thread.cc:260
    #14 0x7ffff74e7468 in asan_thread_start ../../../../src/libsanitizer/asan/asan_interceptors.cc:199
    #15 0x7ffff7218834 in start_thread /build/glibc-p3rpmK/glibc-2.31/nptl/pthread_create.c:477
    #16 0x7ffff674b880 in clone (/lib/powerpc64le-linux-gnu/libc.so.6+0x14b880)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV tests/test.c:518 in check_ints
Thread T16 created by T0 here:
    #0 0x7ffff74e752c in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cc:208
    #1 0x7ffff736f040 in GC_pthread_create extra/../pthread_support.c:2261
    #2 0x100010250 in main tests/test.c:2414
    #3 0x7ffff6624cc8 in generic_start_main ../csu/libc-start.c:308
    #4 0x7ffff6624ea0 in __libc_start_main ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:98

==2726803==ABORTING
jiegec commented 2 years ago

After some repetitions, it fails only in the following places (line numbers are off by one because I removed the fork tests to make it easier to reproduce):

1.

Lost a node at level 3 - collector is broken
Test failed
--Type <RET> for more, q to quit, c to continue without paging--

Thread 19 "gctest" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffef2cf170 (LWP 3618563)]
0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cd7f8) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
86      ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0  0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cd7f8) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2  0x00007ffff7c54850 in __GI_abort () at abort.c:79
#3  0x0000000100007198 in chktree (t=0x7fffed8b6780, n=3) at tests/test.c:1051
#4  0x000000010000731c in chktree (t=0x7fffed8ab0e0, n=4) at tests/test.c:1062
#5  0x000000010000731c in chktree (t=0x7fffed8a6060, n=5) at tests/test.c:1062
#6  0x000000010000725c in chktree (t=0x7fffed8a6080, n=6) at tests/test.c:1057
#7  0x000000010000731c in chktree (t=0x7fffed8110e0, n=7) at tests/test.c:1062
#8  0x000000010000725c in chktree (t=0x7fffed811100, n=8) at tests/test.c:1057
#9  0x000000010000731c in chktree (t=0x7fffed20b120, n=9) at tests/test.c:1062
#10 0x000000010000731c in chktree (t=0x7fffeda172c0, n=10) at tests/test.c:1062
#11 0x000000010000731c in chktree (t=0x7fffe7c9f4e0, n=11) at tests/test.c:1062
#12 0x000000010000731c in chktree (t=0x7fffed4feb20, n=12) at tests/test.c:1062
#13 0x000000010000725c in chktree (t=0x7fffed4feb40, n=13) at tests/test.c:1057
#14 0x000000010000725c in chktree (t=0x7fffed4feb60, n=14) at tests/test.c:1057
#15 0x000000010000731c in chktree (t=0x7fffed662a80, n=15) at tests/test.c:1062
#16 0x000000010000731c in chktree (t=0x7fffecf43360, n=16) at tests/test.c:1062
#17 0x0000000100007c5c in tree_test () at tests/test.c:1169
#18 0x00000001000095f8 in run_one_test () at tests/test.c:1627
#19 0x000000010000a308 in thr_run_one_test (arg=0x0) at tests/test.c:2345
#20 0x00007ffff7f25384 in GC_inner_start_routine (sb=0x7fffef2ce728, arg=0x7fffffffed10) at pthread_start.c:57
#21 0x00007ffff7f182b8 in GC_call_with_stack_base (fn=0x7ffff7f2529c <GC_inner_start_routine>, arg=0x7fffffffed10)
    at extra/../misc.c:2173
#22 0x00007ffff7f249d8 in GC_start_routine (arg=0x7fffffffed10) at extra/../pthread_support.c:2183
#23 0x00007ffff7e78838 in start_thread (arg=0x7fffef2cf170) at pthread_create.c:477
#24 0x00007ffff7d7b884 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82

2.

[Thread 0x7fffebf7f170 (LWP 3618757) exited]
List reversal produced incorrect list - collector is broken
Test failed
--Type <RET> for more, q to quit, c to continue without paging--

Thread 19 "gctest" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffef2cf170 (LWP 3618684)]
0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cd428) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
86      ../sysdeps/unix/sysv/linux/internal-signals.h: No such file or directory.
(gdb) bt
#0  0x00007ffff7c7d168 in __libc_signal_restore_set (set=0x7fffef2cd428) at ../sysdeps/unix/sysv/linux/internal-signals.h:86
#1  __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2  0x00007ffff7c54850 in __GI_abort () at abort.c:79
#3  0x000000010000576c in check_ints (list=0x7fffed8d0180, low=1, up=17) at tests/test.c:522
#4  0x00000001000065d0 in reverse_test_inner (data=0x1) at tests/test.c:843
#5  0x00007ffff7f23c14 in GC_call_with_gc_active (fn=0x10000604c <reverse_test_inner>, client_data=0x1)
    at extra/../pthread_support.c:1788
#6  0x0000000100006098 in reverse_test_inner (data=0x0) at tests/test.c:728
#7  0x00007ffff7f235d0 in GC_do_blocking_inner (data=0x7fffef2ce198 "L`", context=0x7fffef2cd9d8)
    at extra/../pthread_support.c:1597
#8  0x00007ffff7f1f0a0 in GC_with_callee_saves_pushed (fn=0x7ffff7f234fc <GC_do_blocking_inner>, arg=0x7fffef2ce198 "L`")
    at extra/../mach_dep.c:421
#9  0x00007ffff7f1836c in GC_do_blocking (fn=0x10000604c <reverse_test_inner>, client_data=0x0) at extra/../misc.c:2305
#10 0x00000001000066c4 in reverse_test () at tests/test.c:860
#11 0x00000001000096d0 in run_one_test () at tests/test.c:1643
#12 0x000000010000a308 in thr_run_one_test (arg=0x0) at tests/test.c:2345
#13 0x00007ffff7f25384 in GC_inner_start_routine (sb=0x7fffef2ce728, arg=0x7fffffffed10) at pthread_start.c:57
#14 0x00007ffff7f182b8 in GC_call_with_stack_base (fn=0x7ffff7f2529c <GC_inner_start_routine>, arg=0x7fffffffed10)
    at extra/../misc.c:2173
#15 0x00007ffff7f249d8 in GC_start_routine (arg=0x7fffffffed10) at extra/../pthread_support.c:2183
#16 0x00007ffff7e78838 in start_thread (arg=0x7fffef2cf170) at pthread_create.c:477
#17 0x00007ffff7d7b884 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82

3.

[Thread 0x7fffecbcf170 (LWP 2452) exited]
[Thread 0x7fffe737f170 (LWP 2453) exited]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 20 "gctest" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffeeabf170 (LWP 2411)]
0x00007ffff7f0c968 in GC_is_marked (p=0xffffffffffffffff) at extra/../mark.c:209
209         return (int)mark_bit_from_hdr(hhdr, bit_no); /* 0 or 1 */
(gdb) bt
#0  0x00007ffff7f0c968 in GC_is_marked (p=0xffffffffffffffff) at extra/../mark.c:209
#1  0x00007ffff7f07d8c in GC_make_disappearing_links_disappear (dl_hashtbl=0x7ffff7f524e8 <GC_arrays+280>, is_remove_dangling=0)
    at extra/../finalize.c:938
#2  0x00007ffff7f086c8 in GC_finalize () at extra/../finalize.c:1118
#3  0x00007ffff7f00b70 in GC_finish_collection () at extra/../alloc.c:1178
#4  0x00007ffff7eff2e0 in GC_maybe_gc () at extra/../alloc.c:534
#5  0x00007ffff7effd38 in GC_collect_a_little_inner (n=1) at extra/../alloc.c:769
#6  0x00007ffff7f0b7b4 in GC_generic_malloc_many (lb=16, k=1, result=0x7fffeeabe030) at extra/../mallocx.c:343
#7  0x00007ffff7f0bec8 in GC_malloc_many (lb=16) at extra/../mallocx.c:495
#8  0x0000000100007464 in alloc8bytes () at tests/test.c:1091
#9  0x0000000100007ac0 in alloc_small (n=5000000) at tests/test.c:1128
#10 0x0000000100007ba0 in tree_test () at tests/test.c:1156
#11 0x00000001000095f8 in run_one_test () at tests/test.c:1627
#12 0x000000010000a308 in thr_run_one_test (arg=0x0) at tests/test.c:2345
#13 0x00007ffff7f25384 in GC_inner_start_routine (sb=0x7fffeeabe728, arg=0x7fffffffed10) at pthread_start.c:57
#14 0x00007ffff7f182b8 in GC_call_with_stack_base (fn=0x7ffff7f2529c <GC_inner_start_routine>, arg=0x7fffffffed10)
    at extra/../misc.c:2173
#15 0x00007ffff7f249d8 in GC_start_routine (arg=0x7fffffffed10) at extra/../pthread_support.c:2183
#16 0x00007ffff7e78838 in start_thread (arg=0x7fffeeabf170) at pthread_create.c:477
#17 0x00007ffff7d7b884 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:82
ivmai commented 2 years ago

This means that the collector collected some live object.
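To illustrate why a "Lost a node at level N" message points to a prematurely reclaimed object: the check walks a tree whose nodes each record their own level, so a node whose memory was freed and reused no longer holds the expected value. The following is a simplified stand-in using plain `malloc`, not the actual tests/test.c code; the names `tree_node`, `make_tree`, `check_tree` and `free_tree` are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tree_node {
    int level;                      /* depth of the subtree rooted here */
    struct tree_node *left, *right;
} tree_node;

/* Build a complete binary tree of depth n; every node stores its level. */
tree_node *make_tree(int n)
{
    assert(n >= 0);
    if (n == 0)
        return NULL;
    tree_node *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    t->level = n;
    t->left = make_tree(n - 1);
    t->right = make_tree(n - 1);
    return t;
}

/*
 * Return 0 if the tree is intact, the level at which a node is missing or
 * holds the wrong level, or -1 if a leaf position is unexpectedly non-NULL.
 */
int check_tree(const tree_node *t, int n)
{
    if (n == 0)
        return t == NULL ? 0 : -1;
    if (t == NULL || t->level != n)
        return n; /* corresponds to "lost a node at level n" */
    int bad = check_tree(t->left, n - 1);
    if (bad != 0)
        return bad;
    return check_tree(t->right, n - 1);
}

/* Release every node of the tree. */
void free_tree(tree_node *t)
{
    if (t == NULL)
        return;
    free_tree(t->left);
    free_tree(t->right);
    free(t);
}
```

In the real test the nodes come from the collector rather than `malloc`, so the only way for a reachable node to end up with the wrong contents is for its memory to have been reclaimed and handed out again.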

ivmai commented 2 years ago

The ASan error is just a consequence (of reusing a live object). Sorry, I don't have time to investigate the issue now. But if you figure out the root cause, I don't think it would be difficult to prepare a patch.

jiegec commented 2 years ago

Adding CFLAGS_EXTRA="-DNO_SOFT_VDB" does work for me on Power8, as mentioned in #479.

ivmai commented 2 years ago

Note to self: in https://app.travis-ci.com/github/ivmai/bdwgc/builds/257276745 (and later builds), all ppc64le builds failed (gctest failure).

ivmai commented 2 years ago

Note to self: recent failed build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/588637489
Source: release-8_2 (2b342c4)

ivmai commented 1 year ago

Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/605030680
Source: release-8_2 (8f6d39d)
Host: Ubuntu 16.04.7 LTS / ppc64le
Compiler: gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
How to build: ./configure && make -j check

ivmai commented 11 months ago

Probably a related gctest failure.
Source: master (f369491a)
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/614783228
Compiler: clang
CMake options: -DCMAKE_BUILD_TYPE=Release -Dbuild_tests=ON -Denable_cplusplus=ON -Denable_gc_assertions=ON
Output: gctest ...........................Subprocess aborted***Exception: 7.84 sec

ivmai commented 9 months ago

Source: master (d934e7d5)
Build: https://app.travis-ci.com/github/ivmai/bdwgc/jobs/617749260
Config: CFLAGS_EXTRA="-fsanitize=memory,undefined -fno-omit-frame-pointer" CONF_OPTIONS="--disable-shared"
Output (gctest.log):

Supported VDBs: manual soft mprotect
Switched to incremental mode
Reading dirty bits from /proc
Lost a node at level 1 - collector is broken
ivmai commented 8 months ago

Should be fixed by https://github.com/ivmai/bdwgc/commit/6601eecd465c1d8a269737c63d7a0a69a0a48b16. I'll backport it to the release-8_2 branch later.