ivmai / bdwgc

The Boehm-Demers-Weiser conservative C/C++ Garbage Collector (bdwgc, also known as bdw-gc, boehm-gc, libgc)
https://www.hboehm.info/gc/

build broken on Power9 system since enabling SOFT_VDB #479

Closed sharkcz closed 1 year ago

sharkcz commented 1 year ago

After moving our multiarch CI from a Power8 system to Power9, we found that the test suite fails on the ppc64le architecture. Bisecting showed that the failures started with commit d76376060e029b80e1acb5e40c00283da6f8b788:

...
make  check-TESTS
make[2]: Entering directory '/home/jenkins/workspace/gc/label/ppc64le'
make[3]: Entering directory '/home/jenkins/workspace/gc/label/ppc64le'
PASS: cordtest
./test-driver: line 112: 1052731 Aborted                 (core dumped) "$@" >> "$log_file" 2>&1
FAIL: gctest
PASS: hugetest
PASS: leaktest
PASS: middletest
PASS: realloctest
PASS: smashtest
PASS: staticrootstest
PASS: atomicopstest
PASS: initfromthreadtest
PASS: subthreadcreatetest
PASS: threadkeytest
PASS: threadleaktest
./test-driver: line 112: 1054374 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: cpptest
FAIL: disclaimtest
PASS: disclaim_bench
./test-driver: line 112: 1054459 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: weakmaptest
============================================================================
Testsuite summary for gc 8.3.0
============================================================================
# TOTAL: 17
# PASS:  13
# SKIP:  0
# XFAIL: 0
# FAIL:  4
# XPASS: 0
# ERROR: 0
...

The operating systems are Fedora 36 and Fedora Rawhide. Builds on ppc64 using the (outdated) Fedora 28 are still OK. I can provide more details if needed, or arrange access to a ppc64le system.

ivmai commented 1 year ago

Seems to duplicate #376. Please provide details about the gctest crash.

ivmai commented 1 year ago

This is not reproducible if -DNO_SOFT_VDB is passed in CFLAGS, right?

sharkcz commented 1 year ago

> This is not reproducible if passed -DNO_SOFT_VDB to CFLAGS, right?

Unfortunately, it is still reproducible. The command line I am using is CPPFLAGS=-DNO_SOFT_VDB ./configure --disable-static --enable-cplusplus --enable-large-config --enable-threads=posix --with-libatomic-ops=none && make && make check, and I have confirmed that the -DNO_SOFT_VDB parameter is passed to gcc.

sharkcz commented 1 year ago

> Seems to duplicate #376 . Please provide details about gctest crash

The output of thread apply all bt on the gctest coredump is below (this is without -DNO_SOFT_VDB, built from commit d76376060e029b80e1acb5e40c00283da6f8b788):

Thread 7 (Thread 0x7fff9edaf120 (LWP 12956)):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa1a60f80 <mark_mutex>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=0x7fffa1a60f80 <mark_mutex>, private=<optimized out>) at lowlevellock.c:49
#2  0x00007fffa16be734 in lll_mutex_lock_optimized (mutex=0x7fffa1a60f80 <mark_mutex>) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa1a60f80 <mark_mutex>) at pthread_mutex_lock.c:93
#4  0x00007fffa1a13c38 in GC_generic_lock (lock=lock@entry=0x7fffa1a60f80 <mark_mutex>) at extra/../pthread_support.c:2138
#5  0x00007fffa1a31a60 in GC_generic_lock (lock=0x7fffa1a60f80 <mark_mutex>) at extra/../pthread_support.c:2112
#6  GC_acquire_mark_lock () at extra/../pthread_support.c:2337
#7  GC_generic_malloc_many (lb=16, k=<optimized out>, result=0x7fffa19e3c70) at extra/../mallocx.c:365
#8  0x00007fffa1a32278 in GC_malloc_kind (bytes=0, kind=<optimized out>) at extra/../thread_local_alloc.c:184
#9  0x00007fffa1a323ec in GC_malloc_atomic (lb=<optimized out>) at extra/../malloc.c:343
#10 0x00000000100153f4 in run_one_test () at tests/test.c:1498
#11 0x0000000010015cc8 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2300
#12 0x00007fffa1a37060 in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#13 0x00007fffa1a261f4 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2160
#14 0x00007fffa1a26254 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:1947
#15 0x00007fffa16b9b38 in start_thread (arg=0x7fff9edaf120) at pthread_create.c:442
#16 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

Thread 6 (Thread 0x7fffa15ff120 (LWP 12951)):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=<optimized out>, expected=<optimized out>, futex_word=0x7fffa1aa1c50 <mark_cv+40>) at futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=<optimized out>, expected=<optimized out>, futex_word=0x7fffa1aa1c50 <mark_cv+40>) at futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=0x7fffa1aa1c50 <mark_cv+40>, expected=<optimized out>, clockid=<optimized out>, abstime=0x0, private=<optimized out>) at futex-internal.c:139
#3  0x00007fffa16b8ac4 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7fffa1a60f80 <mark_mutex>, cond=0x7fffa1aa1c28 <mark_cv>) at pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x7fffa1aa1c28 <mark_cv>, mutex=0x7fffa1a60f80 <mark_mutex>) at pthread_cond_wait.c:618
#5  0x00007fffa1a1d680 in GC_wait_marker () at extra/../pthread_support.c:2387
#6  GC_help_marker (my_mark_no=my_mark_no@entry=34) at extra/../mark.c:1221
#7  0x00007fffa1a1d894 in GC_mark_thread (id=<optimized out>) at extra/../pthread_support.c:414
#8  GC_mark_thread (id=<optimized out>) at extra/../pthread_support.c:375
#9  0x00007fffa16b9b38 in start_thread (arg=0x7fffa15ff120) at pthread_create.c:442
#10 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

Thread 5 (Thread 0x7fffa1e33660 (LWP 12937)):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa1a60fe8 <GC_allocate_ml>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=0x7fffa1a60fe8 <GC_allocate_ml>, private=<optimized out>) at lowlevellock.c:49
#2  0x00007fffa16be734 in lll_mutex_lock_optimized (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:93
#4  0x00007fffa1a147b4 in GC_lock () at extra/../pthread_support.c:2236
#5  0x00007fffa1a319f0 in GC_generic_malloc_many (lb=32, k=<optimized out>, result=0x7fffa1aa1ce8 <first_thread+120>) at extra/../mallocx.c:332
#6  0x00007fffa1a32278 in GC_malloc_kind (bytes=16, kind=<optimized out>) at extra/../thread_local_alloc.c:184
#7  0x00007fffa1a323ec in GC_malloc_atomic (lb=<optimized out>) at extra/../malloc.c:343
#8  0x00000000100125a0 in small_cons_leaf (x=4346) at tests/test.c:404
#9  ints (low=low@entry=4346, up=<optimized out>, up@entry=4500) at tests/test.c:467
#10 0x00000000100125ec in ints (up=4500, low=4346) at tests/test.c:464
#11 ints (low=low@entry=4345, up=up@entry=4500) at tests/test.c:467
#12 0x00000000100125ec in ints (up=4500, low=4345) at tests/test.c:464
#13 ints (low=low@entry=4344, up=up@entry=4500) at tests/test.c:467
#14 0x00000000100125ec in ints (up=4500, low=4344) at tests/test.c:464
#15 ints (low=low@entry=4343, up=up@entry=4500) at tests/test.c:467
#16 0x00000000100125ec in ints (up=4500, low=4343) at tests/test.c:464
#17 ints (low=low@entry=4342, up=up@entry=4500) at tests/test.c:467
#18 0x00000000100125ec in ints (up=4500, low=4342) at tests/test.c:464
#19 ints (low=low@entry=4341, up=up@entry=4500) at tests/test.c:467
#20 0x00000000100125ec in ints (up=4500, low=4341) at tests/test.c:464
...
#8692 0x00000000100125ec in ints (up=4500, low=5) at tests/test.c:464
#8693 ints (low=low@entry=4, up=up@entry=4500) at tests/test.c:467
#8694 0x00000000100125ec in ints (up=4500, low=4) at tests/test.c:464
#8695 ints (low=low@entry=3, up=up@entry=4500) at tests/test.c:467
#8696 0x00000000100125ec in ints (up=4500, low=3) at tests/test.c:464
#8697 ints (low=low@entry=2, up=up@entry=4500) at tests/test.c:467
#8698 0x00000000100125ec in ints (up=4500, low=2) at tests/test.c:464
#8699 ints (low=low@entry=1, up=up@entry=4500) at tests/test.c:467
#8700 0x0000000010013a9c in ints (up=4500, low=1) at tests/test.c:464
#8701 reverse_test_inner (data=<optimized out>) at tests/test.c:784
#8702 0x00007fffa1a35d40 in GC_call_with_gc_active (fn=0x10013810 <reverse_test_inner>, client_data=0x1) at extra/../pthread_support.c:1574
#8703 0x0000000010013e6c in reverse_test_inner (data=<optimized out>) at tests/test.c:724
#8704 0x00007fffa1a1491c in GC_do_blocking_inner (data=0x7fffff891930 "\020\070\001\020", context=<optimized out>) at extra/../pthread_support.c:1455
#8705 0x00007fffa1a17a48 in GC_with_callee_saves_pushed (fn=fn@entry=0x7fffa1a14840 <GC_do_blocking_inner>, arg=arg@entry=0x7fffff891930 "\020\070\001\020") at extra/../mach_dep.c:335
#8706 0x00007fffa1a262b4 in GC_do_blocking (fn=<optimized out>, client_data=<optimized out>) at extra/../misc.c:2288
#8707 0x000000001001575c in reverse_test () at tests/test.c:851
#8708 run_one_test () at tests/test.c:1581
#8709 0x00000000100117e8 in main () at tests/test.c:2397

Thread 4 (Thread 0x7fff9f5bf120 (LWP 12955)):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa1a60fe8 <GC_allocate_ml>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=0x7fffa1a60fe8 <GC_allocate_ml>, private=<optimized out>) at lowlevellock.c:49
#2  0x00007fffa16be734 in lll_mutex_lock_optimized (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:93
#4  0x00007fffa1a13c38 in GC_generic_lock (lock=0x7fffa1a60fe8 <GC_allocate_ml>) at extra/../pthread_support.c:2138
#5  0x00007fffa1a1ed94 in GC_free (p=0x7fffa1950cf0) at extra/../malloc.c:601
#6  GC_free (p=0x7fffa1950cf0) at extra/../malloc.c:558
#7  0x0000000010015664 in run_one_test () at tests/test.c:1524
#8  0x0000000010015cc8 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2300
#9  0x00007fffa1a37060 in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#10 0x00007fffa1a261f4 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2160
#11 0x00007fffa1a26254 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:1947
#12 0x00007fffa16b9b38 in start_thread (arg=0x7fff9f5bf120) at pthread_create.c:442
#13 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

Thread 3 (Thread 0x7fff9fdcf120 (LWP 12954)):
#0  0x00007fffa16b4abc in __GI___lll_lock_wake (futex=<optimized out>, private=<optimized out>) at lowlevellock.c:65
#1  0x00007fffa16c0de4 in lll_mutex_unlock_optimized (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_unlock.c:43
#2  __GI___pthread_mutex_unlock_usercnt (mutex=0x7fffa1a60fe8 <GC_allocate_ml>, decr=<optimized out>) at pthread_mutex_unlock.c:68
#3  0x00007fffa1a14748 in fork_parent_proc () at extra/../pthread_support.c:1153
#4  fork_parent_proc () at extra/../pthread_support.c:1145
#5  0x00007fffa1720af4 in __run_postfork_handlers (who=<optimized out>, do_locking=<optimized out>, lastrun=1) at register-atfork.c:187
#6  0x00007fffa171fbc0 in __libc_fork () at fork.c:127
#7  0x00000000100156b0 in run_one_test () at tests/test.c:1530
#8  0x0000000010015cc8 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2300
#9  0x00007fffa1a37060 in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#10 0x00007fffa1a261f4 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2160
#11 0x00007fffa1a26254 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:1947
#12 0x00007fffa16b9b38 in start_thread (arg=0x7fff9fdcf120) at pthread_create.c:442
#13 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

Thread 2 (Thread 0x7fffa0def120 (LWP 12952)):
#0  futex_wait (private=0, expected=2, futex_word=0x7fffa1a60fe8 <GC_allocate_ml>) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait (futex=0x7fffa1a60fe8 <GC_allocate_ml>, private=<optimized out>) at lowlevellock.c:49
#2  0x00007fffa16be734 in lll_mutex_lock_optimized (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:48
#3  ___pthread_mutex_lock (mutex=0x7fffa1a60fe8 <GC_allocate_ml>) at pthread_mutex_lock.c:93
#4  0x00007fffa1a13c38 in GC_generic_lock (lock=0x7fffa1a60fe8 <GC_allocate_ml>) at extra/../pthread_support.c:2138
#5  0x00007fffa1a319f0 in GC_generic_malloc_many (lb=32, k=<optimized out>, result=0x7fffa19c0c78) at extra/../mallocx.c:332
#6  0x00007fffa1a32278 in GC_malloc_kind (bytes=16, kind=<optimized out>) at extra/../thread_local_alloc.c:184
#7  0x00007fffa1a323ec in GC_malloc_atomic (lb=<optimized out>) at extra/../malloc.c:343
#8  0x00000000100125a0 in small_cons_leaf (x=3111) at tests/test.c:404
#9  ints (low=low@entry=3111, up=<optimized out>, up@entry=4500) at tests/test.c:467
#10 0x00000000100125ec in ints (up=4500, low=3111) at tests/test.c:464
#11 ints (low=low@entry=3110, up=up@entry=4500) at tests/test.c:467
#12 0x00000000100125ec in ints (up=4500, low=3110) at tests/test.c:464
#13 ints (low=low@entry=3109, up=up@entry=4500) at tests/test.c:467
#14 0x00000000100125ec in ints (up=4500, low=3109) at tests/test.c:464
#15 ints (low=low@entry=3108, up=up@entry=4500) at tests/test.c:467
#16 0x00000000100125ec in ints (up=4500, low=3108) at tests/test.c:464
#17 ints (low=low@entry=3107, up=up@entry=4500) at tests/test.c:467
#18 0x00000000100125ec in ints (up=4500, low=3107) at tests/test.c:464
#19 ints (low=low@entry=3106, up=up@entry=4500) at tests/test.c:467
#20 0x00000000100125ec in ints (up=4500, low=3106) at tests/test.c:464
...
#6226 0x00000000100125ec in ints (up=4500, low=3) at tests/test.c:464
#6227 ints (low=low@entry=2, up=up@entry=4500) at tests/test.c:467
#6228 0x00000000100125ec in ints (up=4500, low=2) at tests/test.c:464
#6229 ints (low=low@entry=1, up=up@entry=4500) at tests/test.c:467
#6230 0x0000000010013aa8 in ints (up=4500, low=1) at tests/test.c:464
#6231 reverse_test_inner (data=<optimized out>) at tests/test.c:784
#6232 0x00007fffa1a35d40 in GC_call_with_gc_active (fn=0x10013810 <reverse_test_inner>, client_data=0x1) at extra/../pthread_support.c:1574
#6233 0x0000000010013e6c in reverse_test_inner (data=<optimized out>) at tests/test.c:724
#6234 0x00007fffa1a1491c in GC_do_blocking_inner (data=0x7fffa0dee210 "\020\070\001\020", context=<optimized out>) at extra/../pthread_support.c:1455
#6235 0x00007fffa1a17a48 in GC_with_callee_saves_pushed (fn=fn@entry=0x7fffa1a14840 <GC_do_blocking_inner>, arg=arg@entry=0x7fffa0dee210 "\020\070\001\020") at extra/../mach_dep.c:335
#6236 0x00007fffa1a262b4 in GC_do_blocking (fn=<optimized out>, client_data=<optimized out>) at extra/../misc.c:2288
#6237 0x000000001001575c in reverse_test () at tests/test.c:851
#6238 run_one_test () at tests/test.c:1581
#6239 0x0000000010015cc8 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2300
#6240 0x00007fffa1a37060 in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#6241 0x00007fffa1a261f4 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2160
#6242 0x00007fffa1a26254 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:1947
#6243 0x00007fffa16b9b38 in start_thread (arg=0x7fffa0def120) at pthread_create.c:442
#6244 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

Thread 1 (Thread 0x7fffa05df120 (LWP 12953)):
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=<optimized out>, no_tid=<optimized out>) at pthread_kill.c:44
#1  0x00007fffa1657cec in __GI_raise (sig=<optimized out>) at ../sysdeps/posix/raise.c:26
#2  0x00007fffa1630960 in __GI_abort () at abort.c:79
#3  0x0000000010013084 in check_ints (list=<optimized out>, low=low@entry=1, up=up@entry=49) at tests/test.c:518
#4  0x0000000010013af8 in reverse_test_inner (data=<optimized out>) at tests/test.c:794
#5  0x00007fffa1a35d40 in GC_call_with_gc_active (fn=0x10013810 <reverse_test_inner>, client_data=0x1) at extra/../pthread_support.c:1574
#6  0x0000000010013e6c in reverse_test_inner (data=<optimized out>) at tests/test.c:724
#7  0x00007fffa1a1491c in GC_do_blocking_inner (data=0x7fffa05de210 "\020\070\001\020", context=<optimized out>) at extra/../pthread_support.c:1455
#8  0x00007fffa1a17a48 in GC_with_callee_saves_pushed (fn=fn@entry=0x7fffa1a14840 <GC_do_blocking_inner>, arg=arg@entry=0x7fffa05de210 "\020\070\001\020") at extra/../mach_dep.c:335
#9  0x00007fffa1a262b4 in GC_do_blocking (fn=<optimized out>, client_data=<optimized out>) at extra/../misc.c:2288
#10 0x000000001001575c in reverse_test () at tests/test.c:851
#11 run_one_test () at tests/test.c:1581
#12 0x0000000010015cc8 in thr_run_one_test (arg=<optimized out>) at tests/test.c:2300
#13 0x00007fffa1a37060 in GC_inner_start_routine (sb=<optimized out>, arg=<optimized out>) at pthread_start.c:57
#14 0x00007fffa1a261f4 in GC_call_with_stack_base (fn=<optimized out>, arg=<optimized out>) at extra/../misc.c:2160
#15 0x00007fffa1a26254 in GC_start_routine (arg=<optimized out>) at extra/../pthread_support.c:1947
#16 0x00007fffa16b9b38 in start_thread (arg=0x7fffa05df120) at pthread_create.c:442
#17 0x00007fffa176a4f0 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:107

ivmai commented 1 year ago

> the command line I am using is CPPFLAGS=-DNO_SOFT_VDB ./configure ... and I have confirmed the -DNO_SOFT_VDB parameter is passed to gcc

Hmm. Please try: ./configure ... && make check CFLAGS_EXTRA="-DNO_SOFT_VDB"

ivmai commented 1 year ago

> I can provide more details if needed, or arrange access to a ppc64le system.

The details are not useful; they just signal that incremental collection is broken. Access to the system would be helpful. I will be able to look at it in a week or so.

ivmai commented 1 year ago

@jiegec wrote: Adding CFLAGS_EXTRA="-DNO_SOFT_VDB" does work for me on Power8

@jiegec, @sharkcz, @peterhoeg, it looks like I have found the root cause, and I am going to prepare a proper fix for it within a week. But could you please check whether CFLAGS_EXTRA="-D NO_VDB_FOR_STATIC_ROOTS" works (instead of defining NO_SOFT_VDB), on v8.2.2 or master? Thank you.

jiegec commented 1 year ago

It still fails on POWER8NVL (8.2.2):

$ ./autogen.sh && ./configure && make check CFLAGS_EXTRA="-D NO_VDB_FOR_STATIC_ROOTS"
libtool: link: gcc -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -g -O2 -fno-strict-aliasing -Wno-frame-address -D NO_VDB_FOR_STATIC_ROOTS -o .libs/disclaim_bench tests/disclaim_bench.o  ./.libs/libgc.so
depbase=`echo tests/disclaim_weakmap_test.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
gcc -DHAVE_CONFIG_H   -I./include -I./include  -DGC_PTHREAD_START_STANDALONE    -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -g -O2 -fno-strict-aliasing -Wno-frame-address -D NO_VDB_FOR_STATIC_ROOTS -MT tests/disclaim_weakmap_test.o -MD -MP -MF $depbase.Tpo -c -o tests/disclaim_weakmap_test.o tests/disclaim_weakmap_test.c &&\
mv -f $depbase.Tpo $depbase.Po
/bin/bash ./libtool  --tag=CC   --mode=link gcc   -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -g -O2 -fno-strict-aliasing -Wno-frame-address -D NO_VDB_FOR_STATIC_ROOTS   -o disclaim_weakmap_test tests/disclaim_weakmap_test.o  ./libgc.la  -lpthread -ldl
libtool: link: gcc -fexceptions -DGC_VISIBILITY_HIDDEN_SET -fvisibility=hidden -Wall -Wextra -Wpedantic -Wno-long-long -g -O2 -fno-strict-aliasing -Wno-frame-address -D NO_VDB_FOR_STATIC_ROOTS -o .libs/disclaim_weakmap_test tests/disclaim_weakmap_test.o  ./.libs/libgc.so -lpthread -ldl
make[2]: 'libstaticrootslib_test.la' is up to date.
make[2]: 'libstaticrootslib2_test.la' is up to date.
make[2]: Leaving directory '/home/jiegec/bdwgc'
make  check-TESTS
make[2]: Entering directory '/home/jiegec/bdwgc'
make[3]: Entering directory '/home/jiegec/bdwgc'
PASS: cordtest
./test-driver: line 107: 69553 Segmentation fault      (core dumped) "$@" > $log_file 2>&1
FAIL: gctest

ivmai commented 1 year ago

Okay, this seems to be a different root cause.

sharkcz commented 1 year ago

Same here, with the master branch and F-36 on P9 ppc64le.

ivmai commented 1 year ago

This is a bug in the kernel. The scenario is: write to a page, then read pagemap (the corresponding bit is set), then write '4' to /proc/self/clear_refs, then write to the page again, then read pagemap again; this time the corresponding bit is not set (as if the page were not dirty). I have created a workaround that detects this malfunction at runtime (the code is in detect_soft_dirty_supported). Please test commit d654f40. Later, I am going to apply this workaround to the release-8_2 branch.
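The scenario above can be tried outside the collector with a small standalone check. The following is a minimal sketch, not the code from detect_soft_dirty_supported: the function names and the sd_status enum are made up for illustration, and it assumes the layout described in the kernel's soft-dirty documentation (one 64-bit pagemap entry per page, with bit 55 as the soft-dirty flag).

```c
#define _DEFAULT_SOURCE /* for pread and MAP_ANONYMOUS on glibc */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not bdwgc names). */
enum sd_status { SD_UNAVAILABLE = -1, SD_BROKEN = 0, SD_WORKS = 1 };

/* Bit 55 of a pagemap entry is the soft-dirty flag. */
static int entry_is_soft_dirty(uint64_t entry)
{
    return (int)((entry >> 55) & 1);
}

/* Look up the pagemap entry of the page containing addr.
   Returns 1 if soft-dirty, 0 if not, -1 if the entry cannot be read. */
static int page_is_soft_dirty(int pagemap_fd, void *addr, long page_size)
{
    uint64_t entry;
    off_t offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size
                           * sizeof(entry));

    if (pread(pagemap_fd, &entry, sizeof(entry), offset)
        != (ssize_t)sizeof(entry))
        return -1;
    return entry_is_soft_dirty(entry);
}

/* Replays the write / read / clear / write / read sequence on one page. */
int soft_dirty_self_test(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
    int clear_fd = open("/proc/self/clear_refs", O_WRONLY);
    void *mem = MAP_FAILED;
    int result = SD_UNAVAILABLE;

    if (page_size > 0)
        mem = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (pagemap_fd >= 0 && clear_fd >= 0 && mem != MAP_FAILED) {
        volatile char *page = mem;

        page[0] = 1; /* first write: the bit should be set */
        if (page_is_soft_dirty(pagemap_fd, mem, page_size) == 1
            && write(clear_fd, "4", 1) == 1) { /* clear soft-dirty bits */
            int after;

            page[0] = 2; /* second write: the bit should be set again */
            after = page_is_soft_dirty(pagemap_fd, mem, page_size);
            if (after >= 0)
                result = after ? SD_WORKS : SD_BROKEN;
        }
    }

    if (mem != MAP_FAILED)
        munmap(mem, (size_t)page_size);
    if (clear_fd >= 0)
        close(clear_fd);
    if (pagemap_fd >= 0)
        close(pagemap_fd);
    return result;
}
```

A caller would invoke soft_dirty_self_test() once and treat SD_BROKEN as the malfunction described above; SD_UNAVAILABLE covers kernels or sandboxes where the proc files or the soft-dirty feature are missing. Note that writing '4' to clear_refs resets the soft-dirty bits of the whole process, so this check should not run in a process that already relies on them.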

I would appreciate it if someone reported the bug to the Linux kernel folks.

/cc @hboehm

sharkcz commented 1 year ago

And I confirm the problem went away; both my local and CI builds have succeeded with the latest sources.

sharkcz commented 1 year ago

It being a kernel memory management issue would explain why it affects P9 only: P8 uses a different MMU (hash) than P9 (which uses radix by default). I will check a P9 VM with the hash MMU; it shouldn't be affected by the original issue ...
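For anyone repeating this check, the MMU mode in use can be read programmatically. The sketch below assumes that /proc/cpuinfo on 64-bit POWER Linux carries a line of the form "MMU : Radix" or "MMU : Hash"; that line format is an assumption here, not something stated in this thread, and the function and enum names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names for this sketch. */
enum mmu_mode { MMU_UNKNOWN, MMU_HASH, MMU_RADIX };

/* Classify a single cpuinfo line; anything but an "MMU" line is unknown. */
enum mmu_mode parse_mmu_line(const char *line)
{
    const char *value;

    if (strncmp(line, "MMU", 3) != 0)
        return MMU_UNKNOWN;
    value = strchr(line, ':');
    if (value == NULL)
        return MMU_UNKNOWN;
    value++; /* skip the colon, then any blanks before the value */
    while (*value == ' ' || *value == '\t')
        value++;
    if (strncmp(value, "Radix", 5) == 0)
        return MMU_RADIX;
    if (strncmp(value, "Hash", 4) == 0)
        return MMU_HASH;
    return MMU_UNKNOWN;
}

/* Scan /proc/cpuinfo until the first recognizable MMU line. */
enum mmu_mode detect_mmu_mode(void)
{
    char line[256];
    enum mmu_mode mode = MMU_UNKNOWN;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return MMU_UNKNOWN;
    while (mode == MMU_UNKNOWN && fgets(line, sizeof(line), f) != NULL)
        mode = parse_mmu_line(line);
    fclose(f);
    return mode;
}
```

On systems without such a line (for example, other architectures), detect_mmu_mode() simply reports MMU_UNKNOWN.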

sharkcz commented 1 year ago

And yes, the issue does not occur on P9 with the hash MMU. I will bring it to the powerpc kernel maintainers. Great job, @ivmai

ivmai commented 1 year ago

Good! Thank you for reporting and testing.

sharkcz commented 1 year ago

Proposed kernel fix: https://lists.ozlabs.org/pipermail/linuxppc-dev/2023-May/257763.html