Open arshajii opened 8 months ago
Is this reproducible with the following? On the latest master.
cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_gc_assertions=ON -Denable_thread_local_alloc=OFF -Denable_handle_fork=OFF -DCFLAGS_EXTRA="-D MAX_MARKERS=4 -D GC_DISABLE_INCREMENTAL"
If it is, then please provide back trace for all threads.
Hi @ivmai -- with your cmake command it completes in about 80 seconds. Actually, I realized even my original build will also complete if given enough time, it just takes a very long time (~3min). This is latest master 7fc3e07f0981458ac4e65ff5efb582320ad4a8b5 -- the -DCFLAGS_EXTRA
arg was not working for me so I instead added target_compile_definitions(gc PUBLIC -DMAX_MARKERS=4 -DGC_DISABLE_INCREMENTAL)
to CMakeLists.txt. In case it is still helpful, here are a few backtraces using your cmake command:
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1155:21
frame #4: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
frame #5: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
frame #6: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
frame #7: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
frame #8: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
frame #9: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
frame #10: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
frame #11: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
frame #12: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
frame #13: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
frame #14: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
frame #15: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
frame #16: 0x00000001000314f8 a.out`main + 84
frame #17: 0x00000001a08bff28 dyld`start + 2236
thread #2, name = 'GC-marker-0'
frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x000000016fe76f68, mark_stack=0x000000016fe76f78, mark_stack_limit=0x000000016fe86f78) at mark.c:862:39
frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016fe76f78, local_top=0x000000016fe76f78) at mark.c:1063:25
frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=1) at mark.c:1194:9
frame #3: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
frame #4: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
frame #5: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #3, name = 'GC-marker-1'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=2) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #4, name = 'GC-marker-2'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=120) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x0000000100148000, mark_stack=0x0000000100148000, mark_stack_limit=0x0000000100158000) at mark.c:862:39
frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x0000000100148000, local_top=0x0000000100148010) at mark.c:1063:25
frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1194:9
frame #3: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
frame #4: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
frame #5: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
frame #6: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
frame #7: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
frame #8: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
frame #9: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
frame #10: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
frame #11: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
frame #12: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
frame #13: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
frame #14: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
frame #15: 0x00000001000314f8 a.out`main + 84
frame #16: 0x00000001a08bff28 dyld`start + 2236
thread #2, name = 'GC-marker-0'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=2) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #3, name = 'GC-marker-1'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=1) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #4, name = 'GC-marker-2'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=74) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x0000000100148000, id=0) at mark.c:1155:21
frame #4: 0x000000010001bc40 a.out`GC_do_parallel_mark at mark.c:1217:5
frame #5: 0x000000010001aaf8 a.out`GC_mark_some(cold_gc_frame="P\xed\xdfo\U00000001") at mark.c:436:19
frame #6: 0x000000010000580c a.out`GC_stopped_mark(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:887:15
frame #7: 0x0000000100004ff0 a.out`GC_try_to_collect_inner(stop_func=(a.out`GC_never_stop_func at alloc.c:170)) at alloc.c:602:10
frame #8: 0x0000000100008b30 a.out`GC_collect_or_expand(needed_blocks=1, flags=0, retry=0) at alloc.c:1686:24
frame #9: 0x0000000100009278 a.out`GC_allocobj(lg=3, k=1) at alloc.c:1818:18
frame #10: 0x000000010001676c a.out`GC_generic_malloc_inner_small(lb=32, k=1) at malloc.c:192:12
frame #11: 0x0000000100016c1c a.out`GC_generic_malloc_aligned(lb=32, k=1, flags=0, align_m1=0) at malloc.c:247:18
frame #12: 0x0000000100017778 a.out`GC_malloc_kind_aligned_global(lb=32, k=1, align_m1=0) at malloc.c:363:27
frame #13: 0x00000001000171e0 a.out`GC_malloc_kind_global(lb=32, k=1) at malloc.c:323:12
frame #14: 0x00000001000177b4 a.out`GC_malloc_kind(lb=32, k=1) at malloc.c:370:12
frame #15: 0x0000000100017804 a.out`GC_malloc(lb=32) at malloc.c:383:12
frame #16: 0x00000001000314f8 a.out`main + 84
frame #17: 0x00000001a08bff28 dyld`start + 2236
thread #2, name = 'GC-marker-0'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=2) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000000) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #3, name = 'GC-marker-1'
frame #0: 0x0000000100013988 a.out`GC_find_header(h="\x96\r\v") at headers.c:37:16
frame #1: 0x000000010001c2bc a.out`GC_push_contents_hdr(current="\x96\r\v", mark_stack_top=0x000000016ff02f78, mark_stack_limit=0x000000016ff12f78, source="\xe0=\xb5\U00000002\U00000001", hhdr=0x000000010321abf0, do_offset_check=1) at gc_pmark.h:302:7
frame #2: 0x000000010001b7dc a.out`GC_mark_from(mark_stack_top=0x000000016ff02f78, mark_stack=0x000000016ff02f78, mark_stack_limit=0x000000016ff12f78) at mark.c:896:9
frame #3: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016ff02f78, local_top=0x000000016ff02f88) at mark.c:1063:25
frame #4: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016ff02f78, id=1) at mark.c:1194:9
frame #5: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
frame #6: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000001) at pthread_support.c:488:7
frame #7: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
thread #4, name = 'GC-marker-2'
frame #0: 0x00000001a0bdb750 libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x00000001a0c18574 libsystem_pthread.dylib`_pthread_cond_wait + 1232
frame #2: 0x000000010002febc a.out`GC_wait_marker at pthread_support.c:2945:9
frame #3: 0x000000010001d1a0 a.out`GC_mark_local(local_mark_stack=0x000000016ff8ef78, id=3) at mark.c:1155:21
frame #4: 0x000000010001d670 a.out`GC_help_marker(my_mark_no=87) at mark.c:1253:5
frame #5: 0x000000010002a960 a.out`GC_mark_thread(id=0x0000000000000002) at pthread_support.c:488:7
frame #6: 0x00000001a0c17fa8 libsystem_pthread.dylib`_pthread_start + 148
Hmm, the stack traces above do not show any issue - e.g. in case 1: thread 2 is working, other threads just wait thread 2 to complete its part of work. Is there a hang inside GC_mark_from (of thread 2)? If yes, where?
the -DCFLAGS_EXTRA arg was not working for me so I instead added target_compile_definitions
Yes, my mistake. Passing quotes is not straightforward sometimes. Should be:
cmake .. -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_gc_assertions=ON -Denable_thread_local_alloc=OFF -Denable_handle_fork=OFF "-DCFLAGS_EXTRA=-DMAX_MARKERS=4 -DGC_DISABLE_INCREMENTAL"
I can't reproduce it on Mac-mini (x86_64).
I changed 100000000
to 10000000
(10 times smaller) in the above test code (t.c).
gcc -O3 -I ../include t.c libgc.a
With GC_allow_register_threads():
time ./a.out
./a.out 3.53s user 0.05s system 97% cpu 3.677 total
Without GC_allow_register_threads:
time ./a.out
./a.out 11.78s user 0.10s system 99% cpu 11.983 total
So, I don't see any hang. On the opposite I see that parallel marker brings a speedup.
I observe the same issue as described in @arshajii post. macOS Sonoma 14.2.1, Apple clang version 15.0.0 (clang-1500.1.0.2.5), M1 Pro.
I see. Could you please investigate what is going on in GC_mark_from
thread #2, name = 'GC-marker-0'
frame #0: 0x000000010001b668 a.out`GC_mark_from(mark_stack_top=0x000000016fe76f68, mark_stack=0x000000016fe76f78, mark_stack_limit=0x000000016fe86f78) at mark.c:862:39
frame #1: 0x000000010001cb7c a.out`GC_do_local_mark(local_mark_stack=0x000000016fe76f78, local_top=0x000000016fe76f78) at mark.c:1063:25
frame #2: 0x000000010001d3e4 a.out`GC_mark_local(local_mark_stack=0x000000016fe76f78, id=1) at mark.c:1194:9
Is some loop in GC_mark_from running forever, or is GC_mark_from called with same arguments unlimited amount of time, or something else?
I took a look at this -- GC_mark_from
is taking the bulk of the time after running the program through a profiler, both in the main thread and in all the marker threads.
If I add some simple print statements in GC_mark_from
at entry and exit, I see it's always being called with the same arguments:
GC_mark_from 0x100898010 0x100898000 0x1008a8000 (three arguments of GC_mark_from)
GC_mark_from done
GC_mark_from 0x100898010 0x100898000 0x1008a8000
GC_mark_from done
GC_mark_from 0x100898010 0x100898000 0x1008a8000
GC_mark_from done
...
Not sure if this is expected. Let me know if there's anything else that would be helpful.
Just following up on this. Once this is sorted we'll be able to upgrade the GC in Codon. Let me know if there's anything else we can do to help track down the issue.
Please fetch the latest master (I've added some assertion to detect an infinite loop in alloc.c, not sure if it relates to this issue). Also please check if this is reproducible with -DMAX_MARKERS=2
(instead of 4).
If I add some simple print statements in GC_mark_from at entry and exit, I see it's always being called with the same arguments
Also print GC_gc_no along with the GC_mark_from arguments.
And I don't understand whether you observe a hang or a too long execution of the given test code?
It would be, of course, much better if you dig into the logic and figure out the root cause of this issue. (I don't think I will have time to do it myself within April or May.)
Hi @ivmai -- I finally had time to revisit this. I'm pretty sure I've narrowed the problem down to this commit: https://github.com/ivmai/bdwgc/commit/60e4b3f8ffeada7635e42788ec6b4e38136fa320
Prior to this commit, the test program in the my original post finishes in 1-2 seconds consistently. With this commit, it occasionally hangs for >2min (and other times it also completes in 1-2 seconds).
Hopefully this is enough to track down the issue -- let me know if I can be of additional help.
My build steps:
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_large_config=ON -Denable_thread_local_alloc=ON -Denable_handle_fork=ON
make
clang -O3 -Iinclude test.c build/libgc.a
Interesting.
I have modified the test to output some stat at the end.
On a cygwin64 host I see significant degradation for the execution time, but in some other commit (not bisected yet):
gcc -I include -I libatomic_ops/src -D GC_THREADS -D THREAD_LOCAL_ALLOC -D PARALLEL_MARK -D LARGE_CONFIG -DALL_INTERIOR_POINTERS -DENABLE_DISCLAIM -DGC_ATOMIC_UNCOLLECTABLE -DGC_GCJ_SUPPORT -DJAVA_FINALIZATION -DNO_EXECUTE_PERMISSION -DUSE_MMAP -DUSE_MUNMAP -Wall -Wextra -Wpedantic test619.c extra/gc.c
Just wanted to follow up on this and see if there's any way I can help. Do we have any idea what the issue might be?
Current master f7e513ac06055310656e5c3c3bc9e1d84571e1b4. Here is the test program:
Built with
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DBUILD_SHARED_LIBS=OFF -Denable_threads=ON -Denable_large_config=ON -Denable_thread_local_alloc=ON -Denable_handle_fork=ON
on a M1 Max MacBook Pro.The program hangs here:
If you comment
GC_allow_register_threads()
or build with-Denable_parallel_mark=OFF
, it completes normally in about a second.