xamarin / xamarin-macios

.NET for iOS, Mac Catalyst, macOS, and tvOS provide open-source bindings of the Apple SDKs for use with .NET managed languages such as C#

Enable cooperative GC for non-WatchOS platforms. #3845

Open strangecargo opened 6 years ago

strangecargo commented 6 years ago

I sent an email to @rolfbjarne regarding the current state of cooperative GC in xamarin-macios and he suggested that I create an issue here to coordinate work.

AFAIK, cooperative GC currently works on watchOS, but not on any of the other platforms supported by xamarin-macios. I would like to be able to at least use MONO_ENABLE_BLOCKING_TRANSITION (if not full-blown MONO_ENABLE_COOP) to address the class of issue described in this mono issue.

COOP.md lists some completed/WIP statuses on certain files, but it hasn't been updated since the initial commit; it's unclear to me if it reflects the current state of xamarin-macios. What work would I have to do to get coop working on macOS (as well as the other platforms that xamarin-macios supports)?

luhenry commented 6 years ago

For any process using mono (as an executable or a dynamically loaded library) running on macOS, iOS or tvOS, you can simply set the MONO_ENABLE_BLOCKING_TRANSITION environment variable to 1 and that will enable them. Doing so might allow you to bypass certain classes of issues, but since it is not an officially supported configuration outside of watchOS (for now), it will be very unstable; treat it as an experimental and exploratory feature.
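For an embedder that loads mono as a library, one way to apply this is to set the variable from inside the process before the runtime starts. The following is a minimal sketch under that assumption (that the runtime picks the variable up during its own initialization). The helper names `toy_enable_blocking_transition` and `toy_blocking_transition_requested` are hypothetical and not part of Mono or xamarin-macios; only the variable name comes from this thread.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() on strict C compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: request blocking transitions for this process.
 * Assumption: this must run before the Mono runtime is initialized.
 * Returns 0 on success, -1 on failure (same convention as setenv). */
static int toy_enable_blocking_transition(void) {
    return setenv("MONO_ENABLE_BLOCKING_TRANSITION", "1", 1 /* overwrite */);
}

/* Hypothetical helper: report whether the variable is currently set to "1". */
static int toy_blocking_transition_requested(void) {
    const char *value = getenv("MONO_ENABLE_BLOCKING_TRANSITION");
    return value != NULL && strcmp(value, "1") == 0;
}
```

When the app is launched from a shell, exporting the variable in the environment before starting the executable should have the same effect without any code change.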

The Mono team is currently working on enabling these transitions by default, but that won't be available in a stable XI release for at least a few months.

/cc @lambdageek

strangecargo commented 6 years ago

> For any process using mono (as an executable or a dynamically loaded library) running on macOS, iOS or tvOS, you can simply set the MONO_ENABLE_BLOCKING_TRANSITION environment variable to 1 and that will enable them.

I should have mentioned in the initial post that I tried to do this, but turning on this environment variable causes even an empty Xamarin Mac app to fail an assertion:

2018-03-29 15:25:23.494 cooptest[12703:25982996] error: * Assertion at /Users/builder/data/lanes/5808/3979d081/source/xamarin-macios/external/mono/mono/mini/mini-trampolines.c:842, condition `mono_thread_is_gc_unsafe_mode ()' not met
Stacktrace:

  at <unknown> <0xffffffff>
  at ObjCRuntime.Runtime.Initialize (ObjCRuntime.Runtime/InitializationOptions*) [0x00011] in /Library/Frameworks/Xamarin.Mac.framework/Versions/4.2.1.28/src/Xamarin.Mac/ObjCRuntime/Runtime.cs:151
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_Runtime/InitializationOptions* (object,intptr,intptr,intptr) [0x00021] in <7d22d19c98ed41dd98ad4ca0e3900efe>:0
  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) object.wrapper_native_0x1094b37d0 () [0x00012] in <89087a58e09e4b89a98e91d7aa6b103b>:0
  at ObjCRuntime.Runtime.EnsureInitialized () [0x0004a] in /Library/Frameworks/Xamarin.Mac.framework/Versions/4.2.1.28/src/Xamarin.Mac/ObjCRuntime/Runtime.mac.cs:118
  at AppKit.NSApplication.Init () [0x00017] in /Library/Frameworks/Xamarin.Mac.framework/Versions/4.2.1.28/src/Xamarin.Mac/AppKit/NSApplication.cs:56
  at cooptest.MainClass.Main (string[]) [0x00001] in /Users/allan/code/vs/cooptest/cooptest/Main.cs:9
  at (wrapper runtime-invoke) <Module>.runtime_invoke_void_object (object,intptr,intptr,intptr) [0x00051] in <084587d9d01a41edb4297fa40841fe11>:0

Native stacktrace:

    0   cooptest                            0x00000001095a8ab1 mono_handle_native_crash + 257
    1   libsystem_platform.dylib            0x00007fff728e3f5a _sigtramp + 26
    2   ???                                 0x00007ffee67762e8 0x0 + 140732765004520
    3   libsystem_c.dylib                   0x00007fff7270e312 abort + 127
    4   cooptest                            0x00000001094b3790 _ZL12log_callbackPKcS0_S0_iPv + 64
    5   cooptest                            0x00000001097691d3 monoeg_g_logv + 83
    6   cooptest                            0x00000001097693ef monoeg_assertion_message + 143
    7   cooptest                            0x00000001095bdc1a mono_magic_trampoline + 122
    8   ???                                 0x0000000109c1039e 0x0 + 4458611614
    9   ???                                 0x0000000109de739b 0x0 + 4460540827
    10  ???                                 0x0000000109de7b41 0x0 + 4460542785
    11  cooptest                            0x00000001095bb3ea mono_jit_runtime_invoke + 1338
    12  cooptest                            0x0000000109687fb4 do_runtime_invoke + 84
    13  cooptest                            0x0000000109687ec6 mono_runtime_invoke + 102
    14  cooptest                            0x00000001094b3abe xamarin_initialize + 750
    15  ???                                 0x0000000109de7259 0x0 + 4460540505
    16  cooptest                            0x00000001095bb3ea mono_jit_runtime_invoke + 1338
    17  cooptest                            0x0000000109687fb4 do_runtime_invoke + 84
    18  cooptest                            0x000000010968b5e9 do_exec_main_checked + 137
    19  cooptest                            0x000000010951a99f mono_jit_exec + 287
    20  cooptest                            0x000000010951d25d mono_main + 9325
    21  cooptest                            0x00000001094bebae xamarin_main + 1182
    22  cooptest                            0x00000001094bfb04 main + 36
    23  libdyld.dylib                       0x00007fff72662115 start + 1
    24  ???                                 0x0000000000000002 0x0 + 2

Debug info from gdb:

(lldb) command source -s 0 '/tmp/mono-gdb-commands.wmIas9'
Executing commands in '/tmp/mono-gdb-commands.wmIas9'.
(lldb) process attach --pid 12703
Process 12703 stopped
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff727b2502 libsystem_kernel.dylib`__wait4 + 10
libsystem_kernel.dylib`__wait4:
->  0x7fff727b2502 <+10>: jae    0x7fff727b250c            ; <+20>
    0x7fff727b2504 <+12>: movq   %rax, %rdi
    0x7fff727b2507 <+15>: jmp    0x7fff727a90dd            ; cerror
    0x7fff727b250c <+20>: retq   
Target 0: (cooptest) stopped.

Executable module set to "/Users/allan/code/vs/cooptest/cooptest/bin/Debug/cooptest.app/Contents/MacOS/cooptest".
Architecture set to: x86_64h-apple-macosx.
(lldb) thread list
Process 12703 stopped
* thread #1: tid = 0x18c7814, 0x00007fff727b2502 libsystem_kernel.dylib`__wait4 + 10, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread #2: tid = 0x18c782c, 0x00007fff727b2562 libsystem_kernel.dylib`__workq_kernreturn + 10
  thread #3: tid = 0x18c782d, 0x00007fff727b2562 libsystem_kernel.dylib`__workq_kernreturn + 10
  thread #4: tid = 0x18c7831, 0x00007fff727b1cee libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'SGen worker'
  thread #5: tid = 0x18c7832, 0x00007fff727a87fe libsystem_kernel.dylib`semaphore_wait_trap + 10, name = 'Finalizer'
(lldb) thread backtrace all
* thread #1, name = 'tid_307', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff727b2502 libsystem_kernel.dylib`__wait4 + 10
    frame #1: 0x00000001095a8b3e cooptest`mono_handle_native_crash(signal="SIGABRT", ctx=<unavailable>, info=<unavailable>) at mini-exceptions.c:2726 [opt]
    frame #2: 0x00007fff728e3f5a libsystem_platform.dylib`_sigtramp + 26
    frame #3: 0x00007fff727b1e3f libsystem_kernel.dylib`__pthread_kill + 11
    frame #4: 0x00007fff728f0150 libsystem_pthread.dylib`pthread_kill + 333
    frame #5: 0x00007fff7270e312 libsystem_c.dylib`abort + 127
    frame #6: 0x00000001094b3790 cooptest`log_callback(log_domain=0x0000000000000000, log_level="error", message="* Assertion at /Users/builder/data/lanes/5808/3979d081/source/xamarin-macios/external/mono/mono/mini/mini-trampolines.c:842, condition `mono_thread_is_gc_unsafe_mode ()' not met\n", fatal=4, user_data=0x0000000000000000) at runtime.m:1206
    frame #7: 0x00000001097691d3 cooptest`monoeg_g_logv(log_domain=0x0000000000000000, log_level=G_LOG_LEVEL_ERROR, format=<unavailable>, args=<unavailable>) at goutput.c:115 [opt]
    frame #8: 0x00000001097693ef cooptest`monoeg_assertion_message(format=<unavailable>) at goutput.c:135 [opt]
    frame #9: 0x00000001095bdc1a cooptest`mono_magic_trampoline(regs=<unavailable>, code=<unavailable>, arg=<unavailable>, tramp=<unavailable>) at mini-trampolines.c:842 [opt]
    frame #10: 0x0000000109c1039e
    frame #11: 0x0000000109de739b
    frame #12: 0x0000000109de7b41
    frame #13: 0x00000001095bb3ea cooptest`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x00007ffee6777010, exc=0x0000000109900720, error=<unavailable>) at mini-runtime.c:2800 [opt]
    frame #14: 0x0000000109687fb4 cooptest`do_runtime_invoke(method=0x00007f8bc2027fc8, obj=0x0000000000000000, params=0x00007ffee6777010, exc=0x00007ffee6776fe8, error=0x00007ffee6776f08) at object.c:2849 [opt]
    frame #15: 0x0000000109687ec6 cooptest`mono_runtime_invoke [inlined] mono_runtime_try_invoke(method=<unavailable>, obj=0x0000000000000000, params=0x00007ffee6777010, exc=0x00007ffee6776fe8, error=0x0000600000000000) at object.c:2956 [opt]
    frame #16: 0x0000000109687e87 cooptest`mono_runtime_invoke(method=0x00007f8bc2027fc8, obj=0x0000000000000000, params=0x00007ffee6777010, exc=0x00007ffee6776fe8) at object.c:2897 [opt]
    frame #17: 0x00000001094b3abe cooptest`::xamarin_initialize() at runtime.m:1340
    frame #18: 0x0000000109de7259
    frame #19: 0x00000001095bb3ea cooptest`mono_jit_runtime_invoke(method=<unavailable>, obj=<unavailable>, params=0x00007ffee6777328, exc=0x00007f8bc2816210, error=<unavailable>) at mini-runtime.c:2800 [opt]
    frame #20: 0x0000000109687fb4 cooptest`do_runtime_invoke(method=0x00007f8bc1f04a58, obj=0x0000000000000000, params=0x00007ffee6777328, exc=0x0000000000000000, error=0x00007ffee6777368) at object.c:2849 [opt]
    frame #21: 0x000000010968b5e9 cooptest`do_exec_main_checked [inlined] mono_runtime_invoke_checked(method=<unavailable>, obj=<unavailable>, error=<unavailable>) at object.c:3002 [opt]
    frame #22: 0x000000010968b5a8 cooptest`do_exec_main_checked(method=0x00007f8bc1f04a58, args=<unavailable>, error=0x00007ffee6777368) at object.c:4726 [opt]
    frame #23: 0x000000010951a99f cooptest`mono_jit_exec(domain=<unavailable>, assembly=<unavailable>, argc=2, argv=0x000060800005b438) at driver.g.c:1040 [opt]
    frame #24: 0x000000010951d25d cooptest`mono_main [inlined] main_thread_handler at driver.g.c:1109 [opt]
    frame #25: 0x000000010951d22a cooptest`mono_main(argc=<unavailable>, argv=0x000060800005b420) at driver.g.c:2222 [opt]
    frame #26: 0x00000001094bebae cooptest`::xamarin_main(argc=2, argv=0x00007ffee6777680, launch_mode=XamarinLaunchModeApp) at launcher.m:662
    frame #27: 0x00000001094bfb04 cooptest`main(argc=2, argv=0x00007ffee6777680) at launcher.m:680
    frame #28: 0x00007fff72662115 libdyld.dylib`start + 1
  thread #2
    frame #0: 0x00007fff727b2562 libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff728ed26f libsystem_pthread.dylib`_pthread_wqthread + 1552
    frame #2: 0x00007fff728ecc4d libsystem_pthread.dylib`start_wqthread + 13
  thread #3
    frame #0: 0x00007fff727b2562 libsystem_kernel.dylib`__workq_kernreturn + 10
    frame #1: 0x00007fff728ed06a libsystem_pthread.dylib`_pthread_wqthread + 1035
    frame #2: 0x00007fff728ecc4d libsystem_pthread.dylib`start_wqthread + 13
  thread #4, name = 'SGen worker'
    frame #0: 0x00007fff727b1cee libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff728ee662 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #2: 0x000000010974529e cooptest`thread_func [inlined] mono_os_cond_wait(mutex=<unavailable>) at mono-os-mutex.h:173 [opt]
    frame #3: 0x000000010974528b cooptest`thread_func at sgen-thread-pool.c:165 [opt]
    frame #4: 0x0000000109745135 cooptest`thread_func(data=0x0000000000000000) at sgen-thread-pool.c:196 [opt]
    frame #5: 0x00007fff728ed6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #6: 0x00007fff728ed56d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff728ecc5d libsystem_pthread.dylib`thread_start + 13
  thread #5, name = 'Finalizer'
    frame #0: 0x00007fff727a87fe libsystem_kernel.dylib`semaphore_wait_trap + 10
    frame #1: 0x00000001096220cc cooptest`finalizer_thread [inlined] mono_os_sem_wait(flags=MONO_SEM_FLAGS_ALERTABLE) at mono-os-semaphore.h:90 [opt]
    frame #2: 0x00000001096220c1 cooptest`finalizer_thread at mono-coop-semaphore.h:43 [opt]
    frame #3: 0x00000001096220b5 cooptest`finalizer_thread(unused=<unavailable>) at gc.c:866 [opt]
    frame #4: 0x00000001096d5c50 cooptest`start_wrapper [inlined] start_wrapper_internal at threads.c:1003 [opt]
    frame #5: 0x00000001096d5bb3 cooptest`start_wrapper(data=<unavailable>) at threads.c:1063 [opt]
    frame #6: 0x00007fff728ed6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff728ed56d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff728ecc5d libsystem_pthread.dylib`thread_start + 13
(lldb) detach

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

(lldb) quit
Process 12703 detached

I was hoping that preventing the assertion was a matter of some simple setup to get the runtime in the right state at start time, but after investigating Rolf's previous commits, it was unclear to me where I should put this sort of initialization, and how much work beyond that was required to make it work legitimately.

lambdageek commented 6 years ago

TL;DR: MONO_ENABLE_BLOCKING_TRANSITION won't work right now. You can make it work by making pervasive changes to the embedder, but for platforms where preemptive suspend works today, we are working on another approach.

At a high level, there are two approaches to coop GC:

  1. Embedders/products such as XM/XI manage the coop state transitions themselves by wrapping sequences of code that use Mono APIs and pointers to managed objects in MONO_ENTER_GC_UNSAFE/MONO_EXIT_GC_UNSAFE macros.
    • This is the approach watchOS takes because the platform is quite restrictive.
    • It requires discipline on behalf of the products and has proven quite fragile so far.
  2. The "Hybrid coop" approach: Mono APIs are enhanced to perform coop state transitions on entry/exit. Embedder/product code does not need to change. Furthermore, Mono will use distinct suspend mechanisms depending on whether a thread is running native embedder code or Mono runtime and managed code. For native code we will use the preemptive (signal-based) suspend mechanism; for runtime and managed code we will use cooperative (checkpoint-based) suspend.
    • No changes needed in embedder code.
    • We've done an experiment and the changes to Mono to perform the state transitions are pervasive but mechanical. (This is enough to get MONO_ENABLE_BLOCKING_TRANSITION not to assert, but Mono will still use preemptive suspend on all threads. This is enough to get some simple embedders working, but Visual Studio for Mac hangs in GC - so more work remains.)
    • I am currently working on adding hybrid suspend to Mono.
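The difference between the two approaches can be illustrated with a toy model. This is only a sketch: every `toy_`/`TOY_` name below is a hypothetical stand-in, the per-thread mode is reduced to a single flag, and none of it is Mono's actual implementation. It shows (1) an embedder bracketing a runtime call itself, in the spirit of MONO_ENTER_GC_UNSAFE/MONO_EXIT_GC_UNSAFE, (2) the same transition moved inside the runtime entry point, and (3) choosing a suspend mechanism from the thread's mode as described for hybrid suspend.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not Mono's API. */
typedef enum { TOY_GC_SAFE, TOY_GC_UNSAFE } toy_gc_mode;
typedef enum { TOY_SUSPEND_PREEMPTIVE, TOY_SUSPEND_COOPERATIVE } toy_suspend_kind;

/* Assumption: a thread running native embedder code starts in GC safe mode. */
static _Thread_local toy_gc_mode toy_mode = TOY_GC_SAFE;

/* The pair opens and closes one scope, so an ENTER without its matching
 * EXIT does not compile. The previous mode is restored on exit. */
#define TOY_ENTER_GC_UNSAFE \
    do { \
        toy_gc_mode toy_saved_mode = toy_mode; \
        toy_mode = TOY_GC_UNSAFE

#define TOY_EXIT_GC_UNSAFE \
        toy_mode = toy_saved_mode; \
    } while (0)

/* Stand-in for a runtime entry point that requires GC unsafe mode.
 * Returns 0 on success, -1 where a real runtime might assert instead. */
static int toy_runtime_call(void) {
    return toy_mode == TOY_GC_UNSAFE ? 0 : -1;
}

/* Embedder code that forgot the transition. */
static int toy_embedder_unwrapped(void) {
    return toy_runtime_call();
}

/* Approach 1: the embedder wraps the runtime call itself. */
static int toy_embedder_wrapped(void) {
    int result;
    TOY_ENTER_GC_UNSAFE;
    result = toy_runtime_call();
    TOY_EXIT_GC_UNSAFE;
    return result;
}

/* Approach 2: the runtime API performs the transition on entry and exit,
 * so embedder code can call it without any wrapping. */
static int toy_runtime_call_hybrid(void) {
    int result;
    TOY_ENTER_GC_UNSAFE;
    result = toy_runtime_call();
    TOY_EXIT_GC_UNSAFE;
    return result;
}

/* Approach 2: preemptive suspend for threads in native embedder code
 * (GC safe here), cooperative suspend for runtime and managed code. */
static toy_suspend_kind toy_pick_suspend(toy_gc_mode mode) {
    return mode == TOY_GC_SAFE ? TOY_SUSPEND_PREEMPTIVE
                               : TOY_SUSPEND_COOPERATIVE;
}
```

In this model `toy_embedder_unwrapped()` fails while `toy_embedder_wrapped()` and `toy_runtime_call_hybrid()` succeed, which mirrors why approach 1 depends on every embedder call site being wrapped correctly, whereas approach 2 keeps that burden inside the runtime.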