rochus-keller / Oberon

Oberon parser, code model & browser, compiler and IDE with debugger, and an implementation of the Oberon+ programming language
GNU General Public License v2.0

Mono hangs when creating a new thread in an external library module #17

Closed rochus-keller closed 2 years ago

rochus-keller commented 2 years ago

Even the most trivial test case hangs:

      module Test4
          import C := NAppCore
          proc worker(data: *void): integer begin println("hello from worker") return 0 end worker
      var t: *C.Thread
      begin
          println("start Test4")
          C.core_start()
          t := C.bthread_create_imp(worker, nil)
          println("thread created")
          C.bthread_wait(t)
          C.core_finish()
          println("end Test4")
      end Test4

This just prints

      start Test4
      thread created

to the console and then waits forever. The worker function apparently never runs (we get no output from it), so it is no wonder that we wait forever in bthread_wait. I found an old Mono bug report which seems to describe this behaviour: https://bugzilla.xamarin.com/15/15695/bug.html. The behaviour is the same in both the Boehm and SGen versions of Mono; still researching.

The generated IL code doesn't look suspicious from my point of view:

    call void [NAppCore]NAppCore::core_start()
    ldsflda valuetype [NAppCore]NAppCore/Thread* Test4::t
    ldnull
    ldftn int32 Test4::worker(native int data)
    newobj instance void class [NAppCore]NAppCore/@597c1d38e252b47b26d76277cefb2b8b::.ctor(object, native int)
    dup
    call void [OBX.Runtime]OBX.Runtime::addRef(object)
    call native int [mscorlib]System.Runtime.InteropServices.Marshal::GetFunctionPointerForDelegate(class [mscorlib]System.Delegate)
    ldnull
    call valuetype [NAppCore]NAppCore/Thread* [NAppCore]NAppCore::bthread_create_imp(native int thmain, native int data)
    stind.i
    ldstr "thread started\0"
    callvirt instance char[] [mscorlib]System.String::ToCharArray()
    call string [OBX.Runtime]OBX.Runtime::toString(char[])
    call void [mscorlib]System.Console::WriteLine(string)
    ldsfld valuetype [NAppCore]NAppCore/Thread* Test4::t
    call int32 [NAppCore]NAppCore::bthread_wait(valuetype [NAppCore]NAppCore/Thread* thread)
    pop
    call void [NAppCore]NAppCore::core_finish()

When the same code is exported to C and run, everything works as expected.

rochus-keller commented 2 years ago

I found the following:

    public class Test5 {
        private static void worker(object o) {
            System.Console.WriteLine("hello from worker");
        }

        static void begin() {
            System.Console.WriteLine("start test");
            System.Threading.Thread t = new System.Threading.Thread(worker);
            t.Start();
            t.Join(); // hangs in join
            System.Console.WriteLine("end test");
        }

        static Test5() {
            // no effect: System.Runtime.CompilerServices.RuntimeHelpers.RunClassConstructor(typeof (object).TypeHandle);
            // no effect: begin();
            System.Console.WriteLine("start test");
            System.Threading.Thread t = new System.Threading.Thread(worker);
            t.Start();
            t.Join(); // hangs in join
            System.Console.WriteLine("end test");
        }

        static void Main(string[] args) {
            /*
            System.Console.WriteLine("hello from Main");
            System.Console.WriteLine("start test");
            System.Threading.Thread t = new System.Threading.Thread(worker);
            t.Start();
            t.Join(); // works
            System.Console.WriteLine("end test");
            */
        }
    }

This is a minimal C# application which demonstrates the effect. Thread.Join() presumably uses the same low-level function as bthread_wait(). If we call Join() in the static constructor, the thread starts but doesn't run its body. So we apparently cannot use static constructors as the "begin" part of a module; the system doesn't seem to be working properly at this point.

rochus-keller commented 2 years ago

Fixed with commit 18e7d0dc2c69adbb. The issue was indeed related to the static constructor: apparently it is a bad idea to map Oberon module begin sections to static constructors, because in the present case the thread system seems to be in an unknown state, which causes the strange observations. Since I moved the module begin section to a dedicated method which is called via the import dependency chain, NAppGUI threads work seamlessly with Mono.
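For illustration, here is a minimal C# sketch of the idea behind the fix; it is not the code the compiler actually generates. The names `Lib`, `Begin`, `initialized` and `WorkerRan` are hypothetical and chosen only for this example. Each module body becomes an ordinary static method which first runs the bodies of its imports and is guarded by a flag so that it executes only once. One plausible explanation for the hang, assuming the usual CLI type initialization rules apply here, is that the new thread has to wait for the class's static constructor to finish before it may run a static method of that class, while the static constructor itself is blocked in Join(). With no static constructor involved, that wait cannot occur.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an imported module.
public static class Lib {
    private static bool initialized; // guard: the body runs at most once

    // Module body as an ordinary method instead of a static constructor.
    public static void Begin() {
        if (initialized) return;
        initialized = true;
        Console.WriteLine("Lib initialized");
    }
}

public static class Test5 {
    private static bool initialized; // guard: the body runs at most once
    public static bool WorkerRan;    // set by the worker, for checking only

    private static void Worker(object o) {
        WorkerRan = true;
        Console.WriteLine("hello from worker");
    }

    // Module body: run the imports' bodies first, then our own.
    public static void Begin() {
        if (initialized) return;
        initialized = true;
        Lib.Begin();
        Console.WriteLine("start test");
        Thread t = new Thread(Worker);
        t.Start();
        t.Join(); // not inside a static constructor, so the worker can run
        Console.WriteLine("end test");
    }

    public static void Main(string[] args) {
        Begin(); // the root module starts the chain explicitly
    }
}
```

The guard flag is not thread-safe; this sketch assumes that all module bodies are run from the main thread before any worker thread touches them.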

The NAppGUI Fractals example now works fine on Linux and Windows when run from Mono. There is still an issue on macOS, though: the NAppGUI app seems to hang in osmain just after window_show. This will be traced in a separate issue.