microsoft / testfx

MSTest framework and adapter
MIT License
713 stars 253 forks source link

Intermittent deadlock on Linux using native AoT support from console output #3485

Open martincostello opened 1 month ago

martincostello commented 1 month ago

Describe the bug

Today I've come back to the underlying issue that lead to me opening #3097 to try and work out what's going wrong. I think I've now got a fairly good idea of where the issue is (but not why) and have worked around the issue.

From trial and error in https://github.com/martincostello/alexa-london-travel/pull/1368, I managed to get hold of a dump on Linux under WSL and got the stacks for each thread: https://github.com/martincostello/alexa-london-travel/pull/1368#issuecomment-2271045338

The interesting stacks were the following:

  * frame #0: 0x00007f5424f34117 libc.so.6`___lldb_unnamed_symbol3457 + 231
    frame #1: 0x00007f5424f36a41 libc.so.6`pthread_cond_wait + 529
    frame #2: 0x00005647d8910dc4 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait(this=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, isSleep=<unavailable>, lockHolder=<unavailable>) at WaitSubsystem.ThreadWaitInfo.Unix.cs:322
    frame #3: 0x00005647d8911647 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait(this=<unavailable>, waitInfo=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, prioritize=<unavailable>) at WaitSubsystem.WaitableObject.Unix.cs:320
    frame #4: 0x00005647d886e75f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:128
    frame #5: 0x00005647d886e660 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOne(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:102
    frame #6: 0x00005647d886249a LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Condition__Wait_1(this=<unavailable>, millisecondsTimeout=<unavailable>) at Condition.cs:114
    frame #7: 0x00005647d8862acf LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Monitor__Wait(obj=<unavailable>, millisecondsTimeout=<unavailable>) at Monitor.NativeAot.cs:142
    frame #8: 0x00005647d8867e2c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ManualResetEventSlim__Wait_4(this=<unavailable>, millisecondsTimeout=<unavailable>, cancellationToken=<unavailable>) at ManualResetEventSlim.cs:561
    frame #9: 0x00005647d8873b3c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__SpinThenBlockingWait(this=<unavailable>, millisecondsTimeout=<unavailable>, cancellationToken=<unavailable>) at Task.cs:3072
    frame #10: 0x00005647d887398b LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__InternalWaitCore(this=<unavailable>, millisecondsTimeout=<unavailable>, cancellationToken=<unavailable>) at Task.cs:3007
    frame #11: 0x00005647d88d0cd7 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_TaskAwaiter__HandleNonSuccessAndDebuggerNotification(task=<unavailable>, options=<unavailable>) at TaskAwaiter.cs:104
    frame #12: 0x00005647d855299e LondonTravel.Skill.NativeAotTests at Logging.g.cs:68
    frame #13: 0x00005647d8e56f1c LondonTravel.Skill.NativeAotTests at ValueTuple.cs:1240
    frame #14: 0x00007f5424eccd90 libc.so.6`___lldb_unnamed_symbol3139 + 128
    frame #15: 0x00007f5424ecce40 libc.so.6`__libc_start_main + 128
    frame #16: 0x00005647d83c43d5 LondonTravel.Skill.NativeAotTests`_start + 37

  * frame #0: 0x00007f5424f34117 libc.so.6`___lldb_unnamed_symbol3457 + 231
    frame #1: 0x00007f5424f36a41 libc.so.6`pthread_cond_wait + 529
    frame #2: 0x00005647d8910dc4 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait(this=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, isSleep=<unavailable>, lockHolder=<unavailable>) at WaitSubsystem.ThreadWaitInfo.Unix.cs:322
    frame #3: 0x00005647d8911647 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait(this=<unavailable>, waitInfo=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, prioritize=<unavailable>) at WaitSubsystem.WaitableObject.Unix.cs:320
    frame #4: 0x00005647d886e75f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:128
    frame #5: 0x00005647d886e660 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOne(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:102
    frame #6: 0x00005647d8861f60 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Lock__TryAcquireSlow(this=<unavailable>, currentThreadId=<unavailable>, millisecondsTimeout=<unavailable>, trackContentions=<unavailable>) at Lock.cs:355
    frame #7: 0x00005647d8862c4f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Monitor__TryAcquireSlow(lck=<unavailable>, obj=<unavailable>, millisecondsTimeout=<unavailable>) at Monitor.NativeAot.cs:168
    frame #8: 0x00005647d88627a7 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Monitor__Enter(obj=<unavailable>) at Monitor.NativeAot.cs:52
    frame #9: 0x00005647d86c4d85 LondonTravel.Skill.NativeAotTests`System_Console_System_ConsolePal__RefreshColors(toChange=<unavailable>, value=<unavailable>) at ConsolePal.Unix.cs:767
    frame #10: 0x00005647d8648ae2 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_OutputDevice_ConsoleOutputDevice__DisplayAfterSessionEndRunAsync_d__45__MoveNext(this=<unavailable>) at ConsoleDisplayService.cs:258
    frame #11: 0x00005647d8e87412 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncMethodBuilderCore__Start<Microsoft_Testing_Platform_Microsoft_Testing_Platform_OutputDevice_ConsoleOutputDevice__DisplayAfterSessionEndRunAsync_d__45>(stateMachine=<unavailable>) at AsyncMethodBuilderCore.cs:38
    frame #12: 0x00005647d861e5fc LondonTravel.Skill.NativeAotTests at ConsoleDisplayService.cs:116
    frame #13: 0x00005647d8652c14 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__DisplayAfterSessionEndRunAsync_d__10__MoveNext(this=<unavailable>) at CommonTestHost.cs:105
    frame #14: 0x00005647d8e8d3f2 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncMethodBuilderCore__Start<Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__DisplayAfterSessionEndRunAsync_d__10>(stateMachine=<unavailable>) at AsyncMethodBuilderCore.cs:38
    frame #15: 0x00005647d86247a9 LondonTravel.Skill.NativeAotTests at CommonTestHost.cs:18
    frame #16: 0x00005647d865386e LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__ExecuteRequestAsync_d__8__MoveNext(this=<unavailable>) at CommonTestHost.cs:86
    frame #17: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #18: 0x00005647d8c082ad LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__ExecuteRequestAsync_d__8>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at AsyncTaskMethodBuilderT.cs:368
    frame #19: 0x00005647d88769fa LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_AwaitTaskContinuation__RunOrScheduleAction_0(box=<unavailable>, allowInlining=<unavailable>) at TaskContinuation.cs:795
    frame #20: 0x00005647d88742d9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3456
    frame #21: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #22: 0x00005647d8bf52e0 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__SetExistingTaskResult(task=<unavailable>, result=<unavailable>) at AsyncTaskMethodBuilderT.cs:490
    frame #23: 0x00005647d8653dd9 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__NotifyTestSessionEndAsync_d__12__MoveNext(this=<unavailable>) at CommonTestHost.cs:155
    frame #24: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #25: 0x00005647d8c084ed LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Hosts_CommonTestHost__NotifyTestSessionEndAsync_d__12>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at
 AsyncTaskMethodBuilderT.cs:368
    frame #26: 0x00005647d88769fa LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_AwaitTaskContinuation__RunOrScheduleAction_0(box=<unavailable>, allowInlining=<unavailable>) at TaskContinuation.cs:795
    frame #27: 0x00005647d88742d9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3456
    frame #28: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #29: 0x00005647d8bf52e0 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__SetExistingTaskResult(task=<unavailable>, result=<unavailable>) at AsyncTaskMethodBuilderT.cs:490
    frame #30: 0x00005647d864eddf LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_MessageBusProxy__DisableAsync_d__7__MoveNext(this=<unavailable>) at MessageBusProxy.cs:47
    frame #31: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #32: 0x00005647d8c054ad LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_MessageBusProxy__DisableAsync_d__7>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at AsyncTaskMethodBuilderT.cs:368
    frame #33: 0x00005647d88769fa LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_AwaitTaskContinuation__RunOrScheduleAction_0(box=<unavailable>, allowInlining=<unavailable>) at TaskContinuation.cs:795
    frame #34: 0x00005647d88742d9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3456
    frame #35: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #36: 0x00005647d8bf52e0 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__SetExistingTaskResult(task=<unavailable>, result=<unavailable>) at AsyncTaskMethodBuilderT.cs:490
    frame #37: 0x00005647d864c554 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsynchronousMessageBus__DisableAsync_d__18__MoveNext(this=<unavailable>) at AsynchronousMessageBus.cs:204
    frame #38: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #39: 0x00005647d8c0412d LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsynchronousMessageBus__DisableAsync_d__18>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at AsyncTaskMethodBuilderT.cs:368
    frame #40: 0x00005647d88769fa LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_AwaitTaskContinuation__RunOrScheduleAction_0(box=<unavailable>, allowInlining=<unavailable>) at TaskContinuation.cs:795
    frame #41: 0x00005647d88742d9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3456
    frame #42: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #43: 0x00005647d8bf52e0 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__SetExistingTaskResult(task=<unavailable>, result=<unavailable>) at AsyncTaskMethodBuilderT.cs:490
    frame #44: 0x00005647d864d905 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsyncConsumerDataProcessor__CompleteAddingAsync_d__13__MoveNext(this=<unavailable>) at ChannelConsumerDataProcessor.cs:123
    frame #45: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #46: 0x00005647d8c04c0d LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsyncConsumerDataProcessor__CompleteAddingAsync_d__13>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at AsyncTaskMethodBuilderT.cs:368
    frame #47: 0x00005647d88769fa LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_AwaitTaskContinuation__RunOrScheduleAction_0(box=<unavailable>, allowInlining=<unavailable>) at TaskContinuation.cs:795
    frame #48: 0x00005647d88742d9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3456
    frame #49: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #50: 0x00005647d8bf4dc3 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_UnwrapPromise_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetFromTask(this=<unavailable>, task=<unavailable>, lookForOce=<unavailable>) at Task.cs:7139
    frame #51: 0x00005647d8bf4aa8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_UnwrapPromise_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__InvokeCore(this=<unavailable>, completingTask=<unavailable>) at Task.cs:7046
    frame #52: 0x00005647d887430d LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task__RunContinuations(this=<unavailable>, continuationObject=<unavailable>) at Task.cs:3474
    frame #53: 0x00005647d8bf47a8 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Tasks_Task_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__TrySetResult(this=<unavailable>, result=<unavailable>) at Future.cs:398
    frame #54: 0x00005647d8bf52e0 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult>__SetExistingTaskResult(task=<unavailable>, result=<unavailable>) at AsyncTaskMethodBuilderT.cs:490
    frame #55: 0x00005647d864ddd5 LondonTravel.Skill.NativeAotTests`Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsyncConsumerDataProcessor__ConsumeAsync_d__12__MoveNext(this=<unavailable>) at ChannelConsumerDataProcessor.cs:113
    frame #56: 0x00005647d8866b6c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ExecutionContext__RunInternal(executionContext=<unavailable>, callback=<unavailable>, state=<unavailable>) at ExecutionContext.cs:179
    frame #57: 0x00005647d8c04e0d LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Runtime_CompilerServices_AsyncTaskMethodBuilder_1_AsyncStateMachineBox_1<S_P_CoreLib_System_Threading_Tasks_VoidTaskResult__Microsoft_Testing_Platform_Microsoft_Testing_Platform_Messages_AsyncConsumerDataProcessor__ConsumeAsync_d__12>__MoveNext_0(this=<unavailable>, threadPoolThread=<unavailable>) at AsyncTaskMethodBuilderT.cs:368
    frame #58: 0x00005647d886c18c LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_ThreadPoolWorkQueue__Dispatch at ThreadPoolWorkQueue.cs:913
    frame #59: 0x00005647d8910317 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_PortableThreadPool_WorkerThread__WorkerThreadStart at PortableThreadPool.WorkerThread.NonBrowser.cs:102
    frame #60: 0x00005647d8863e0e LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Thread__StartThread(parameter=<unavailable>) at Thread.NativeAot.cs:447
    frame #61: 0x00005647d88642b9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint(parameter=<unavailable>) at Thread.NativeAot.Unix.cs:113
    frame #62: 0x00007f5424f37ac3 libc.so.6`___lldb_unnamed_symbol3481 + 755
    frame #63: 0x00007f5424fc9850 libc.so.6`___lldb_unnamed_symbol3866 + 11

  * frame #0: 0x00007f5424f34117 libc.so.6`___lldb_unnamed_symbol3457 + 231
    frame #1: 0x00007f5424f36a41 libc.so.6`pthread_cond_wait + 529
    frame #2: 0x00005647d8910dc4 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_ThreadWaitInfo__Wait(this=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, isSleep=<unavailable>, lockHolder=<unavailable>) at WaitSubsystem.ThreadWaitInfo.Unix.cs:322
    frame #3: 0x00005647d8911647 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitSubsystem_WaitableObject__Wait(this=<unavailable>, waitInfo=<unavailable>, timeoutMilliseconds=<unavailable>, interruptible=<unavailable>, prioritize=<unavailable>) at WaitSubsystem.WaitableObject.Unix.cs:320
    frame #4: 0x00005647d886e75f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOneNoCheck(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:128
    frame #5: 0x00005647d886e660 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_WaitHandle__WaitOne(this=<unavailable>, millisecondsTimeout=<unavailable>) at WaitHandle.cs:102
    frame #6: 0x00005647d8861f60 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Lock__TryAcquireSlow(this=<unavailable>, currentThreadId=<unavailable>, millisecondsTimeout=<unavailable>, trackContentions=<unavailable>) at Lock.cs:355
    frame #7: 0x00005647d8862c4f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Monitor__TryAcquireSlow(lck=<unavailable>, obj=<unavailable>, millisecondsTimeout=<unavailable>) at Monitor.NativeAot.cs:168
    frame #8: 0x00005647d88fb042 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_Internal_Runtime_CompilerHelpers_SynchronizedMethodHelpers__MonitorEnter(obj=<unavailable>, lockTaken=<unavailable>) at SynchronizedMethodHelpers.cs:28
    frame #9: 0x00005647d891c07f LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_IO_TextWriter_SyncTextWriter__Write_11(this=<unavailable>, value=<unavailable>) at TextWriter.cs:814
    frame #10: 0x00005647d85f6efc LondonTravel.Skill.NativeAotTests at ResilienceHandler.cs:91
    frame #11: 0x00005647d85f6f78 LondonTravel.Skill.NativeAotTests at ResilienceHandler.cs:91
    frame #12: 0x00005647d8863e0e LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Thread__StartThread(parameter=<unavailable>) at Thread.NativeAot.cs:447
    frame #13: 0x00005647d88642b9 LondonTravel.Skill.NativeAotTests`S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint(parameter=<unavailable>) at Thread.NativeAot.Unix.cs:113
    frame #14: 0x00007f5424f37ac3 libc.so.6`___lldb_unnamed_symbol3481 + 755
    frame #15: 0x00007f5424fc9850 libc.so.6`___lldb_unnamed_symbol3866 + 11

It looks like there's a deadlock between trying to write to the console via ILogger in the code under test and the MSTest platform trying to change the colour of the console at the end of the test run before reporting the test summary.

It may be that this is an issue in .NET rather than VSTest, but I figured I'd start here in the first instance as I only seem to get this problem using the native AoT test support.

Some interesting points in the code are listed below:

  1. ConsoleDisplayService.DisplayAfterSessionEndRunAsync() sets the console colour just before writing the test summary.
  2. This in turn calls SystemConsole.SetForegroundColor() to set Console.ForegroundColor.
  3. On Linux, setting Console.ForegroundColor calls through to ConsolePal.RefreshColors().
  4. RefreshColors() locks on Console.Out.
  5. Another thread shows SyncTextWriter.Write() - this uses [MethodImpl(MethodImplOptions.Synchronized)] on the methods, so I think that's effectively also locking on Console.Out.

The fact that only the Linux implementation of the setter of Console.ForegroundColor appears to do any locking would also help explain why I only see this on Linux.

I've worked around this issue by reducing the amount of console output from the tests. With the workaround applied, the tests happily keep passing if I run them continuously in a loop (until the job timeout limit of 20 minutes is reached).

However, what I'm at a loss to explain is why there's a deadlock - just that there appears to be one on usage of the console.

Steps To Reproduce

  1. Clone https://github.com/martincostello/alexa-london-travel/commit/ecd4a728a6bf40d19f767dba22cc5756023cfc06 on Linux or in WSL.
  2. Run build.ps1
  3. After a number of invocations (if not the first), the native AoT tests should hang. Ctrl+C does not succeed in ending the process either.

Expected behavior

The tests pass and exit successfully.

Actual behavior

The tests deadlock.

MarcoRossignoli commented 1 month ago

@jkotas do you know who owns System.Console? We see an issue only on native aot mode related to some writeline.

MarcoRossignoli commented 1 month ago

@Evangelink @nohwnd do you recall if some code inside MSTest is doing some wild lock on Console.Out?

MarcoRossignoli commented 1 month ago

@martincostello ResilienceHandler.cs is your code?

martincostello commented 1 month ago

@MarcoRossignoli I believe it's this: https://github.com/dotnet/extensions/blob/main/src/Libraries/Microsoft.Extensions.Http.Resilience/Resilience/ResilienceHandler.cs

jkotas commented 1 month ago

@jkotas do you know who owns System.Console?

You can find dotnet/rutime area owners in https://github.com/dotnet/runtime/blob/main/docs/area-owners.md

However, this does not look like a System.Console bug to me. The stacktraces do not explain why there is a deadlock. It is more likely a bug in native AOT runtime . Have you tried to run latest .NET 9 Preview? It may be a bug that was fixed in .NET 9.

martincostello commented 1 day ago

While trying to work out why my Lambda function isn't working when deployed to AWS Lambda with RC.1, I managed to by-accident generate a crash dump on both Linux and macOS. The Linux one showed a deadlock on two threads trying to do a Console.WriteLine():

image

image

I can share a link to the crash dump file over email. This was in normal xunit, rather than anything to do with AoT, but the commonality appears to be access to the console.

jkotas commented 1 day ago

This is likely duplicate of https://github.com/dotnet/runtime/issues/104340 that will be fixed in .NET 9 RC2.