microsoft / clrmd

Microsoft.Diagnostics.Runtime is a set of APIs for introspecting processes and dumps.
MIT License

Dumping the stack of managed threads #762

Closed. raffaeler closed this issue 2 years ago.

raffaeler commented 4 years ago

I often can't get the managed stack for the code being executed, whether I attach to the process or snapshot it. The threads are enumerated correctly via ClrRuntime.Threads. For each of them I then call EnumerateStackTrace, obtaining the stack frames as a list of ClrStackFrame. In very few cases I am able to see the real stack trace being executed at the snapshot/attach moment; on most threads, most of the time, I can only see a frame called DebuggerU2MCatchHandlerFrame, which I believed was only a finalizer-related thing.

What is this object, and how can I 'decode' it to get back the real stack?

I did my tests by running some heavy work in an ASP.NET Core application, printing the managed thread id of the thread executing the CPU-bound code for several tens of seconds.
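For reference, the enumeration is essentially this (a simplified sketch against the ClrMD 2.x API; `pid` stands in for the real process id):

```csharp
using System;
using Microsoft.Diagnostics.Runtime;

int pid = 12345; // hypothetical target process id

// Attach (suspended) and walk every managed thread's stack.
using DataTarget dataTarget = DataTarget.AttachToProcess(pid, suspend: true);
ClrRuntime runtime = dataTarget.ClrVersions[0].CreateRuntime();

foreach (ClrThread thread in runtime.Threads)
{
    Console.WriteLine($"Thread {thread.ManagedThreadId}:");
    foreach (ClrStackFrame frame in thread.EnumerateStackTrace())
        Console.WriteLine($"  {frame}"); // most of the time: only DebuggerU2MCatchHandlerFrame
}
```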

leculver commented 4 years ago

Are you seeing this on Windows or on Linux? Those are two very different codepaths.

raffaeler commented 4 years ago

I am seeing this on Windows x64, but I am interested in understanding the problem for other platforms as well. Thanks

leculver commented 4 years ago

There may be a bug here, or it may be expected. Here's a quick description of when it would be expected:

ClrMD uses CLR's stackwalker to attempt to unwind the stack. By design, if the CLR stackwalker encounters a non-managed instruction pointer in the eip or rip register, it assumes that we exited managed code normally and put a clr!Frame on the stack. It then looks for that clr!Frame object on the stack to help unwind, and starts the unwind from that object (note that clr!DebuggerU2MCatchHandlerFrame is a clr!Frame object).

It's possible to be "unlucky": the thread had briefly called into native code before setting up a Frame, and that confuses CLR's stackwalker. This happens regularly, but we assume it's rare in most cases. If you are seeing it a lot, though, there might be a bug here, or we might be mistaken in thinking it's rare. To find out, we'd probably need to implement your suggestion in #760 and save off snapshots to see what the real stack is in WinDbg.

If there's no bug here, I'm not sure there's much we can do to fix this. ClrMD isn't designed to function as a sampling profiler; it's meant to be a runtime inspection tool. What we are reporting to you is the stack as the CLR runtime sees it (again...that's assuming no bug in ClrMD), and what it sees at the time you took the snapshot is that [DebuggerU2MCatchHandlerFrame] was the top of the managed "stack".

I will try to reproduce this myself, and if I can't I'll see if we can implement #760 so we can get you to collect more information here. (Also FYI I just got back from a vacation so I'm a bit behind on work...it might be a few days before I get to this.)

raffaeler commented 4 years ago

Got it. Take your time, and let me know if you need help to repro the issue. If I try this operation 20 times, I can see the real stack just once or twice.

I suspect it is a bug because, even though I was working on a snapshot, enumerating the threads and the stack frames twice gave different results: the first time I could see the stack; the following times I could only see DebuggerU2MCatchHandlerFrame.

The repro should be pretty simple. In any case, tomorrow morning I will present a session at an online conference about diagnostics, and ClrMD is one of its pillars :) As soon as I publish the demo I can point you to my code, but I suspect it won't be that useful.

Thanks

raffaeler commented 4 years ago

Additional notes:

sukesh-ak commented 4 years ago

I got dumps with [DebuggerU2MCatchHandlerFrame] on the stack.

At least ClrMD and WinDbg (using SOS) show the same output.

ezsilmar commented 2 years ago

Hello, I'm facing the same issue on Linux.

We have a troubleshooting tool that uses ClrMD to show the current "parallel stacks" of the application (connecting with dataTarget = DataTarget.AttachToProcess(pid, suspend: true);). In my reproduction the stacks are always correct for the threads that are waiting (Thread.Sleep, mre.WaitOne, etc.). However, for the threads that are actively consuming CPU there is only one frame, like below:

OS Thread ID: 965 Managed thread id: 7 Alive: True Aborted: False State: 16912928 IsWorkerThread: True IsIOCompletionThread: False
    Kind: Runtime Method:  FrameName: DebuggerU2MCatchHandlerFrame IP: 610053002E0000 MethodByIP:

It's very stable: even after 100 iterations of creating/destroying the dataTarget I didn't catch a working stackwalk for the interesting thread.

When I take a full memory snapshot with dotnet-dump collect --type Full, the stack trace is shown correctly:

> clrstack
OS Thread Id: 0x965 (21)
        Child SP               IP Call Site
00007F9E9E9A2440 00007F9EA33F0FF7 Ghost.TestApps.BurnCpuComponent.Process(Ghost.TestApps.BurnCpuParam, System.Threading.CancellationToken) [C:\dev\repos\troubleshooting\app-hosting\ghost\Ghost\Ghost.TestApps\BurnCpu.cs @ 45]
00007F9E9E9A2490 00007F9EA33EE91A Ghost.Core.ComponentMiddleware`4+<>c__DisplayClass6_0+<<Invoke>g__ProcessRequest|0>d[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]].MoveNext() [C:\dev\repos\troubleshooting\app-hosting\ghost\Ghost\Ghost.Core\ComponentMiddleware.cs @ 75]
00007F9E9E9A2800 00007F9EA33ED002 System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Ghost.Core.ComponentMiddleware`4+<>c__DisplayClass6_0+<<Invoke>g__ProcessRequest|0>d[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]], Ghost.Core.netcore]](<<Invoke>g__ProcessRequest|0>d<System.__Canon,System.__Canon,System.__Canon,System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncMethodBuilderCore.cs @ 38]
00007F9E9E9A2860 00007F9EA33ECEAF System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.ValueTuple`2[[System.__Canon, System.Private.CoreLib],[Ghost.Contract.InvocationResult, Ghost.Contract.netcore]], System.Private.CoreLib]].Start[[Ghost.Core.ComponentMiddleware`4+<>c__DisplayClass6_0+<<Invoke>g__ProcessRequest|0>d[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]], Ghost.Core.netcore]](<<Invoke>g__ProcessRequest|0>d<System.__Canon,System.__Canon,System.__Canon,System.__Canon> ByRef) [/_/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncTaskMethodBuilderT.cs @ 36]
00007F9E9E9A28A0 00007F9EA33ECA05 Ghost.Core.ComponentMiddleware`4+<>c__DisplayClass6_0[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]].<Invoke>g__ProcessRequest|0()
00007F9E9E9A2930 00007F9E9EE39C21 System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib]].InnerInvoke() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Future.cs @ 503]
00007F9E9E9A2950 00007F9EA2C1BE4E System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 268]
00007F9E9E9A2990 00007F9EA2C225B0 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2337]
00007F9E9E9A2A10 00007F9E9EC97F78 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2271]
00007F9E9E9A2A30 00007F9E9EC97F17 System.Threading.Tasks.Task.ExecuteFromThreadPool(System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2262]
00007F9E9E9A2A40 00007F9EA1FA4FD3 System.Threading.ThreadPoolWorkQueue.Dispatch()
00007F9E9E9A2AB0 00007F9E9EC93BC5 System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs @ 63]
00007F9E9E9A2B80 00007F9E9EC78432 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
00007F9E9E9A2CF0 00007f9f18916a27 [DebuggerU2MCatchHandlerFrame: 00007f9e9e9a2cf0]

It also contains DebuggerU2MCatchHandlerFrame, but with a different IP. When I open this snapshot with ClrMD it shows the correct result (same as clrstack).
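For completeness, this is roughly how I open the snapshot with ClrMD (2.x API; the dump path is illustrative):

```csharp
using System;
using System.Linq;
using Microsoft.Diagnostics.Runtime;

// Load the dotnet-dump snapshot and walk the same thread as above.
using DataTarget dataTarget = DataTarget.LoadDump("/tmp/core_full.dmp"); // illustrative path
ClrRuntime runtime = dataTarget.ClrVersions[0].CreateRuntime();

ClrThread thread = runtime.Threads.First(t => t.OSThreadId == 0x965);
foreach (ClrStackFrame frame in thread.EnumerateStackTrace())
    Console.WriteLine($"{frame.StackPointer:x16} {frame.InstructionPointer:x16} {frame}");
```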

ezsilmar commented 2 years ago

I think I've found a bug that explains this behavior on Linux!

In LinuxLiveDataReader.cs there's a ptrace(PTRACE_ATTACH, pid, 0, 0) call to attach to the process. However, the ptrace manual says we should attach to every thread id separately. When I call PTRACE_ATTACH for every tid, the stacks are read correctly.

Also DataTarget.CreateSnapshotAndAttach(pid) works fine on Linux.

leculver commented 2 years ago

> When I call PTRACE_ATTACH for every tid the stacks are read correctly.

That's great! Can you submit a PR for this? Or even just a draft showing the code you changed? No worries if not, but it would speed things along.

ezsilmar commented 2 years ago

Sure, I'll make a PR tomorrow!
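In the meantime, the change is essentially this (a rough sketch, not the exact diff; names and error handling are illustrative, and each attach still has to be paired with a waitpid before reading registers):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

internal static class PtraceHelper
{
    private const int PTRACE_ATTACH = 16;

    [DllImport("libc", SetLastError = true)]
    private static extern long ptrace(int request, int pid, IntPtr addr, IntPtr data);

    // Attach to every thread listed under /proc/<pid>/task instead of only
    // the main thread, as the ptrace manual requires.
    public static void AttachToAllThreads(int pid)
    {
        foreach (string taskDir in Directory.EnumerateDirectories($"/proc/{pid}/task"))
        {
            int tid = int.Parse(Path.GetFileName(taskDir));
            if (ptrace(PTRACE_ATTACH, tid, IntPtr.Zero, IntPtr.Zero) != 0)
                throw new InvalidOperationException($"PTRACE_ATTACH failed for tid {tid}, errno {Marshal.GetLastWin32Error()}");
        }
    }
}
```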

Interestingly, it means the original issue observed on Windows is unrelated. I ran my code on Windows a couple of times but could not reproduce it.

ezsilmar commented 2 years ago

It seems my fix was not enough in the end: I reproduced the issue again, and this time on both Windows and Linux.

First, I attach the repro code. Depending on the parameters, the program does one of three things (matching the columns of the table below): it waits on a ManualResetEvent, computes Fibonacci numbers in a loop, or queries DateTime.UtcNow in a loop.

In every environment I launch the program in test mode, then I launch another instance to show the stack traces, and I also inspect the dump I create with dotnet-dump collect -p <pid> --type Full. I expect the stack traces reported from the dump and from the live session to be the same. I also expect the stack traces to be reported correctly for every test case in the app.

These are the results I get:

                    ManualResetEvent.Wait    Fibonacci numbers        DateTime.UtcNow
Linux clrmd live    correct                  correct after PR #991    empty stack
Linux clrmd dump    correct                  correct                  empty stack
Linux clrstack      correct                  correct                  empty stack / correct native part with -f in lldb
Linux dumpstack     correct                  correct                  correct in repro but wrong in real app
Windows clrmd live  correct                  correct                  works occasionally
Windows clrmd dump  correct                  correct                  works occasionally
Windows clrstack    correct                  correct                  works occasionally

The most interesting test case is the loop querying DateTime.UtcNow. I attach the full output for different cases.

On Linux, clrstack always returns a single DebuggerU2MCatchHandlerFrame. In lldb, clrstack -f skips the managed frames: the native frames are correct, but there are empty lines where the managed frames are supposed to be. The dumpstack command always returns the correct stack in the repro, but in the real app (an http server doing the DateTime.UtcNow loop on each request) the managed part of the stack seems to contain a method that should not be there, so I guess the stackwalk gets confused at some point.

On Windows, running the test 10 times, I'd get 2 runs where it works and 8 where it does not, for both the dump and the live session. Weirdly, yesterday the dump seemed to be correct more often, but today it's on par with the live session. Probably just luck.

I'd appreciate it if you could try the same tests on your machine to compare the results. I ran the tests on WSL Ubuntu with dotnet 5.0.9 and on Windows 10 with dotnet 5.0.17 (btw, I tried different dotnet versions too; it doesn't seem to change anything).

leculver commented 2 years ago

Ahh, thank you for this repro. I actually think this is working "as expected", but what is expected here may be surprising to you. This is due to how the CLR diagnostics layer works. Let me first state a few facts about how the runtime and CLR Diagnostics (libmscordaccore.so) work, then I'll dig into this repro.

  1. ClrMD (and the standard clrstack command with no arguments) uses a "managed stack walk", i.e. CLR's managed stackwalker. This does not do a "native stack unwind"; that is, it does not unwind the stack the way the operating system or a debugger would.
  2. ClrMD does a managed stack unwind (not a native one) because ClrMD is a CLR runtime inspection library. It's intended to show you the state of the runtime and threads exactly as CLR would see it if you paused any given thread at that location.
  3. "clrstack -f" is doing a native stack unwind using LLDB to help it unwind frames. This function is going outside of what the CLR Diagnostics library (libmscordaccore.so) down to either the operating system, LLDB, or a native stack unwinder to fully unwind that stack.
  4. By design, the CLR stackwalker can only start walking the stack from a "safe point", that is, a location that the JIT has marked as allowable to unwind from. Any time we start a stack unwind in native code, for example, that's an "unsafe point". When managed code calls into native code (either through a PInvoke or a call into libcoreclr.so), we have to create a "coreclr!Frame" on the stack. In your output you can see these Frames, such as DebuggerU2MCatchHandlerFrame, HelperMethodFrame, etc.; see the sketch after this list.
  5. When doing managed debugging in Visual Studio (but not in lldb or windbg, and not when stopping a random live process), the debugger specifically stops all threads at GC safepoints using the ICorDebug interfaces. This ICorDebug pause of the process actually runs code in the target process to force everything to reach a GC safepoint. That's why you always see "clean" callstacks in VS. The tradeoff is that you aren't instantly stopping the process; you are letting it run further to reach a safepoint. lldb, dbgeng, and ClrMD's process attach simply halt the process immediately, wherever it was.
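To make point 4 concrete, here is roughly how those runtime Frames surface through ClrMD (a sketch against the 2.x API; `thread` is a ClrThread you already hold):

```csharp
// Runtime frames (clr!Frame objects such as DebuggerU2MCatchHandlerFrame)
// report Kind == Runtime and expose a FrameName; real managed frames have a Method.
foreach (ClrStackFrame frame in thread.EnumerateStackTrace())
{
    if (frame.Kind == ClrStackFrameKind.Runtime)
        Console.WriteLine($"[{frame.FrameName}]");
    else
        Console.WriteLine(frame.Method?.Signature ?? "<unknown>");
}
```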

With all of that said, what you are seeing here is that "DateTime.get_UtcNow" actually uses a PInvoke or internal call into libcoreclr.so. If you happen to pause the thread after we have called into native code but before we've erected a clr!Frame, then we will "skip" the managed part of the stack (by design) because we don't know it exists! UtcNow might also be implemented in such a way that we intentionally don't create a clr!Frame object, because the call is supposed to be very quick...so we don't need to make that code interruptible.

So, this is all by design, I think, unless I missed something. If you want to always get the full callstack, without missing any managed frames, you have a couple of options:

  1. Use a native stack unwinder like LLDB or whatever "clrstack -f" does to find the "missing" managed frames.
  2. Use ICorDebug to pause the process. This is somewhat expensive, and may have side effects, but that's how VS ensures a proper callstack because every thread will be forced to pause at stackwalking safepoints.

I'm going to close the issue, but feel free to reply back if you have more questions or think I missed something here. Thanks!