microsoft / clrmd

Microsoft.Diagnostics.Runtime is a set of APIs for introspecting processes and dumps.
MIT License

Execute ClrHeap.EnumerateRoots stops process with exit code 139 #1234

Open Ne4to opened 10 months ago

Ne4to commented 10 months ago

I'm trying to inspect a memory dump taken from a process running .NET 7.0.10 on macOS arm64 (a third-party app). When I iterate over ClrHeap.EnumerateRoots(), my own process exits without throwing any exception.

Console output:

Process finished with exit code 139.
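For readers unfamiliar with the number: shells and many process runners report a process killed by signal N as exit status 128 + N, so 139 corresponds to signal 11 (SIGSEGV), which matches the segmentation fault reported later in this thread. A minimal sketch of that arithmetic, assuming the tool that printed the line above follows this convention; `describe_exit_code` is a hypothetical helper name, not part of ClrMD or any .NET tooling:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"killed by unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

The mapping is only a convention, so a status above 128 could in principle also be an ordinary exit code chosen by the program itself.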

Reproduced on the latest Microsoft.Diagnostics.Runtime v3.1.456101 and on v3.1.506101 from the Azure DevOps public feed.

I tried debugging ClrMD, and the process stops at the end of the ReadVirtual method: https://github.com/microsoft/clrmd/blob/437022b361da20cf5f02d401a01c5e2c6c366097/src/Microsoft.Diagnostics.Runtime/DacInterface/DacDataTargetCOM.cs#L126-L134

Last variable values:

address = {ulong} 4577698576
buffer = {IntPtr} 0x145116330
bytesRead = {int} 8
bytesRequested = {int} 8
dacDataTarget = DacDataTarget
pBytesRead = {int*} 0x16e13a8ec
result = {bool} true
self = {IntPtr} 0x600002010258

Stack trace:

DacDataTargetCOM.IDacDataTargetVtbl.ReadVirtual() at /Users/ne4to/projects/github.com/Ne4to/clrmd/src/Microsoft.Diagnostics.Runtime/DacInterface/DacDataTargetCOM.cs:line 134
[Managed to Native Transition]
SOSStackRefEnum.Read() at /Users/ne4to/projects/github.com/Ne4to/clrmd/src/Microsoft.Diagnostics.Runtime/DacInterface/SosStackRefEnum.cs:line 54
SOSStackRefEnum.ReadStackRefs()
DacThreadHelpers.<EnumerateStackRoots>d__4.MoveNext()
Enumerable.SelectEnumerableIterator<StackRootInfo, ClrStackRoot>.MoveNext()
Enumerable.WhereEnumerableIterator<ClrStackRoot>.MoveNext()
ClrThread.<CacheAndReturnRoots>d__51.MoveNext()
ClrHeap.<EnumerateRoots>d__91.MoveNext()

<REDACTED>

ThreadPoolWorkQueue.Dispatch()
PortableThreadPool.WorkerThread.WorkerThreadStart()
Thread.StartCallback()
[Native to Managed Transition]

ClrMD itself is running in a .NET 8 process.

dotnet --info output:

.NET SDK:
 Version:           8.0.101
 Commit:            6eceda187b
 Workload version:  8.0.100-manifests.69afb982

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.2
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.101/

.NET workloads installed:
 Workload version: 8.0.100-manifests.69afb982
 [aspire]
   Installation Source: SDK 8.0.100
   Manifest Version:    8.0.0-preview.1.23557.2/8.0.100
   Manifest Path:       /usr/local/share/dotnet/sdk-manifests/8.0.100/microsoft.net.sdk.aspire/8.0.0-preview.1.23557.2/WorkloadManifest.json
   Install Type:        FileBased

Host:
  Version:      8.0.1
  Architecture: arm64
  Commit:       bf5e279d92

.NET SDKs installed:
  6.0.412 [/usr/local/share/dotnet/sdk]
  6.0.414 [/usr/local/share/dotnet/sdk]
  7.0.306 [/usr/local/share/dotnet/sdk]
  7.0.401 [/usr/local/share/dotnet/sdk]
  7.0.403 [/usr/local/share/dotnet/sdk]
  8.0.100 [/usr/local/share/dotnet/sdk]
  8.0.101 [/usr/local/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.20 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.22 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.9 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.11 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.13 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.20 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.22 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.9 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.11 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.13 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.0 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.1 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/Users/ne4to/.dotnet]

global.json file:
  /Users/ne4to/projects/github.com/Ne4to/Heartbeat/global.json

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download

Ne4to commented 10 months ago

dotnet-dump has the same issue; as I understand it, it is based on ClrMD.

> dotnet-dump --version
8.0.506901+b0ee5b9a01e571161bf772aa659440a986bbe532

> dotnet-dump analyze /Users/ne4to/projects/dbg/dumps/coredump.37588 --command 'dumpheap -stat -live'
Loading core dump: /Users/ne4to/projects/dbg/dumps/coredump.37588 ...
Calculating live objects, this may take a while...
Caching GC roots, this may take a while.
Subsequent runs of this command will be faster.

[1]    43914 segmentation fault  dotnet-dump analyze /Users/ne4to/projects/dbg/dumps/coredump.37588 --command

albahari commented 2 months ago

I get this error intermittently on .NET 8.0.7, after attaching to a process and repeatedly enumerating thread stack traces. Because the crash takes down the host process, it makes ClrMD essentially useless on macOS.

Code Type:             ARM-64 (Native)
Responsible:           Terminal [460]
User ID:               0

Date/Time:             2024-09-21 12:30:10.1523 +0800
OS Version:            macOS 14.2 (23C64)
Report Version:        12
Anonymous UUID:        5108AE38-3CAC-7E60-22A6-7FEE6480CC01

Sleep/Wake UUID:       4903E49B-157A-4B2D-A6B3-A9AF183D7315

Time Awake Since Boot: 120000 seconds
Time Since Wake:       80158 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Codes:       0x0000000000000001, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [44112]

External Modification Warnings:
Process used task_for_pid().

VM Region Info: 0 is not in any region.  Bytes before following region: 4375576576
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      104ce0000-104cf0000    [   64K] r-x/r-x SM=COW  ...LINQPad.Query

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libmscordaccore.dylib                  0x1615917c0 StackFrameIterator::CheckForSkippedFrames() + 112
1   libmscordaccore.dylib                  0x161591574 StackFrameIterator::NextRaw() + 740
2   libmscordaccore.dylib                  0x1615903b4 StackFrameIterator::Next() + 56
3   libmscordaccore.dylib                  0x16160a590 ClrDataStackWalk::Next() + 160
4   ???                                    0x10c89b51c ???
5   ???                                    0x10c89a788 ???
6   ???                                    0x10c89a22c ???
7   ???                                    0x10c3b534c ???
8   ???                                    0x10c6631fc ???