microsoft / clrmd

Microsoft.Diagnostics.Runtime is a set of APIs for introspecting processes and dumps.
MIT License
1.05k stars 255 forks source link

CLRMD apparently missing inner classes when demangling names for types #806

Closed ryanmolden closed 3 years ago

ryanmolden commented 4 years ago
using System;
using System.Collections.Generic;
using Microsoft.Diagnostics.Runtime;

namespace Playground
{
    class Program
    {
        public static void Main(string[] args)
        {
            DataTarget dataTarget = DataTarget.LoadDump(@"c:\temp\crashdumps\ProcessingCrash1.dmp");
            ClrInfo clrInfo = dataTarget.ClrVersions[0];
            ClrRuntime runtime = clrInfo.CreateRuntime();

            ClrType type1 = runtime.GetTypeByMethodTable(0x1b420104);
            ClrType type2 = runtime.GetTypeByMethodTable(0x1b43815c);
            ClrType type3 = runtime.GetTypeByMethodTable(0x1b438438);

            Console.WriteLine($"CLRMD: 0x{type1.MethodTable:X} = {type1.Name}");
            Console.WriteLine($"  SOS: 0x{type1.MethodTable:X} = System.Collections.Immutable.ImmutableDictionary`2[[System.IntPtr, mscorlib],[System.Tuple`2[[Microsoft.VisualStudio.ProjectSystem.Designers.UnattachedProjectTreeNode, Microsoft.VisualStudio.ProjectSystem.Implementation],[System.IntPtr, mscorlib]], mscorlib]]");
            Console.WriteLine();

            Console.WriteLine($"CLRMD: 0x{type2.MethodTable:X} = {type2.Name}");
            Console.WriteLine($"  SOS: 0x{type2.MethodTable:X} = System.Collections.Immutable.ImmutableDictionary`2+Comparers[[System.IntPtr, mscorlib],[System.Tuple`2[[Microsoft.VisualStudio.ProjectSystem.Designers.UnattachedProjectTreeNode, Microsoft.VisualStudio.ProjectSystem.Implementation],[System.IntPtr, mscorlib]], mscorlib]]");
            Console.WriteLine();

            Console.WriteLine($"CLRMD: 0x{type3.MethodTable:X} = {type3.Name}");
            Console.WriteLine($"  SOS: 0x{type3.MethodTable:X} = System.Collections.Immutable.ImmutableDictionary`2+c[[System.IntPtr, mscorlib],[System.Tuple`2[[Microsoft.VisualStudio.ProjectSystem.Designers.UnattachedProjectTreeNode, Microsoft.VisualStudio.ProjectSystem.Implementation],[System.IntPtr, mscorlib]], mscorlib]]");
            Console.WriteLine();

            bool allThreeCLRMDNamesMatch = type1.Name == type2.Name && type2.Name == type3.Name;
            Console.WriteLine($"All three CLRMD type names match: {allThreeCLRMDNamesMatch}");
        }
    }
}

Output of last WriteLine is true that all three distinct ClrType (distinct by method table) name's match, but looking at the same method tables in SOS using !dumpmt they do not match.

I will send you an email with the dump location for the dump used in the repro above.

ryanmolden commented 4 years ago

Also add this to the 'difference' list, not sure who is right here, CLRMD or SOS (method table is 0x2B5DE20C, same dump)

CLRMD: 0x2B5DE20C = System.Linq.Enumerable+\<ReverseIterator>d75 SOS: 0x2B5DE20C = System.Linq.Enumerable+d75`1[[System.Object, mscorlib]]

It looks like it is the auto-created object for a yield enumerable inside the static Enumerable.ReverseIterator method, so I guess CLRMD is right in leaving in the bit, but SOS strips it out, which makes it slightly easier to reason about the generics in the signature, since isn't a generic type, but here T is Object (it appears) to System.Linq.Enumerable+d__75[[System.Object,mscorlib]] seems a bit 'more right' in being less stringently exact. If that makes any sense :P

leculver commented 4 years ago

Here's a standalone repro:

using Microsoft.Diagnostics.Runtime;
using Microsoft.Diagnostics.Runtime.DacInterface;
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Linq;
using System.Text;

class Program
{
    static void Main(string[] _)
    {
        using DataTarget dt = DataTarget.LoadDump(@"C:\work\vs\VsLowMemDump4.dmp");
        using ClrRuntime runtime = dt.ClrVersions.Single().CreateRuntime();
        using SOSDac dac = runtime.DacLibrary.SOSDacInterface;

        HashSet<string> seen = new HashSet<string>();

        using var output = File.CreateText("output.txt");
        foreach (ClrModule module in runtime.EnumerateModules())
        {
            foreach ((ulong mt, int _) in module.EnumerateTypeDefToMethodTableMap())
            {
                string name = dac.GetMethodTableName(mt);
                if (name is null)
                    continue;

                if (!seen.Contains(name))
                {
                    string result = FixGenerics(name);
                    output.WriteLine(result);
                }
            }
        }
    }

    private static void FixGenerics(StringBuilder result, string name, int start, int len, ref int maxDepth, out int finish)
    {
        if (--maxDepth < 0)
        {
            finish = 0;
            return;
        }

        int i = start;
        while (i < len)
        {
            if (name[i] == '`')
                while (i < len && name[i] != '[')
                    i++;

            if (name[i] == ',')
            {
                finish = i;
                return;
            }

            if (name[i] == '[')
            {
                int end = FindEnd(name, i);

                if (IsArraySubstring(name, i, end))
                {
                    result.Append(name, i, end - i + 1);
                    i = end + 1;
                }
                else
                {
                    result.Append('<');

                    int curr = i;
                    do
                    {
                        FixGenerics(result, name, curr + 2, end - 1, ref maxDepth, out int currEnd);
                        if (maxDepth < 0)
                        {
                            finish = 0;
                            return;
                        }

                        curr = FindEnd(name, currEnd) + 1;

                        if (curr >= end)
                            break;

                        if (name[curr] == ',')
                            result.Append(", ");
                    }
                    while (curr < end);

                    result.Append('>');

                    i = curr + 1;
                }
            }
            else
            {
                result.Append(name[i]);
                i++;
            }
        }

        finish = i;
    }

    private static int FindEnd(string name, int start)
    {
        int parenCount = 1;
        for (int i = start + 1; i < name.Length; i++)
        {
            if (name[i] == '[')
                parenCount++;
            if (name[i] == ']' && --parenCount == 0)
            {
                return i;
            }
        }

        return -1;
    }

    private static bool IsArraySubstring(string name, int start, int end)
    {
        start++;
        end--;
        while (start < end)
            if (name[start++] != ',')
                return false;

        return true;
    }

    [return: NotNullIfNotNull("name")]
    public static string? FixGenerics(string? name)
    {
        if (name == null || !name.Contains("[[", StringComparison.Ordinal))
            return name;

        int maxDepth = 64;
        StringBuilder sb = new StringBuilder();
        FixGenerics(sb, name, 0, name.Length, ref maxDepth, out _);

        if (maxDepth < 0)
            return null;

        return sb.ToString();
    }
}
ryanmolden commented 4 years ago

So I have something that seems to work, it fixes things like this (old output)

System.Collections.Generic.Dictionary<System.String, System.Int64>[]

And correctly outputs this

System.Collections.Generic.Dictionary<System.String, System.Int64>+Entry[]

But I also noticed that sometimes the DAC doesn't give us generic params for generic types, it results in things like this (old output)

NuGet.Shared.EqualityUtility+<>c__DisplayClass3_0`2

This is a lambda generated within a generic method which has two two type params, so the `2 at the end is right, but the DAC doesn't give any type params.

Currently my parser outputs this

NuGet.Shared.EqualityUtility+<>c__DisplayClass3_0<??, ??>

The <??,??> is kind of arbitrary, its just an attempt and not 'losing' the genericness/arity of the type, even though we have no params to fill in.

I am not 100% on this, technically if anyone has finicky parsing logic around the names CLRMD returns (not unlikely) it seems this would break that.

Thoughts @leculver ? Should I just leave these as is (i.e.: NuGet.Shared.EqualityUtility+<>c__DisplayClass3_0`2)?

leculver commented 4 years ago

Go ahead and post the PR even if it's not ready to merge. (Mark as Draft if it's not ready.)

NuGet.Shared.EqualityUtility+<>c__DisplayClass3_0<??, ??>

I like the concept of this, I'm not sure about using ?? specifically. Can you define that ?? as a constant to easily change it?

leculver commented 4 years ago

Actually, I think using numbered arguments would be better: T1, T2, etc... It at least holds the intent without being something that breaks the naming scheme of CLR.

ryanmolden commented 4 years ago

Yeah it is already defined as a constant, subbing in the T1,T2 stuff should be easy enough, let me do that.

ryanmolden commented 4 years ago

Examples of differences

Names where generic params were missing entirely:

Input: Roslyn.Utilities.OrderedMultiDictionary`2

OLD: Roslyn.Utilities.OrderedMultiDictionary`2

NEW: Roslyn.Utilities.OrderedMultiDictionary<T1, T2>

Non-generic nested types within generic outer classes:

Input: System.Lazy`1+Boxed[[Microsoft.CodeAnalysis.Options.IDocumentOptionsProviderFactory, Microsoft.CodeAnalysis.Workspaces]]

OLD: System.Lazy

NEW: System.Lazy+Boxed

Input: Microsoft.CodeAnalysis.Shared.Collections.IntervalTree`1+Node[[Microsoft.CodeAnalysis.Formatting.SuppressWrappingData, Microsoft.CodeAnalysis.Workspaces]][]

OLD: Microsoft.CodeAnalysis.Shared.Collections.IntervalTree[]

NEW: Microsoft.CodeAnalysis.Shared.Collections.IntervalTree+Node[]

Generic nested type with generic outer classes (note how the old output, in addition to missing the nested generic class, actually says WeakKeyDictionary has three generic params, which is not true):

Input: System.Collections.Generic.Dictionary2[[Microsoft.VisualStudio.Threading.WeakKeyDictionary2+WeakReference`1[[Microsoft.VisualStudio.Threading.IJoinableTaskDependent, Microsoft.VisualStudio.Threading],[System.Int32, mscorlib],[Microsoft.VisualStudio.Threading.IJoinableTaskDependent, Microsoft.VisualStudio.Threading]], Microsoft.VisualStudio.Threading],[System.Int32, mscorlib]]

OLD: System.Collections.Generic.Dictionary<Microsoft.VisualStudio.Threading.WeakKeyDictionary<Microsoft.VisualStudio.Threading.IJoinableTaskDependent, System.Int32, Microsoft.VisualStudio.Threading.IJoinableTaskDependent>, System.Int32>

NEW: System.Collections.Generic.Dictionary<Microsoft.VisualStudio.Threading.WeakKeyDictionary<Microsoft.VisualStudio.Threading.IJoinableTaskDependent, System.Int32>+WeakReference, System.Int32>

Nested class hell (i.e. multiple nested classes amongst arguments)

Input: Microsoft.CodeAnalysis.PooledObjects.ObjectPool1[[System.Collections.Generic.Stack1[[System.ValueTuple2[[Microsoft.CodeAnalysis.Shared.Collections.IntervalTree1+Node[[Microsoft.CodeAnalysis.Editor.Shared.Tagging.TagSpanIntervalTree`1+TagNode[[Microsoft.VisualStudio.Text.Tagging.IErrorTag, Microsoft.VisualStudio.Text.UI]], Microsoft.CodeAnalysis.EditorFeatures]], Microsoft.CodeAnalysis.Workspaces],[System.Boolean, mscorlib]], mscorlib]], System]]

OLD: Microsoft.CodeAnalysis.PooledObjects.ObjectPool<System.Collections.Generic.Stack<System.ValueTuple<Microsoft.CodeAnalysis.Shared.Collections.IntervalTree<Microsoft.CodeAnalysis.Editor.Shared.Tagging.TagSpanIntervalTree>, System.Boolean>>>

NEW: Microsoft.CodeAnalysis.PooledObjects.ObjectPool<System.Collections.Generic.Stack<System.ValueTuple<Microsoft.CodeAnalysis.Shared.Collections.IntervalTree<Microsoft.CodeAnalysis.Editor.Shared.Tagging.TagSpanIntervalTree+TagNode>+Node, System.Boolean>>>

leculver commented 4 years ago

Looks great, ship it!

erikness commented 3 years ago

Just ran into this issue; I bet this fix will solve it for me, thank you @ryanmolden for working on it!

Let me describe what I'm seeing in case anyone in the future runs into the problem in the same way.


In my case, I was iterating runtime.Heap.EnumerateObjects(), and if objects had the type name System.Collections.Concurrent.ConcurrentDictionary<MyTypeA, MyTypeB>, I'd try to print all the "MyTypeA" key objects. To do this, I'd dive into the internals of ConcurrentDictionary like so:

// type of _tables is "Tables", a nested class
ClrObject tables = clrObject.ReadObjectField("_tables");
// type of _buckets is "Node?[]", and Node is a nested class
ClrObject buckets = tables.ReadObjectField("_buckets");
ClrArray bucketsAsArray = buckets.AsArray();

for (int i = 0; i < bucketsAsArray.Length; i++)
{
    ClrObject node = bucketsAsArray.GetObjectValue(i);
    if (!node.IsNull)
    {
        // print MyTypeA, which for me is at node.ReadValueTypeField("_key")
    }
}

But I was getting an ArgumentException that _tables was not a field of ConcurrentDictionary. Huh?? So I printed out the ClrObject.Fields, and it told me this:

_value, _next, _hashcode, _key

or occasionally this:

_buckets, _locks, _countPerLock, _fastModBucketsMultiplier

These are actually the fields of the nested classes Node and Tables, respectively. (It would also tell me the correct fields sometimes).

This was run on a program with a large heap and lots of ConcurrentDictionaries. I tried to repro this behavior in a small program and here's what I found:

Temporary fix

Constructor names are not broken, so I can differentiate between ConcurrentDictionary and Node by checking the methods. For example, the here are the two constructor signatures:

// ConcurrentDictionary
"System.Collections.Concurrent.ConcurrentDictionary`2[[ObjectSizePlayground.ThreeIntStruct, ObjectSizePlayground],[System.__Canon, System.Private.CoreLib]]..ctor()"
// Node
"System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[MyAssembly.MyTypeA, MyAssembly],[System.__Canon, System.Private.CoreLib]]..ctor(Node<MyAssembly.MyTypeA,System.__Canon>[], System.Object[], Int32[])"

Or you can just put a try/catch around ReadObjectField.

leculver commented 3 years ago

@erikness Does the problem still reproduce on the newest version of ClrMD with this issue fixed?

erikness commented 3 years ago

Oops, didn't realize you had already pushed out a fix. Just tested it, looks like it works!