microsoft / DbgShell

A PowerShell front-end for the Windows debugger engine.
MIT License
675 stars 89 forks source link

DbgHelp is slow and other symbol "fun" #67

Open Zhentar opened 5 years ago

Zhentar commented 5 years ago

The purpose of this issue is to share some learnings (and some salt about DbgHelp) that I have gained in the past few days furthering my descent into madness my understanding of symbols; whether any of this should be considered an issue that prompts changes to DbgShell or an endorsement of any particular approach I leave to the readers' judgement. SymbolSearchPerf.cs.txt

  1. SymSearch results are a superset of SymEnumSymbols - there is no symbol that SymEnumSymbols will return that SymSearch can not also return.

"But wait!" you say, "I have done searches that get results from SymEnumSymbols but not SymSearch!" And indeed, you are correct. So I shall share the sad tale of a simple, unassuming symbol, known to its friends as mshtml!g_hIsolatedHeap.

> Get-DbgSymbol "mshtml!g_hIsolatedHeap" -UseSymEnum
      0x642b1ba0 g_hIsolatedHeap

As expected. And what about SymSearch?

> Get-DbgSymbol "mshtml!g_hIsolatedHeap" -UseSymSearch

Nothing. Have I lied to you? No! I have not! It is DbgHelp that lies! This symbol's proper name is not mshtml!g_hIsolatedHeap at all, but in truth is mshtml!_g_hIsolatedHeap!

> Get-DbgSymbol "mshtml!_g_hIsolatedHeap" -UseSymSearch
      0x642b1ba0 g_hIsolatedHeap

SymEnumSymbols succceeds where SymSearch does not because DbgHelp secretly inserts search globs to cover for the lies it tells! But at what cost, you ask? "No, I didn-" Yes, you did. But unfortunately, there are so many different ways that DbgHelp is heckin' slow, you can't unambiguously pin it to one single thing† - but one can at least quantify how slow the various ways of calling into it to ask about our friend mshtml!_g_hIsolatedHeap are!

Method Mean Error Ratio
FindSymbol_Enum 1,751,811.693 us 77,915.3682 us 1.000
FindSymbol_Enum_ExplicitStar 1,225,950.550 us 22,614.1927 us 0.691
FindSymbol_Enum_NoType 1,167,999.680 us 17,032.5053 us 0.668
FindSymbol_Enum_Mangled_SymOpts 705,314.202 us 34,711.8878 us 0.421
FindSymbol_Enum_PublicsOnly_Mangled_SymOpts 670,834.033 us 16,164.5092 us 0.384
FindSymbol_Search_ForcedStarGlob 3,409,318.860 us 37,855.2651 us 1.949
FindSymbol_Search_SecretUncodumentedWildcard 3,017,108.117 us 156,676.2463 us 1.736
FindSymbol_Search_Mangled_SymOpts 368,730.864 us 6,861.2060 us 0.210

(spoiler alert: Benchmarkdotnet picked microseconds for this table because some of the rows I cut out actually need it)

There are few things you can tell from that table (should the names happen to make sense to you, or should you reference against the attached code). First is that with the default settings DbgHelp does try to match against your original search terms before searching again with the injected star glob. Next you might notice that SymSearch manages to take twice as long (unless you set symopts to search against still-"decorated" streams, a trival concession for our symbol with no decoration to be had). I'm not really sure how it does that. One might also notice that there are undocumented wildcards (that one might want to, you know, escape). In this case, I used #, which seems to mean "match the preceding character 0 or 1 times"‡, but there's also some handling for a [a-z] style syntax.

Now, if one were to examine DbgHelp more closely††, one might notice that the public apis seem to pretty much just be messy wrappers aound DIA functions. And there is an undocumented SymGetDiaSession, what might one get from that? Crashes, unless one also notices the SymFreeDiaString and surmises that DbgHelp's DIA is special and doesn't use SysAllocString to allocate "BSTR" values.

Method Mean Error Ratio
findChildren_Mangled_DiaComInterop 2,816.546 us 45.3609 us 0.002
findChildren_Mangled_DiaCppCli 111.870 us 0.5413 us 0.000
findChildren_Unmangled_DiaCppCli 338,699.820 us 6,055.1215 us 0.194
findChildren_Mangled_DiaCppCli_BuiltinSearchGlob 146,313.182 us 7,739.2609 us 0.084

Wow, using .NET COM interop with an exact search string gets us an answer 100 times faster (as long as we don't look at any string properties...). C++/CLI our way around the string issue and we get a whopping 3,000 times faster. Even undecoration & search globs can't get us as slow as the DbgHelp APIs go. But wait! There's more... most of those 112 microsecond were just fetching the DiaSession!

Method Mean Error Ratio
findChildren_Mangled_DiaCppCli_CachedSession 4.629 us 0.0850 us 0.000

But why stop there? If you profile it, it's painfully obvious that DbgHelp's DIA is hamstrung by some inefficient UTF8 -> UTF16 conversion logic... and the disassembly tarot cards will tell you that the version of DIA in msdia140 somewhat improves on that conversion, in addition to a statically linked towlower... so how would that fare?

Method Mean Error Ratio
findChildren_Mangled_msdia140CppCli 3.573 us 0.1180 us 0.000
findChildren_Unmangled_msdia140CppCli 361,605.138 us 7,670.6590 us 0.206
findChildren_Mangled_msdia140CppCli_BuiltinSearchGlob 140,708.329 us 5,434.0552 us 0.080

Well, shoot. Those numbers are rather remarkably close to DbgHelp's... I profiled them to make sure I wasn't accidentally double testing. I'm starting to suspect there was a reason the DbgHelp DIA used LocalAlloc instead of SysAllocString... msdia140 will use LocalAlloc if you use a secret alternate GUID to get your IDiaDataSource but I think I should post this and go to sleep before I get lured into testing that

( SymbolSearchPerf.cs.txt contains the benchmark code, which should compile & run against the head of my personal DbgShell repository)

† aside from the dynamically linked towlower call in DbgHelp!nextToken

‡ except in mshtml!_#_hIsolatedHeap where it means "get stuck in an infinite loop"

†† Not by diassembling it, of course, that would be against the terms of use!

jazzdelightsme commented 5 years ago

Thank you for doing this analysis and sharing it. I have long wondered about the SymEnumSymbols vs SymSearch thing. (You'd think I could've just asked someone who works here, and I did try that, but got no reply.)

Zhentar commented 5 years ago

Apparently SymSearch is only super slow if you're using the Recurse or AllItems search options in combination with undecorating symbols - because it doesn't apply the search mask until after populating the symbol info, causing every single symbol's name to go through an extra unnecessary (and suprisingly slow) utf8 -> utf16 -> utf8 round trip before undecorating it.

I also can't actually tell how AllItems is different from GlobalsOnly when there' isn't also Recurse (aside from being much slower).

And an actual DbgShell issue: Recurse with a "*" symbol name search errors out because some of the symbols returned don't have names