microsoft / SizeBench

SizeBench is a binary size investigation tool for Windows
MIT License

Improve performance of many operations, especially for large binaries #43

Closed: Austin-Lamb closed this 5 days ago

Austin-Lamb commented 1 week ago

Why is this change being made?

SizeBench has had perennial performance problems for various customers, especially those with large binaries. Recently a binary turned up with a pathologically bad case that took 17 hours to process, which is obviously absurd, so I'm taking the time to invest in some significant, core performance-focused changes that improve performance for most use cases.

Briefly summarize what changed

This PR contains a variety of changes, found as I kept profiling and trying things until the performance trade-offs eventually seemed right.

And then there's the biggest change by far: we now make an intentional trade-off to spend more time in session "open" doing a "pre-process" of symbol information, collecting the important things that we'll want many, many times later. In particular, for each RVA we record every SymIndexID present there (there can be several when folding happens). This means we do one giant walk on startup instead of many tiny walks, one per RVA range, as we perform various operations. Opening therefore takes longer and uses more memory up front, but so many subsequent operations get faster that this is a net win for almost all use cases.
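To make the idea concrete, here is a minimal sketch in Python of an index built in one pass and then queried by RVA range. It is not SizeBench's actual code: the `RvaSymbolIndex` class, its method names, and the `(sym_index_id, rva)` input pairs are all hypothetical stand-ins for whatever the real symbol walk produces.

```python
import bisect
from collections import defaultdict


class RvaSymbolIndex:
    """Hypothetical index: maps each RVA to every symbol ID seen at that RVA."""

    def __init__(self, symbols):
        # `symbols` is an iterable of (sym_index_id, rva) pairs. This is the
        # single "giant walk": every symbol is visited exactly once.
        by_rva = defaultdict(list)
        for sym_index_id, rva in symbols:
            by_rva[rva].append(sym_index_id)
        self._by_rva = dict(by_rva)
        # Sorted keys let range queries use binary search.
        self._sorted_rvas = sorted(self._by_rva)

    def ids_at(self, rva):
        """All symbol IDs recorded at exactly this RVA (several if folded)."""
        return self._by_rva.get(rva, [])

    def ids_in_range(self, start_rva, end_rva):
        """All symbol IDs whose RVA lies in [start_rva, end_rva)."""
        lo = bisect.bisect_left(self._sorted_rvas, start_rva)
        hi = bisect.bisect_left(self._sorted_rvas, end_rva)
        result = []
        for rva in self._sorted_rvas[lo:hi]:
            result.extend(self._by_rva[rva])
        return result


# Made-up data: IDs 101 and 102 share an RVA, as folded symbols would.
index = RvaSymbolIndex([(101, 0x1000), (102, 0x1000), (103, 0x1040), (104, 0x2000)])
folded = index.ids_at(0x1000)
in_range = index.ids_in_range(0x1000, 0x2000)
```

The up-front cost is the memory for the dictionary and the sorted key list. In exchange, each later range query becomes two binary searches plus a slice rather than a fresh walk over symbol data.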

How was the change tested?

For one large binary inside Microsoft, BinaryBytes went from taking 24.5 minutes to 4.75 minutes. For another large binary, here are some timings from the SizeBench GUI on my machine:

| Operation | Before | After |
| --- | --- | --- |
| Open session | 16s | 48s |
| Find all .text symbols | 37.4s | 1.6s |

So it does take significantly longer to open the session for giant binaries that do heavy inlining and COMDAT folding and such, but this is "paid back" to the user the first time they look at something interesting. Then once you do the second thing you're really winning on total time.
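The payback claim can be checked with quick arithmetic on the timings above. This sketch uses only the four measured numbers and makes one simplifying assumption that is mine, not the PR's: that every later operation costs about as much as "Find all .text symbols". The function and constant names are illustrative.

```python
def total_seconds(open_s, op_s, operations):
    """Total wall time: one session open plus `operations` symbol lookups."""
    return open_s + op_s * operations


# (open session, find all .text symbols) in seconds, from the timings above.
BEFORE = (16.0, 37.4)
AFTER = (48.0, 1.6)

# Number of operations at which the extra open time has been paid back.
break_even = (AFTER[0] - BEFORE[0]) / (BEFORE[1] - AFTER[1])

for n in range(3):
    before = total_seconds(*BEFORE, n)
    after = total_seconds(*AFTER, n)
    print(f"{n} operation(s): before {before:.1f}s, after {after:.1f}s")
```

Under that assumption the break-even point is just under one operation: after the first lookup the totals are 53.4s before versus 49.6s after, and the gap widens with every further operation.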

In theory this could be worse if the user opens the GUI (which now takes longer) and then doesn't look at any symbols, going only to, say, annotations or type layouts. But that just doesn't seem like a super common use case, and the only downside is that it's a little slower; it's still fully functional.

PR Checklist