feat: tooling to identify poor patch performance

btrautmann commented 3 weeks ago

Description

It would be great if there was tooling that was focused on highlighting performance regressions introduced in patches. Flutter developers today have at their disposal a few profiling tools, but it would be great if shorebird users could have additional tooling to "visualize" how a patch would impact their runtime performance, particularly in comparison to the unpatched (original release) version of the code.

Requirements

[ ] Likely a GUI or at the very least an interactive console application
[ ] Is not busy with details that do not involve shorebird, i.e I don't care about seeing details on code that was not impacted by a patch
[ ] Clearly compares runtime performance of code pre and post-patch (i.e average runtime of X code block was ~30ms in original binary but is ~4500ms in patched version)

I'm not sure how much of this is possible or if it's achievable with existing tools, but I wanted to highlight one of the things we're thinking about in identifying whether or not a patch should go out the door or be released through the normal process.

eseidel commented 3 weeks ago

Thanks! We're actually working on a couple things here.

Erick is in the middle of adding a flag to our patch tool to record debugging information during patch creation: https://github.com/shorebirdtech/shorebird/pull/1934.

Felix is also in the middle of solving our last remaining known sources of low linking (e.g. when patches run mostly not on the cpu). https://github.com/shorebirdtech/shorebird/issues/1892

Once Felix is done, hopefully we won't need much here.

That said, it's also possible from the debug information that Erick is collecting for us to build tooling (as you suggest) for developers to tell what lines of their program will or won't run on the CPU after patching. We should essentially be able to provide coverage information which says which lines will run where since the Dart snapshot knows that information.

My hope is that worrying about link percentages is going to completely go away in a couple weeks once Felix is done with this round of fixes. One of the design goals of Shorebird is that it be zero-cost to adopt and use, including that it should have zero affect on your program. Obviously since the App store requires us to use an interpreter (about 100x slower than the cpu) for updated code, there is cost to changing code, but most of the time we don't think Apps will notice that cost at all, since most code app developers write (or want to update) just isn't very performance sensitive. Most of the performance sensitive Dart code in Flutter apps is down in the Flutter framework itself (which app developers rarely change, and are very unlikely to ever patch).

btrautmann commented 3 weeks ago

since most code app developers write (or want to update) just isn't very performance sensitive. Most of the performance sensitive Dart code in Flutter apps is down in the Flutter framework itself (which app developers rarely change, and are very unlikely to ever patch).

There's an important callout here that, as a not-so-bright-individual, I didn't consider initially. I was under the assumption that the "scope" of changed code was considered/calculated at a higher level, like the actual dart code itself. For instance I assumed (foolishly, maybe) that if I changed a function in my dart code, that entire function would then run on the interpreter. This would be super bad for cases where a function iterates through thousands of entities (like my app does often). @felangel helped me understand that the diff performed to determine what runs on the interpreter is done at a much lower level (assembly?), so the scope shouldn't be thought of at the dart level. We discussed an example where you iterate through 20k transactions and multiply their amounts. If I changed the multiplication operation to an addition, it's that specific instruction that will be run on the interpreter, not the entire loop that iterates through the transactions. @felangel or @eseidel feel free to (as your time allows, no pressure) correct me here.

This new understanding makes me much less worried about performance (assuming a higher link percentage) and may make the tooling less critical (though still nice, if it's not a huge lift on your end).

We should essentially be able to provide coverage information which says which lines will run where since the Dart snapshot knows that information.

I think this would be totally sufficient, I imagine doing something like shorebird patch <release> --inspect or similar and a nice table is printed with line pointers to code identified as being run on the interpreter.

shorebirdtech / shorebird

feat: tooling to identify poor patch performance #1937