Add A/B comparison support

joaospinto commented 10 months ago

Hi! It would be nice to have an A/B mode for speedscope.

Let me clarify what I mean by this. Suppose I have a program, and gather its initial profiling data. This would be the "A" output. Then I go and make a bunch of changes, and gather some new profiling data. This would be the "B" output. Then I'd want to compare what changed between A and B.

This is a very common use case, and I wonder what your thoughts are about adding some native functionality for this.

Thanks!

zacharyfmarion commented 8 months ago

+1 I have also thought about this - maybe to start with a "compare" tab that allows you to upload two profiles - then shows a table where you can sort by total / self time +/-. A fancier version would be to render a profile diff like this to show which functions got cheaper or more expensive. @jlfwong how opposed are you to large UI additions to the website? Not sure what your philosophy is around keeping the app minimal

jlfwong commented 8 months ago

I'm not opposed to large UI additions, with a few caveats...

I'd definitely want to hash out details before going to a PR. For something of this scope, I'd want to start with both UI mocks and an eng spec. This is something on the order of a full-blown project, and I'd want a place to discuss product and eng tradeoffs before this turns into a big PR (it would almost definitely be better as a series of small PRs)
As you might've noticed, the time I put into speedscope fluctuates a lot, and there are periods where I don't spend too much time focused on it. For more isolated changes to things that are purely import-based, it's easier for me to review quickly because it's easy to think through the scope of implications for a change. For something with wide-reaching implications like this, I'd expect you'd need to bear with me for much longer.

Of course even if I drag my feet, if you have a version that works, you always have the option of self-hosting a version somewhere for people to play with :)

zacharyfmarion commented 8 months ago

@joaospinto @jlfwong I took some time to look into this in the past week - have a working self-hosted version here if either of you want to play around with it.

It would definitely would need some cleanup and iteration like you describe above. I'm going to collect some feedback from some engineers I work with and then maybe can start a spec + work out the UI details with you. Here are some screenshots of how it currently works:

New Compare tab
Table view showing frame differences
Stacked call stack flamegraphs to compare function calls

Initial thoughts:

It's hard to be totally accurate matching before + after frames if line + col numbers are in the frame key, since if you change a function and want to compare the before and after, the keys will not match. We could potentially introduce a "compareKey" in addition to the normal key that is basically the following:

function getCompareKey(frame: Frame) {
  if (['anonymous', '(unnamed)'].includes(frame.name)) {
    return `${frame.name}:${frame.file}:${frame.line}:${frame.col}`
  }

  // Assume that function names are unique per file if not anonymous
  return `${frame.name}:${frame.file}`
}

Would require refactoring some global state because of some assumptions that no longer hold up. In my current code I have just introduced a new atom called compareProfileGroupAtom but there might be a more elegant approach.
Definitely lots of error handling / edge cases to think through. Off the top of my head: erroring if there are no matching frames between profiles, erroring if there is a significant difference in total time between profiles (maybe allow selecting a slice of a profile to dial in on a similar execution path), ensuring that we are properly aggregating frames so we don't say that a function got faster but there is another frame for the same function where it got slower, etc.
Supporting multi-profile viewing might be a natural extension of this if the code is already being refactored. Probably not something I would want to scope in though if I implemented this.

@jlfwong maybe you could play around with the UI and we could start discussing the UX? The code itself is probably not worth looking at at the moment. Here are some example profiles that I have been using to test.

zacharyfmarion commented 8 months ago

An alternative would be to have some kind of global toggle to go between "single viewing mode" and "compare mode", and re-using the existing tabs. I did some explorations into red/green flamegraph rendering to show improvements / regressions, but the issue is that if you are looking at a non-aggregated timeseries view of a flamegraph, you really want to know if a specific function call got slower or faster, not if the total time spent in that function got slower or faster. That requires an algorithm for diffing two n-ary trees, which is hard and made my head hurt. It should theoretically be possible though. Here is an example flamegraph diff just matching on frames (showing self time change):

There is also a lot more UI complexity introduced by these kinds of graphs, for example:

Should we allow a user to specify a minimum threshold in ms for what is considered a change that matters?
Toggle between showing self change vs total change? Or just always show self change?

There is definitely a world in which frame-based flamegraph diffing is good enough (it does give useful signal) even though in certain cases it will mark something as green which actually is lower or red which actually is faster. Kind of depends on how fast and loose we want to fly with correctness / interpretability.

@jlfwong maybe we can discuss a bit here first just to get your initial thoughts and then I can flesh out an approach? Definitely going to have less time once I head back to work but would love to work with you on it if you think it's valuable. Also no rush responding, happy new year 🥳

zacharyfmarion commented 8 months ago

Here is a draft of the motivation to clarify exactly why I think this would be valuable:

Often when we are doing performance optimization work, we are iteratively improving a certain part of a performance trace. This process looks like:

Take one trace, see what is slow
Attempt to fix
Take another trace
Open two speedscope windows and switch between tabs to see whether the certain area of the profile we attempted to improve actually got faster

Generally I use the flamechart to get a view of what is slow, and then the sandwich view to see exactly what the total or self time difference was for a given function. This process is suboptimal as you only see changes if they are visually obvious in the flamegraph, or if you know what you’re looking for.

This means that you could very well introduce an optimization that improves the thing you are targeting, but slows something else down. Without manually checking every part of the profile to see what got slower or faster, it’s hard to make sure that you aren’t introducing a regression. Additionally, it’s just not as convenient as having a view that tells you exactly what got slower or faster between profiles.

Ideally we could make this process simpler by integrating a view into the speedscope UI that natively allows for comparing two profiles. This would reduce the burden of switching between tabs and also surface exactly what got slower or faster, instead of just what the user visually notices.

I think there is a counterargument that says "it's not that hard just to look" and that this is kind of complicated and there are a lot of ways that incorrect / misleading information could be surfaced, so it's not worth it. From my perspective the compare tab showing the table view is a good middle ground, but open to discussion!

cb-bradbeebe commented 5 months ago

@jlfwong What do you think about the proposal above? I've been playing around with its compare functionality and it has come in very handy.

iogbole commented 2 weeks ago

@jlfwong - Would you consider accepting this proposal?

jlfwong / speedscope

Add A/B comparison support #445

Initial thoughts: