spfrommer / torchexplorer

Interactively inspect module inputs, outputs, parameters, and gradients.
https://spfrommer.github.io/torchexplorer/
Apache License 2.0

Feature request: color-coded graphs for performance visualization #51

Open legel opened 8 months ago

legel commented 8 months ago

Gathering data from https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html

...it would be fantastic if there were a library with a one-line API comparable to what TensorBoard previously offered for TensorFlow: color-coded graph visualization of performance metrics per computational graph element. The main metric is runtime, but memory metrics would also be of interest... e.g. see https://branyang.gitbooks.io/tfdocs/content/get_started/graph_viz.html

The problem is that TensorBoard's PyTorch support is apparently a mess right now... Please ping me if this is of interest to develop. I think it would greatly help ML developers to be able to both visualize graphs and visualize the performance bottlenecks in those graphs...

legel commented 8 months ago

PS: it's probably obvious, but by color-coding I mean, e.g.:

A gradient from blue to red, where the darkest blue == max seconds of processing time and the darkest red == least seconds, based on a simple min/max normalization over all computed graph elements and a best-estimate allocation of the profiler runtimes per element, shown on your graph viz...
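A minimal sketch of the normalization described above: per-element runtimes are min/max-normalized to [0, 1] and then blended linearly in RGB between red (fastest) and blue (slowest), following the convention in the comment. The element names and runtime values are made up for illustration, and a plain RGB blend is a simplification rather than a perceptually uniform colormap.

```python
def normalize(values):
    """Min/max-normalize {element_name: seconds} to values in [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # If every element took the same time, map them all to 0.0.
    return {name: (v - lo) / span if span else 0.0 for name, v in values.items()}


def runtime_to_hex(t):
    """Map a normalized runtime t in [0, 1] to an RGB hex string.

    t = 1.0 (slowest element) -> pure blue, t = 0.0 (fastest) -> pure red,
    with a linear blend in between, matching the convention described above.
    """
    red = round(255 * (1.0 - t))
    blue = round(255 * t)
    return f"#{red:02x}00{blue:02x}"


# Hypothetical per-element runtimes in seconds.
runtimes = {"conv1": 0.004, "relu1": 0.0005, "fc": 0.0015}
colors = {name: runtime_to_hex(t) for name, t in normalize(runtimes).items()}
```

Min/max normalization is sensitive to a single very slow element, which would push everything else toward red; a log scale or percentile clipping are common alternatives if that becomes a problem.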

spfrommer commented 8 months ago

This would be a good feature and should be possible. However, displaying profiler statistics as a color coding would clash with the highlight colors that show which node is being visualized in the right-hand panel. I'm instead envisioning something like the following mockup:

[Mockup image: draftprofile]

The bars on the left-hand side are runtime statistics; the right-hand side would be memory. For runtime, black is the forwards pass and gray is the backwards pass. For memory, black is the memory consumed by activations and gray is the memory consumed by parameters (possibly including gradients for the backwards pass?). Heights are normalized as you described.

Mouseover would provide a tooltip explaining each bar and giving the concrete value (e.g., "Fwd pass runtime: 3e-4").

In terms of implementation, I don't know much about profiling, but I was thinking the following.

Runtime

Memory

Shared details

Question: Does the above make sense to you / match your expectations of what you'd like to see profiled? Any other suggestions are also welcome.

legel commented 8 months ago

Thanks @spfrommer for the super fast, detailed, and visionary reply!

I'm very happy to affirm a lot of what you describe where it makes sense to me, and also to critically propose changes where I think it would result in a more useful UX.

"...displaying profiler statistics as a color coding would clash with the highlight colors to show which node is being visualized in the right-hand panel."

I see your concern. Consider the following...

"...The bars on the left hand side are runtime statistics, right hand side would be memory. For runtime, black is forwards pass and gray is backwards pass. For memory, black is the memory consumed by activations and gray is memory consumed by parameters (possibly including gradients for the backwards pass?). Heights are normalized as you described."

It's a neat graphical design that you propose. The reason I push back and suggest doing extra gymnastics for color is that, in my research and decade of work in data visualization, I've found that colors are by far the most valuable tool for dealing with complexity. I think it would be difficult to quickly spot performance bottlenecks at a high level across a very large computational graph with the design you propose, because the many small black/grey bars would be hard to see when zoomed out. I also figured it wouldn't be too hard to go with the "gold standard" of a fully color-coded view of the graphical network...

Design proposal visualization

[Screenshot, 2024-01-07: design proposal mockup]

Above, we use the full color spectrum for the visualization, following the science behind the Turbo colormap designed by Google.

The above is based on Python code that I just made for demo purposes here: neural_network_turbo_color_coding.py.zip

"In terms of implementation..."

As I suggested, I would recommend going after just one of the profiling challenges first and deploying only that. Between runtime and memory profiling, there's some pretty nice recent work in the right direction for memory profiling, published by the Torch maintainers and detailed here. So when it comes time to implement memory profiling, their approach should help.

For starters, the simplest and most essential task is to get runtimes for every single computational element (I figure parsing and post-processing the outputs of the Torch Profiler might get there, but I'm not sure how well that matches up with your graph modeling).
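One hedged way to start on the parse-and-post-process route: `torch.profiler.profile` can write a Chrome trace via `export_chrome_trace(path)`, and the sketch below sums the durations of complete events (`"ph": "X"`, `dur` in microseconds) per event name. It assumes the trace follows the Chrome trace event format with a top-level `traceEvents` list; category strings such as `"cpu_op"` can differ across PyTorch versions, so check them against your own trace. `load_trace` and `runtimes_by_name` are hypothetical helper names.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Load a Chrome-trace JSON file, e.g. one written by export_chrome_trace."""
    with open(path) as f:
        return json.load(f)


def runtimes_by_name(trace, category=None):
    """Sum durations (microseconds) of complete events, grouped by event name.

    Only events with "ph" == "X" carry a "dur" field. If `category` is given,
    only events whose "cat" field equals it are counted.
    """
    totals = defaultdict(float)
    for event in trace.get("traceEvents", []):
        if event.get("ph") != "X" or "dur" not in event:
            continue
        if category is not None and event.get("cat") != category:
            continue
        totals[event["name"]] += event["dur"]
    return dict(totals)


# Example usage (requires PyTorch; the "cpu_op" category is an assumption):
#   from torch.profiler import profile
#   with profile() as prof:
#       model(x).sum().backward()
#   prof.export_chrome_trace("trace.json")
#   print(runtimes_by_name(load_trace("trace.json"), category="cpu_op"))
```

This aggregates by operator name, not by module, so mapping the totals onto graph nodes is still the open question raised above. Nested events (an operator and the sub-operators it calls) are each counted under their own name, so the per-name totals are inclusive times and should not be summed across names.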

"...Profiling will only happen every log_freq'th iteration. Profiling results will be updated in some kind of a running average."

While logging every n-th iteration works, just profiling a single forward and backward pass would already be super useful; e.g., that data could be saved to an export file, allowing the user to explore its details dynamically.
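The running average from the quoted plan could be kept as an incremental mean per graph element. A single profiled pass is then just the case of one update, and the result can be written to a JSON file for later exploration. `RunningRuntime` and its methods are hypothetical names for illustration, not part of torchexplorer.

```python
import json


class RunningRuntime:
    """Hypothetical helper: running mean of runtimes per graph element."""

    def __init__(self):
        self.counts = {}  # element name -> number of profiled iterations seen
        self.means = {}   # element name -> mean runtime in seconds

    def update(self, sample):
        """Fold one profiled iteration's {element_name: seconds} into the means."""
        for name, value in sample.items():
            k = self.counts.get(name, 0) + 1
            prev = self.means.get(name, 0.0)
            # Incremental mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k
            self.means[name] = prev + (value - prev) / k
            self.counts[name] = k

    def export(self, path):
        """Write the current statistics to a JSON file for later exploration."""
        with open(path, "w") as f:
            json.dump({"counts": self.counts, "means": self.means}, f, indent=2)
```

Counts are tracked per element rather than globally, so an element that only runs on some iterations (e.g. behind a conditional branch) still gets a correct mean.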

There's a lot more I could get into here, but hopefully the above is helpful and sufficiently inspiring! I wish I had this today for an actual profiling project I need to move forward on now...

spfrommer commented 8 months ago

Really appreciate the detailed feedback! I agree the colors would be better for usability, and having different analysis modes is the better UI. My suggestion was mostly motivated by practical considerations with Vega. Vega has a really focused grammar: it is not designed to support fancy GUIs, and I'm really pushing the limitations of what it can do. Nevertheless, it's still probably better to just do it properly than to keep hacking additional features into one interface.

To select between the different analysis modes, I'd need the equivalent of a drop-down box or a radio button. Vega supports binding inputs to HTML elements, but these are outside of the visualization pane and by default are ugly and can't be customized within the JSON spec. Positioning and formatting them appropriately involves custom CSS external to the spec, which would be possible on the standalone backend but not in the wandb interface. I'd probably have to hack it together using a Vega legend as a radio button.
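For reference, this is roughly what binding a signal to an HTML radio-button group looks like in a Vega spec, built here as a Python dict. The `bind` object with `input: "radio"`, `options`, and an optional `name` label follows Vega's signal-binding syntax; the signal name `analysis_mode` and the mode labels are hypothetical. As noted above, Vega renders such inputs outside the visualization pane, so their placement and styling depend on CSS outside the spec.

```python
import json

# Hypothetical mode labels for the analysis-mode selector.
ANALYSIS_MODES = ["default", "runtime", "memory"]


def analysis_mode_signal(default="default"):
    """Build a Vega signal bound to an HTML radio-button group.

    The signal name "analysis_mode" is hypothetical; marks elsewhere in the
    spec could read it in expressions to switch their encoding.
    """
    if default not in ANALYSIS_MODES:
        raise ValueError(f"unknown analysis mode: {default!r}")
    return {
        "name": "analysis_mode",
        "value": default,
        "bind": {
            "input": "radio",
            "options": list(ANALYSIS_MODES),
            "name": "Analysis mode ",  # label shown next to the inputs
        },
    }


spec_fragment = {"signals": [analysis_mode_signal("runtime")]}
print(json.dumps(spec_fragment, indent=2))
```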

Having the panels on the right-hand side be custom for each analysis mode is probably the right design choice, but it would involve a significant re-architecting of how I handle the panels in Vega.

The PyTorch memory profiling tools you've linked seem to largely involve examining memory allocation over time to detect memory leaks. I think the natural thing to display in the right-hand panels for the "memory analysis" mode would be a per-module line chart of memory usage over the forwards and backwards passes, with time on the x axis (how to even get this information is another complication). This would help narrow down whether a specific module is leaking memory in the forwards pass (e.g., by saving intermediate tensors to module attributes). But my feeling is that most memory leaks happen outside of the module forward invocations (e.g., when metrics are logged on the module output without calling detach() first).
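A sketch of the data shaping behind such a per-module line chart, assuming raw samples of (timestamp, module name, pass, bytes allocated) have already been collected. One possibility, not confirmed in this thread, is to read `torch.cuda.memory_allocated()` from forward pre-hooks and forward hooks. `memory_series` and `net_growth` are hypothetical helper names.

```python
from collections import defaultdict


def memory_series(samples):
    """Group raw memory samples into per-module line-chart series.

    `samples` is an iterable of (timestamp, module_name, phase, bytes_allocated)
    tuples, where phase is "fwd" or "bwd". Returns
    {module_name: {phase: [(timestamp, bytes), ...]}} with each series sorted
    by timestamp, i.e. one x/y series per module and pass.
    """
    series = defaultdict(lambda: defaultdict(list))
    for t, module, phase, nbytes in samples:
        series[module][phase].append((t, nbytes))
    return {
        module: {phase: sorted(points) for phase, points in phases.items()}
        for module, phases in series.items()
    }


def net_growth(points):
    """Bytes allocated at the last sample minus the first (0 if empty)."""
    return points[-1][1] - points[0][1] if points else 0
```

`net_growth` only reports the change within one series. Some growth during the forward pass is expected because activations are kept for the backward pass, so a positive value is a hint to look closer, not proof of a leak.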

In summary, I really like your suggestions and I'll probably refer to our discussion when I end up implementing this. But it involves a big architectural rework for what's essentially a side project to my research, so it's probably not something I'll get around to in the near future.