johnmarktaylor91 / torchlens

Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.
GNU General Public License v3.0
454 stars · 16 forks

Feature request: color-coded graphs for performance visualization #20

Closed · legel closed 1 month ago

legel commented 8 months ago

Gathering data from https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html

...it would be fantastic if there were a library with a one-line API comparable to what TensorBoard previously offered for TensorFlow: color-coded graph visualization of performance metrics per computational graph element -- primarily runtime, but memory metrics would also be of interest. For an example, see https://branyang.gitbooks.io/tfdocs/content/get_started/graph_viz.html
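For reference, the per-op timing data that recipe collects looks like this (a minimal sketch using the public torch.profiler API; the model and input are just placeholders):

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

model = models.resnet18()
x = torch.randn(1, 3, 224, 224)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(x)

# Per-operator runtimes: the raw numbers a color-coded graph could visualize
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))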

The problem is that TensorBoard's PyTorch support is apparently a mess right now... Please ping me if this is of interest to develop; I think it would greatly help ML developers to be able to visualize both their graphs and the performance bottlenecks within them...

legel commented 8 months ago

PS: it's probably obvious, but by color-coding I mean, e.g.:

A gradient from blue to red, where the darkest blue == max seconds of processing time and the darkest red == least seconds, based on a simple min/max normalization across all computed graph elements and a best-estimate allocation of the profiler runtimes per element, shown on your graph viz...
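For concreteness, a minimal sketch of that mapping (the 'coolwarm' colormap and the helper name are illustrative assumptions, not an existing API):

from matplotlib import cm, colors

def runtime_to_color(runtime, min_rt, max_rt):
    # Mapping described above: darkest blue at max runtime, darkest red at min.
    # 'coolwarm' runs blue (0.0) -> red (1.0), so the normalized value is inverted.
    scale = (max_rt - runtime) / (max_rt - min_rt) if max_rt > min_rt else 0.5
    return colors.to_hex(cm.get_cmap('coolwarm')(scale))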

johnmarktaylor91 commented 8 months ago

This is a great suggestion, thanks so much! TorchLens already logs all that info, so it’s just a matter of allowing the visuals to reflect it. Currently the color scheme is one that I devised to try to make the salient aspects of the network pop out, with minimal tinkering from the user. But, it could be a helpful “power feature” to allow the user to override these defaults as needed.

The tricky thing would be balancing simplicity and flexibility. Currently, my philosophy is to be conservative about hard-coding new use-cases, while giving users all the data and flexibility they need to do anything they want.

What about something like the following: provide a set of optional visualization arguments for variables like the color, shape, and size of the box for each layer. For each such argument, allow the user to pass in a function for determining the value of that variable. For instance,

def color_func(model_history, layer):
    rgb = some_func(layer.time_elapsed)  # some_func: any user-supplied mapping from runtime to color
    return rgb

show_model_graph(model, x, color_func=color_func)

So, the user would pass in a function that takes as input the data structures for both the overall model and each individual layer, and computes the visualization variables from the metadata TorchLens provides.

I’ll have to think about this, but does this seem sufficient for your use case?

legel commented 8 months ago

That makes sense!

I would instantly use that.

When I first had the vision of color-coding a network graph with Torch runtime / memory stats, I also posted it to a related open-source project called TorchExplorer. There I published a visual demo based on the Turbo colormap, including Python code, which simulates a dummy neural network I put together for illustration purposes. I'd love to port that exact code to real graphs of the very complicated neural networks I'm trying to profile now (LLMs, vision transformers, ...).

Let me know if you push any code for custom color-coding based on e.g. runtime; I'd love to follow up and share some examples with fine-tuned colormaps that I think could look really awesome (like the one below) and be very useful.

[Image: color-coded neural network demo]

legel commented 8 months ago

Just showing the high-level visuals and giant tables you get from TensorBoard for runtime and memory visualization of an exported PyTorch Profiler CUDA trace... almost useless for building a network-wide understanding of runtime and memory bottlenecks.

[Screenshots: TensorBoard runtime and memory views from the exported profiler trace]

johnmarktaylor91 commented 7 months ago

I'll put it on top of my list and keep you posted once it's implemented! Can't wait to see what you do with it.

legel commented 7 months ago

Sounds great! Yeah, I'll be happy to share code and demo(s) built on the core infra you set up for coloring the compute graph.

johnmarktaylor91 commented 7 months ago

Okay, here's the interface after some initial tinkering... does this seem natural and intuitive to your eyes? Basically, you provide a dict specifying the visualization arguments you want to override (these correspond to the various graphviz attributes), where each value is either a literal (for things that don't depend on the particular data in the model or layer) or a function (for things that depend on metadata about the model or layer). If it looks good, I'll polish and push it.

import matplotlib.cm
import matplotlib.colors
import torch
import torchvision
import torchlens as tl

model = torchvision.models.alexnet()
x = torch.randn(1, 3, 224, 224)

def colorize_runtime(model_history, layer):
    # Min/max-normalize this layer's runtime across all layers, then map it to a hex color.
    cmap = matplotlib.cm.get_cmap('jet')
    max_runtime = max(lyr.func_time_elapsed for lyr in model_history)
    min_runtime = min(lyr.func_time_elapsed for lyr in model_history)
    scale_pos = (layer.func_time_elapsed - min_runtime) / (max_runtime - min_runtime)
    return matplotlib.colors.to_hex(cmap(scale_pos))

def label_layer(model_history, layer):
    return layer.layer_label

# Color each node based on runtime, remove all text except the layer label.
# Logic: if the provided value is a function, call it on the model history and layer;
# otherwise, just use the value.
vis_node_overrides = {'label': label_layer,
                      'shape': 'ellipse',
                      'fillcolor': colorize_runtime}

# Replace the graph label with a descriptive title.
vis_graph_overrides = {'label': 'Runtime-colorized visualization'}

# Hide the nested module boxes by turning them white and removing the text.
vis_module_overrides = {'pencolor': 'white',
                        'label': ''}

tl.show_model_graph(model, x, vis_node_overrides=vis_node_overrides,
                    vis_graph_overrides=vis_graph_overrides,
                    vis_module_overrides=vis_module_overrides)

[Image: AlexNet graph with nodes colored by runtime]
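For anyone skimming: the dispatch rule described in the comments above might look roughly like this internally (a hedged sketch, not the actual torchlens source):

def resolve_override(value, model_history, layer):
    # Callables are evaluated per layer; literal values are used verbatim.
    if callable(value):
        return value(model_history, layer)
    return value

# resolve_override(colorize_runtime, model_history, layer) -> e.g. '#800000'
# resolve_override('ellipse', model_history, layer)        -> 'ellipse'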

johnmarktaylor91 commented 7 months ago

Here's another example where the nodes are colored based on their runtime and sized based on their tensor file size, by passing in the following function:

def height_by_fsize(model_history, layer):
    # Scale node height from 1 upward based on the layer's tensor file size,
    # min/max-normalized across all layers.
    min_height = 1
    min_fsize = min(lyr.tensor_fsize for lyr in model_history)
    max_fsize = max(lyr.tensor_fsize for lyr in model_history)
    return min_height + (layer.tensor_fsize - min_fsize) / (max_fsize - min_fsize)
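This gets wired up the same way as the color override; for instance (the fixedsize attribute is included here because graphviz otherwise treats height only as a minimum):

vis_node_overrides = {'fillcolor': colorize_runtime,
                      'height': height_by_fsize,
                      'fixedsize': 'true'}  # make graphviz honor the height exactly
tl.show_model_graph(model, x, vis_node_overrides=vis_node_overrides)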

[Image: graph with nodes colored by runtime and sized by tensor file size]

legel commented 7 months ago

These both look great!

In terms of the API you propose, I think it's simple and practical enough, so long as the intent of enabling custom color-coding (and size-coding: awesome work) stays clear.

You might consider wrapping up these examples with your final API in a demo Jupyter notebook, with a notebooks section at the top-level, e.g. as Meta DinoV2 researchers did here: https://github.com/facebookresearch/dinov2/tree/main/notebooks

I think the turbo colormap in Matplotlib should be superior to, though very similar to, jet; you can just swap the name in the code.
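That swap is a one-line change in the example above:

cmap = matplotlib.cm.get_cmap('turbo')  # instead of 'jet'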

The size-coding is super nice and innovative. Intuitively, size seems like a great way to show off differences in memory requirements. I think that's what you've started on with layer.tensor_fsize, but I'm not familiar enough with that API to know whether tensor_fsize corresponds to actual GPU VRAM consumption, RAM consumption, or something else. (Ideally, it would be useful to report "CUDA" or "CPU" memory consumption per layer, if that's indeed what you're going for.)

In general, I think a really good legend will be key to bringing the colors and sizes together into a useful visualization, since it should be possible to quickly check a color or size against a ground-truth reference. It would be great to have a big legend that shows the full colormap range (as in my example above), plus a similar custom legend showing the min/max sizes with numerical values alongside.

This is practically already very useful, and I'm excited to pick it up and share some demos with some of the complex neural net modules I'm currently seeking to optimize performance of...

legel commented 7 months ago

PS I just saw that you're a postgrad researcher at Columbia. I studied ML there for comp sci grad school :)

johnmarktaylor91 commented 7 months ago

Thanks a ton for the feedback! Yup, I'll definitely add this to the tutorials once I've finalized things (there's a Colab tutorial notebook linked in the main readme for torchlens). To be clear, I'm probably not going to hard-code these specific visualization choices (e.g., this particular way of coloring and sizing the nodes); I'll just provide the override options so users can do it themselves as they like, if only because I can't anticipate all the use cases and graphic design is not my main specialty. The upside is that this should allow unlimited flexibility for visualizing things based on the model metadata. But perhaps it would be useful to provide some examples, with accompanying code, for people to riff off of.

Regarding the legend, that might be tricky: torchlens runs on graphviz, and graphviz doesn't have a built-in way of doing legends. Perhaps the only option would be to make the legend with some other tool like matplotlib, and then paste the graph image and the legend image together. I'd have to think about whether this makes sense to add, since it might be a specialized use case and I want to avoid feature creep. But if there's sufficient user demand I can brainstorm ways to streamline it.
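For reference, that matplotlib-plus-paste approach could look something like the following sketch (file names, figure size, and runtime bounds are illustrative assumptions):

import matplotlib as mpl
import matplotlib.pyplot as plt
from PIL import Image

min_runtime, max_runtime = 0.0, 0.05  # e.g. computed as in colorize_runtime

# Render a standalone horizontal colorbar spanning the runtime range.
fig, ax = plt.subplots(figsize=(6, 0.5))
norm = mpl.colors.Normalize(vmin=min_runtime, vmax=max_runtime)
fig.colorbar(mpl.cm.ScalarMappable(norm=norm, cmap='turbo'),
             cax=ax, orientation='horizontal', label='runtime (s)')
fig.savefig('legend.png', bbox_inches='tight')

# Stack the rendered graph and the legend vertically into one image.
graph, legend = Image.open('graph.png'), Image.open('legend.png')
canvas = Image.new('RGB', (max(graph.width, legend.width),
                           graph.height + legend.height), 'white')
canvas.paste(graph, (0, 0))
canvas.paste(legend, (0, graph.height))
canvas.save('graph_with_legend.png')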

The tensor_fsize field is computed using the sys.getsizeof function from the Python standard library. Currently I don’t break it down by CPU/GPU but that would be easy to do.
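A minimal sketch of such a breakdown using standard PyTorch tensor attributes (element_size() * nelement() reflects the raw data payload, which plain sys.getsizeof on the tensor object may not capture):

import torch

def tensor_bytes_by_device(t: torch.Tensor):
    # Distinguish 'cpu' from 'cuda' and report the storage size in bytes.
    return t.device.type, t.element_size() * t.nelement()

# e.g. a float32 tensor of shape (1, 3, 224, 224):
#   ('cpu', 602112)  ->  1 * 3 * 224 * 224 * 4 bytes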

And yes I’m at Columbia, in the neuroscience department :) I’ll keep you posted once I’ve pushed the new code.

johnmarktaylor91 commented 1 month ago

Apologies for the slow rollout on this, but this functionality has now been added to the main branch in the latest release.

legel commented 1 month ago

Cool! I'll make sure to check this out when I'm hitting my next Torch performance bottleneck. :) Cheers!

johnmarktaylor91 commented 1 month ago

I welcome any feedback you've got! Closing for now, cheers