waleedka / hiddenlayer

Neural network graphs and training metrics for PyTorch, Tensorflow, and Keras.
MIT License

Clusters convolutions with different kernels #23

Open · stared opened this issue 5 years ago

stared commented 5 years ago

I'm using the newest version (bd0d36a475279353a3a093a3954752f04288283a). The problem is that it merges two different convolutions (in this case conv 3x3 > relu and conv 1x1 > relu) into a single, incorrect block (conv 3x3 > relu x2).

import torch
from torch import nn
import hiddenlayer as hl

model = nn.Sequential(
          nn.Conv2d(8, 8, 3, padding=1),  # 3x3 convolution
          nn.ReLU(),
          nn.Conv2d(8, 8, 1),             # 1x1 convolution
          nn.ReLU(),
          nn.MaxPool2d(2, 2))

hl.build_graph(model, torch.zeros([1, 8, 32, 32]))

[Screenshot: both convolutions folded into a single "conv 3x3 > relu x2" node, followed by maxpool]

When the activations differ (here the second convolution has no ReLU after it), the problem disappears:

model = nn.Sequential(
          nn.Conv2d(8, 8, 3, padding=1),
          nn.ReLU(),
          nn.Conv2d(8, 8, 1),  # no activation after the 1x1 convolution
          nn.MaxPool2d(2, 2))

hl.build_graph(model, torch.zeros([1, 8, 32, 32]))

[Screenshot: conv 3x3 > relu and conv 1x1 rendered as separate nodes]

Side notes

waleedka commented 5 years ago

Folding conv3x3 and conv1x1

That looks like a bug in the FoldDuplicates transform. Thanks for the report. I'll work on it. In the meantime, if you want a quick fix and don't need the folding of duplicate layers, you can remove the FoldDuplicates transform from the default transforms in transforms.py.
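Alternatively, you can pass an explicit transforms list to build_graph() and simply leave FoldDuplicates out of it. A rough sketch (untested), reusing the model defined above:

import torch
import hiddenlayer as hl

# Explicit transforms: keep the fold rules you want, but omit hl.transforms.FoldDuplicates()
transforms = [
    hl.transforms.Fold("Conv > Relu", "ConvRelu"),
]
hl.build_graph(model, torch.zeros([1, 8, 32, 32]), transforms=transforms)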

Folding non-ReLU activation functions

Good catch! This is an oversight. I guess I didn’t have non-ReLU activations in the test networks. I’ll look into that, but in the meantime you can quickly fix it by duplicating some of the rules in the SIMPLICITY_TRANSFORMS list (in transforms.py) to include your activations.
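For example (a sketch; the actual default rules in transforms.py may differ, and op names such as "Tanh" and "Sigmoid" follow the ONNX operator names):

# In transforms.py
SIMPLICITY_TRANSFORMS = [
    Fold("Conv > Relu", "ConvRelu"),
    # Duplicated rules so non-ReLU activations fold the same way:
    Fold("Conv > Tanh", "ConvTanh"),
    Fold("Conv > Sigmoid", "ConvSigmoid"),
    # ...plus the rest of the default rules
]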

Thanks for sharing your blog post and visualization library. Good overview; I wish I had found it when I was researching the space. One of the design principles of HiddenLayer is the separation of graph clean-ups from graph visualization. Currently, only one visualization option is available, generating Graphviz DOT graphs, and it's fully contained in the build_dot() function. I noticed your interest in JS visualizations, so if you're interested in adding a JS-based dynamic graph, I'd be happy to help with pointers and advice.
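To make the separation concrete, here is a sketch of the flow (assuming the graph object returned by build_graph() exposes build_dot(), and reusing the model from above):

graph = hl.build_graph(model, torch.zeros([1, 8, 32, 32]))  # graph building + clean-up transforms
dot = graph.build_dot()                  # visualization step: produces a Graphviz Digraph
dot.render("model_graph", format="png")  # from here on it's plain Graphviz

A JS-based renderer would slot in as an alternative to build_dot(), consuming the same cleaned-up graph.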

stared commented 5 years ago

Thanks, @waleedka!

I see it uses Graphviz (fortunately, it is now easy to install with conda; previously it was painful), but I am open to developing a more interactive version. (And ideally, one capable of displaying hierarchical structure, such as an "Inception block".)

I would be happy to talk more, since there is an overlap in interests and approaches. See my https://github.com/stared/livelossplot package (did you take any inspiration from it?) and, to a lesser extent, the confusion matrix variants at https://github.com/stared/keras-interactively-piterpy2018/blob/master/talk.ipynb.

waleedka commented 5 years ago

> I would be happy to talk more, since there is an overlap in interests and approaches. See my https://github.com/stared/livelossplot package (did you take any inspiration from it?)

I have come across livelossplot. It's a great tool, and I didn't know it was yours. Great work! But no, I didn't take inspiration from it; I had been using dynamic plots to display loss functions for a long time. For example, I show one in this presentation, recorded about a year before livelossplot: https://youtu.be/QHra6Xf6Mew?t=3450

stared commented 5 years ago

@waleedka Thank you! So it is awesome to learn that we came to the same ideas... and made them work.

Do you think it would be a good idea to join forces? (One tool for tracking the training process, the other for graph visualization.)

waleedka commented 5 years ago

@stared Yes, happy to talk about it. I'll DM you.