jerryqhyu / distill_bayes_net


Hero Diagram #2

Open ludwigschubert opened 6 years ago

ludwigschubert commented 6 years ago

I really like your hero diagram concept! Here are some ideas for making it easier to understand what's going on:

  1. NN weights

    • Currently training is hardly visible. Check out the TensorFlow Playground for a suggestion on how to animate training using stroke-dashoffset (see the sketch after this list).
    • Would the demonstration work with a smaller neural net? The many weights feel noisy; what would happen if you used as few neurons and layers as possible?
    • Experiment with double-encoding the weights, for example using both stroke-width and opacity to help differentiate between weights. You can also experiment with an additional color scale (both encodings appear in the sketch after this list).
    • Help me understand the relationship between the NN diagram and the ten Gaussian samples: Can we think of them as ten different networks? If so, show them as small multiples. Is there a distribution over the weights?
    • Is there a way to speed up training? Maybe by using less noise? Currently the demo takes a little too long to converge and thus feels anticlimactic, as if the proposed technique didn't work very well—probably not what we want to signal.
    • CPU utilization is very high in this demo. While I'm not sure how much you can optimize, one quick improvement could be to stop training once a convergence criterion has been reached.
  2. Combine NN diagram and plot

Distill's current article layout (which we'll help you implement as a separate step) will allow you to bring your title above the distributions and combine the NN diagram with the distributions and some explanatory text.
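
To make the stroke-dashoffset and double-encoding ideas under (1) a bit more concrete, here is a rough sketch of one way it could look; the element id, the `links` data shape, and the scale ranges are all made up for illustration, not anything from your code:

```js
// Sketch: double-encode each weight with stroke-width and opacity, and animate
// training with stroke-dashoffset, as in the TensorFlow Playground.
// Assumes d3 and a `links` array of {source: {x, y}, target: {x, y}, weight} objects.
const maxAbsWeight = d3.max(links, d => Math.abs(d.weight));
const widthScale   = d3.scaleLinear().domain([0, maxAbsWeight]).range([0.5, 6]);
const opacityScale = d3.scaleLinear().domain([0, maxAbsWeight]).range([0.2, 1]);

const paths = d3.select('#nn-diagram').selectAll('path.weight')
  .data(links)
  .join('path')
    .attr('class', 'weight')
    .attr('d', d => `M${d.source.x},${d.source.y} L${d.target.x},${d.target.y}`)
    .attr('fill', 'none')
    .attr('stroke', d => d.weight >= 0 ? '#4a90d9' : '#e07a5f') // optional colour scale for sign
    .attr('stroke-width', d => widthScale(Math.abs(d.weight)))
    .attr('stroke-opacity', d => opacityScale(Math.abs(d.weight)))
    .attr('stroke-dasharray', '6 4');

// Call once per animation frame while training; shifting the dash offset makes
// the connections appear to flow, so ongoing training stays visible.
let dashOffset = 0;
function tickTrainingAnimation() {
  dashOffset -= 1;
  paths.attr('stroke-dashoffset', dashOffset);
}
```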

We are happy to send some concrete design proposals, but I am currently unsure if there are more parameters that could be interesting to reveal to readers in such a combined hero diagram. Those parameters could be tweakable aspects of the optimization process, statistics over the optimization, or letting users set some weights manually and observe the outcome… we rely on your insight here to decide what may help the story. As a starting point: you mention that Bayesian NNs "tell us how uncertain our predictions are"—is there a way to show that uncertainty in the hero diagram?

Another framing: help me immediately see at least some difference between Bayesian NN and vanilla FC NNs in your diagram.

jerryqhyu commented 6 years ago

Hello @ludwigschubert, sorry for the late response. The team has been pretty occupied these past few weeks. Indeed, there is a distribution over the weights of the neural net. To differentiate a Bayesian NN from a regular NN, the diagram would ideally show that distribution, but we have found it slow to repaint/update the distribution frequently. The performance issue is also tied to the framework we are using: ConvNetJS is old, and we had to do a lot of modding to make it work. We'd love to use a more modern framework with things like autodiff, user-defined loss functions, and GPU acceleration; if you could recommend one, that would be great.

We have had some ideas for improving this diagram that had to be shelved due to performance constraints, such as adding little distributions to the NN schema figure. Currently a connection's colour encodes its variation and its thickness encodes its mean, but it's confusing to look at; we'll try to make it look like Figure 1 of https://arxiv.org/pdf/1505.05424.pdf.

ludwigschubert commented 6 years ago

No worries about response timeframes—we're all trying our best.

In terms of JavaScript ML frameworks we've had some success with tensorflow.js.
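
Purely as a sketch of the kind of setup tensorflow.js enables (autodiff over a user-defined loss, GPU acceleration via the WebGL backend); the tiny architecture and names here are only illustrative, not a recommendation for your actual model:

```js
import * as tf from '@tensorflow/tfjs';

// Tiny MLP with explicit tf.variable weights, so every weight tensor stays
// accessible for the diagram. optimizer.minimize() differentiates any
// user-defined scalar loss, and the WebGL backend runs on the GPU.
const w1 = tf.variable(tf.randomNormal([1, 8], 0, 0.5));
const b1 = tf.variable(tf.zeros([8]));
const w2 = tf.variable(tf.randomNormal([8, 1], 0, 0.5));
const b2 = tf.variable(tf.zeros([1]));

const predict = xs => xs.matMul(w1).add(b1).tanh().matMul(w2).add(b2);

// Any differentiable expression works as the loss, e.g. a likelihood term plus
// a KL term against the prior for the Bayesian treatment.
const loss = (xs, ys) => tf.losses.meanSquaredError(ys, predict(xs));

const optimizer = tf.train.adam(0.01);
const trainStep = (xs, ys) => optimizer.minimize(() => loss(xs, ys));
```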

Performance in web technologies can be tricky to get right. See if a different ML framework helps, and also feel free to reach out to us with a non-performant diagram in a branch. I can't promise we can help, but in the past we sometimes could. :-)

duvenaud commented 6 years ago

@ludwigschubert We've been thinking about your helpful suggestions. We are going to try switching to tensorflow.js, shrinking the net, and double-encoding the weights. One strategy we are considering for representing uncertainty over each weight is to blur each line in proportion to its uncertainty; a rough sketch of what we mean is below.
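
A very rough sketch of the blur idea, assuming d3 and shared SVG filters; the `links` array, the `std` field, and the blur levels are placeholders rather than our actual code:

```js
// Sketch: blur each connection line in proportion to its weight's posterior std.
// A handful of shared SVG filters (one per blur level) keeps the filter count low.
// Assumes d3, an svg containing class="weight" paths, and a per-link `std` field.
const maxStd = d3.max(links, d => d.std);
const blurLevel = d3.scaleQuantize().domain([0, maxStd]).range([0, 1, 2, 3, 4]);

const defs = d3.select('#nn-diagram').append('defs');
blurLevel.range().forEach(level => {
  defs.append('filter')
      .attr('id', `blur-${level}`)
      .attr('x', '-50%').attr('y', '-50%')
      .attr('width', '200%').attr('height', '200%')
    .append('feGaussianBlur')
      .attr('stdDeviation', level);
});

d3.select('#nn-diagram').selectAll('path.weight')
  .attr('filter', d => `url(#blur-${blurLevel(d.std)})`);
```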

We are also thinking of comparing against standard NNs, as you suggested, by having two panes: the left pane would show a standard point estimate of a neural network, with a single function through the data, while the right pane would show the BNN posterior. Do you think it makes sense to use half the space on the baseline method?

One point I was hoping you could clarify: you said "Can we think of them as ten different networks? If so, show them as small multiples." Yes, we can think of each sample from the posterior as a different network. What did you mean by "small multiples"?

ludwigschubert commented 6 years ago

If the comparison is important, feel free to use half the space for it! If focussing on the BNN posterior is also important, break it up into two diagrams. I always believe in introducing a concept/visual first, and then using it in more complex arrangements.

Small multiples simply means an aligned row or grid of similar visualizations to allow comparison. Here's an example:

In the case of the NN weights, the samples from the posterior seemed like a natural choice to show—much easier to show individual points than a whole distribution.
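
In case it helps, here is a bare-bones sketch of what a small-multiples row over the posterior samples could look like; the container id, scales, and data shape are assumptions, not anything from your repo:

```js
// Sketch of a small-multiples row: one mini-plot per posterior sample, laid out
// side by side so the sampled functions are easy to compare at a glance.
// Assumes d3 and `samples` = [{xs: [...], ys: [...]}, ...], one entry per sample.
const cell = 90, pad = 8;
const x = d3.scaleLinear().domain([-3, 3]).range([0, cell]);
const y = d3.scaleLinear().domain([-2, 2]).range([cell, 0]);
const line = d3.line().x(d => x(d[0])).y(d => y(d[1]));

d3.select('#small-multiples')
    .attr('width', samples.length * (cell + pad))
    .attr('height', cell)
  .selectAll('g.sample')
  .data(samples)
  .join('g')
    .attr('class', 'sample')
    .attr('transform', (d, i) => `translate(${i * (cell + pad)},0)`)
  .append('path')
    .attr('d', d => line(d3.zip(d.xs, d.ys)))
    .attr('fill', 'none')
    .attr('stroke', '#555');
```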