Some random interesting things:
I've been thinking about "attribution caricatures" a lot more. See examples and notebook.
Attribution caricatures can be made to the output classes, as we saw earlier (a rough sketch of this kind of objective follows these notes):
But they can also be done to a hidden layer, creating a caricature at one layer that emphasizes features that will be important at a later layer:
An idea related to this is "iterated attribution" -- apply attribution iteratively to each layer between a start point and an end point. It's not clear this is principled, but the results seem interesting:
I found an interesting example of how attribution caricatures perceive a bookshelf as different classes. View here.
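For concreteness, here's a rough sketch of how an attribution-weighted caricature target could be set up. This is a simplified PyTorch version using gradient * activation as the attribution; the actual experiments are in Lucid/TensorFlow on InceptionV1, and the model and layer here are just placeholders:

```python
# Simplified sketch of an attribution-weighted caricature target.
# Placeholder model/layer; the real experiments use Lucid + InceptionV1.
import torch
import torchvision.models as models

model = models.googlenet(pretrained=True).eval()

acts = {}
# Capture activations at an example layer via a forward hook.
model.inception4c.register_forward_hook(
    lambda module, inp, out: acts.update(layer=out))

def attribution_target(img, class_idx):
    """Layer activations weighted by their (gradient * activation)
    attribution toward a chosen output class."""
    logits = model(img)
    grad = torch.autograd.grad(logits[0, class_idx], acts["layer"])[0]
    return (acts["layer"] * grad).detach()

# A caricature would then be optimized to exaggerate this weighted target
# (e.g. by maximizing the dot product with it) instead of the raw activations.
```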
🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.
Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.
Description
Caricatures are a powerful feature visualization technique that we haven't fully explored or published on yet. Roughly, they allow us to take an input image, feed it through to some layer of a network, and get a sense of how the network understood it.
Caricatures do this by creating a new image whose activation pattern at a given layer is similar to the original's, but more extreme.
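As a very rough sketch of what this looks like in code -- a simplified PyTorch version with a placeholder layer and a plain dot-product objective; the real implementation in Lucid also uses a decorrelated image parameterization and transformation robustness, which are omitted here:

```python
# Minimal caricature sketch: optimize an image so its activations at one
# layer point in the same direction as the original's, but more strongly.
import torch
import torchvision.models as models

model = models.googlenet(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)

acts = {}
model.inception4c.register_forward_hook(  # example layer
    lambda module, inp, out: acts.update(layer=out))

def caricature(img, steps=256, lr=0.05):
    # Record the activations the original image produces at the layer.
    model(img)
    target = acts["layer"].detach()

    # Optimize a new image (initialized from the original here; Lucid would
    # use a parameterized image instead) to maximize the dot product with
    # those activations: "similar but more extreme".
    x = img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        loss = -(acts["layer"] * target).sum()
        loss.backward()
        opt.step()
    return x.detach()
```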
There are two related properties that make caricatures really interesting as a visualization:
They are basis-free visualizations. Unlike neuron visualizations, where the choice of neurons dramatically affects the results and rotating activation space would change everything, caricatures are unaffected by either. This means they work well even for models where concepts don't align with individual neurons.
They are comparable visualizations. Most visualizations we have are not comparable between models. For example, if you visualize a neuron in one model and a neuron in a different model, there's no reason for them to represent the same thing, so you learn little about how the models compare. While there are other comparable visualizations, caricatures are by far the simplest.
This makes caricatures a really important technique! This is because:
they are our first, simplest line of attack on model comparison
they are a super useful tool for debugging feature visualization when it doesn’t work (because they remove neuron choice as a potential problem).
Next Steps
Caricatures are much more powerful when shown in context, as demonstrated at the top of this notebook. It would be great to scale this!
It would be super exciting to do more controlled experiments changing network architectures and seeing how the caricatures respond. (The models would also be a useful resource to have for future model comparison work.) The ones I'm most immediately excited about are exploring network branches, the effect of datasets, and preprocessing.
We've recently had some exciting early results with "attribution caricatures" which might be interesting to explore:
It might be useful to show how they can be used for debugging feature vis.