tensorflow / lucid

A collection of infrastructure and tools for research in neural network interpretability.

Research: Feature Visualization Objectives #116

Open colah opened 5 years ago

colah commented 5 years ago

🔬 This is an experiment in doing radically open research. I plan to post all my work on this openly as I do it, tracking it in this issue. I'd love for people to comment, or better yet collaborate! See more.

Please be respectful of the fact that this is unpublished research and that people involved in this are putting themselves in an unusually vulnerable position. Please treat it as you would unpublished work described in a seminar or by a colleague.

⚙️ This is a bit of a more low-level and technical research issue than many of the others. It might feel a bit in the weeds, but making progress on it would give us lots of powerful traction on basically everything else.

Description

Feature Visualization studies neural network behavior by optimizing an input to elicit a particular response from the network.

For example, to visualize a neuron, we create an input which strongly causes that neuron to fire. We can also visualize a combination of neurons by maximizing the sum of their activations. These visualizations have a nice geometric interpretation: we are visualizing a direction in a vector space of activations, where each neuron is a basis vector.
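For concreteness, a minimal sketch of what these objectives look like with lucid's API (the layer name and channel indices below are arbitrary illustrative choices, not ones singled out in this issue):

import lucid.modelzoo.vision_models as models
from lucid.optvis import objectives, param, render

model = models.InceptionV1()
model.load_graphdef()

# Visualize a single neuron: maximize channel 476 of layer mixed4b.
single = objectives.channel("mixed4b", 476)

# Visualize a combination of neurons by summing their objectives -- a direction
# in activation space whose basis vectors are those neurons.
combo = objectives.channel("mixed4b", 476) + objectives.channel("mixed4b", 460)

_ = render.render_vis(model, combo, lambda: param.image(128))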

We normally do this by maximizing that direction, that is, maximizing the dot product of our activation vector with the desired direction vector. However...

Maximizing dot product may not be the right objective

There are a number of reasons why just maximizing a direction in this way may not actually be the thing we want, at least in some cases:

(An additional reason we might want to do something different is that, even when normal feature visualization works perfectly, it doesn't differentiate between things that strongly activate the direction and things that only slightly do.)

Alternate Visualization Objectives

There are many other visualization objectives we could try. (Note, there might not be a single correct one -- they may all show us different things.)

Are we sure there's a problem?

The main things pointing towards there being an issue are:

These could be explained in different ways, but generally suggest we should think hard both about the directions we're visualizing and the objectives we're using to visualize them.

(A final, more fatal possibility is that directions aren't the right thing to try to understand at all. None of these observations really points to that at this point.)

Resources

Dot x Cosine Similarity

See, for example, this notebook on caricatures.
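For reference, a rough sketch of one way a dot x cosine-similarity objective could be written as a custom objective function; the layer name, the random placeholder direction, and the cossim_pow exponent are illustrative assumptions rather than what the notebook above uses:

import numpy as np
import tensorflow as tf
import lucid.modelzoo.vision_models as models
from lucid.optvis import param, render

model = models.InceptionV1()
model.load_graphdef()

direction = np.random.randn(528).astype(np.float32)   # placeholder direction in mixed4d space

def dot_cossim(layer, vec, cossim_pow=2.0, eps=1e-4):
    # Maximize (activation . vec) scaled by cos(activation, vec) ** cossim_pow, so inputs
    # that point *in* the direction are favored over ones that merely have a large
    # component along it.
    def inner(T):
        acts = tf.reduce_mean(T(layer), axis=[1, 2])    # average over spatial positions
        vec_t = tf.constant(vec)
        dot = tf.reduce_sum(acts * vec_t, axis=-1)
        cossim = dot / (eps + tf.norm(acts, axis=-1) * tf.norm(vec_t))
        return tf.reduce_mean(dot * tf.maximum(cossim, 0.0) ** cossim_pow)
    return inner

obj = dot_cossim("mixed4d", direction)
_ = render.render_vis(model, obj, lambda: param.image(160))

Setting cossim_pow to 0 recovers the plain dot-product objective; larger values weight alignment with the direction more heavily.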

Penalizing activations at previous layer

from lucid.optvis import objectives, param, render
import lucid.modelzoo.vision_models as models

model = models.InceptionV1()        # layer names like "mixed4d" assume InceptionV1
model.load_graphdef()

obj  = objectives.neuron("mixed4d", 504)
obj += -1e-4 * objectives.L1("mixed4a")      # L1 penalty on an earlier layer's activations
param_f = lambda: param.image(160)
_ = render.render_vis(model, obj, param_f)

[image: resulting visualization]

colah commented 5 years ago

Comment from Yasaman Bahri (@yasamanb): maybe the reason we see polysemantic neurons is that the task isn't hard enough to get neurons in later layers to learn the "right" abstractions. In early layers, when you're closer to the data, perhaps it is easier. (Comment paraphrased by Chris; it may not be a fully accurate interpretation of Yasaman's remark.)

nareshshah139 commented 5 years ago

Hey there, we are looking at these objectives from a new perspective: tying them to uncertainty estimation within a deep neural network. If an activation vector is far from all seen activation vectors, then it's an outlier. If an activation vector is equally similar to the centroids of two classes, then it's a point close to the boundary between those classes. Early results show that this method differs from the prediction probability at the end of a softmax and is better for some of the deeper/more complex networks I have experimented with.
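A minimal sketch of one way to read this idea, with train_acts (an [N, D] matrix of activations from some layer), train_labels, and the Euclidean metric all being illustrative assumptions rather than the actual method described above:

import numpy as np

def class_centroids(train_acts, train_labels):
    # Mean activation vector per class, computed from previously seen examples.
    return {c: train_acts[train_labels == c].mean(axis=0) for c in np.unique(train_labels)}

def activation_uncertainty(act, centroids):
    dists = np.sort([np.linalg.norm(act - mu) for mu in centroids.values()])
    outlier_score = dists[0]               # large -> far from every seen class
    boundary_score = dists[1] - dists[0]   # small -> roughly equidistant from two classes
    return outlier_score, boundary_score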

yashpatel5400 commented 4 years ago

This sounds super neat! What's the status of this project (considering it's been over half a year since this issue was last discussed)? If it's still in the works, is the main work to be done looking at different visualization objectives, or something else?

mathemaphysics commented 4 years ago

Sounds like a dual vector space method might be useful if a transformation can be used to "unstretch" the space.
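One concrete (and speculative) reading of "unstretching" the space: whiten activations with the covariance of observed activations and take dot products in the whitened coordinates. A minimal sketch, assuming acts is an [N, D] matrix of activations collected from some layer:

import numpy as np

def whitening_transform(acts, eps=1e-5):
    mu = acts.mean(axis=0)
    cov = np.cov(acts - mu, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T   # ZCA whitening matrix
    return mu, W

# A direction in the original activation space could then be mapped through W
# (vec_w = W @ vec) before being used as a visualization objective.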

mathemaphysics commented 4 years ago

@colah Maybe I'm oversimplifying, but at each abstraction, i.e. each layer further from the input, we would desire a more generalized representation of the data, i.e. a many-to-one correspondence between input configurations and abstract neuron activations. If we're classifying objects, we're actually stipulating this intentionally.

The real question is how "quickly" learning algorithms can separate classes. There's an obvious linear algebra angle here which will almost certainly relate to the rank and condition numbers of successive weight matrices (because there are biases too, I guess these would be affine transformations rather than strictly "linear").
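A minimal sketch of that check, assuming weights is a hypothetical list of 2-D weight matrices extracted from the network (conv kernels reshaped to 2-D first):

import numpy as np

def layer_spectra(weights, tol=1e-6):
    for i, W in enumerate(weights):
        s = np.linalg.svd(W, compute_uv=False)          # singular values, descending
        rank = int(np.sum(s > tol * s[0]))
        cond = float(s[0] / max(s[-1], 1e-12))
        print("layer %d: numerical rank %d / %d, condition number %.2f"
              % (i, rank, min(W.shape), cond))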