raghakot / keras-vis

Neural network visualization toolkit for keras
https://raghakot.github.io/keras-vis
MIT License
2.98k stars, 660 forks

Issue (presumably) with embedding layers #63

Open pexmar opened 7 years ago

pexmar commented 7 years ago

Hi,

First, for context: I am using the most recent versions of keras-vis, keras, theano, and tensorflow.

I really like the idea of keras-vis, but while testing it on an NLP task I came across an error.

When I try to run visualize_saliency(model, layer_idx, filter_indices=0, seed_input=txt_instances.x[0]) on my trained model, I get a theano.gradient.DisconnectedInputError in this line. Do you have any idea why?

The exact error message is the following:

Traceback (most recent call last):
  File "/Users/peter/Masterarbeit/python-projects/dscnn-keras/keras-vis-gist.py", line 285, in <module>
    heatmap = visualize_saliency(model, layer_idx, filter_indices=0, seed_input=txt_instances.x[0])
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/vis/visualization/saliency.py", line 125, in visualize_saliency
    return visualize_saliency_with_losses(model.input, losses, seed_input, grad_modifier)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/vis/visualization/saliency.py", line 72, in visualize_saliency_with_losses
    opt = Optimizer(input_tensor, losses, norm_grads=False)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/vis/optimizer.py", line 52, in __init__
    grads = K.gradients(overall_loss, self.wrt_tensor)[0]
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 1172, in gradients
    return T.grad(loss, variables)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/theano/gradient.py", line 539, in grad
    handle_disconnected(elem)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/theano/gradient.py", line 526, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError:
Backtrace when that variable is created:

  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/engine/topology.py", line 2416, in from_config
    process_layer(layer_data)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/engine/topology.py", line 2385, in process_layer
    custom_objects=custom_objects)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/utils/generic_utils.py", line 141, in deserialize_keras_object
    return cls.from_config(config['config'])
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/engine/topology.py", line 1231, in from_config
    return cls(**config)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/engine/topology.py", line 1325, in __init__
    name=self.name)
  File "/Users/peter/tensorflow/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 185, in placeholder
    x = T.TensorType(dtype, broadcast)(name)

I already tried making the embedding layer trainable (as suggested on Slack), but that did not solve the problem. I have prepared a Gist; to run it you will need the English word2vec file provided by Google (https://code.google.com/archive/p/word2vec/):

https://gist.github.com/pexmar/8900cb65f4970bd911ebc81206c9b131

Thanks in advance for your help and thanks for this great library ;-)

raghakot commented 7 years ago

So the issue is that gradients are not propagated through the Embedding layer, which uses the tf.gather op. This means that we need to compute gradients with respect to the embedding layer's output instead of the model input. I have updated the API to accept an optional wrt_tensor argument (https://github.com/raghakot/keras-vis/commit/0c57850db2ba95905f4ff7b38a685bc6d0d38087)
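To illustrate why the gradient has to be taken at the embedding output, here is a minimal NumPy sketch. It is not keras-vis or Keras code: the toy sizes, the random embedding matrix, and the linear "loss" are all assumptions made up for illustration. The point is only that the lookup is a row selection by integer index, so a derivative exists with respect to the looked-up vectors but not with respect to the indices themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for this illustration.
vocab_size, emb_size, seq_len = 10, 4, 5
embedding_matrix = rng.normal(size=(vocab_size, emb_size))
token_ids = np.array([3, 1, 4, 1, 5])  # integer model input

# An embedding lookup is a gather: it selects rows by integer index.
embedded = embedding_matrix[token_ids]  # shape (seq_len, emb_size)

# Toy "loss": a fixed linear readout of the embedded sequence (an assumption).
readout = rng.normal(size=(seq_len, emb_size))
loss = np.sum(readout * embedded)

# d(loss)/d(embedded) is well defined and has the shape of the embedding output.
grad_wrt_embedded = readout

# token_ids are discrete, so there is no derivative with respect to them.
# The embedding output is the earliest tensor a saliency gradient can target.
```

In this sketch grad_wrt_embedded plays the role of the raw gradients that a saliency computation would return when it is pointed at the embedding output.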

If you pass wrt_tensor=model.layers[1].output to visualize_saliency, you should get a heatmap. Note that I recently changed the various saliency visualizations so that they return raw gradients instead of a jet color-mapped heatmap.

So, this heatmap should be of shape (400,). You can convert it into a proper heatmap using:

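import numpy as np
import matplotlib.cm as cm
# The trailing [0] assumes grads has a leading batch axis, e.g. shape (1, 400); drop it if grads is already (400,).
hmap = np.uint8(cm.jet(grads)[..., :3] * 255)[0]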

hmap will have shape (400, 3) with 3 channels indicating the rgb values for the heatmap.

Traditionally, saliency takes the max value across all channels (for images there are 3). In this case, however, it is taking the max across all 60 embedding dimensions (emb_size), which may not be a good idea. I could imagine np.mean working better here. Alternatively, you can plot the full (400, 60) gradient array directly as a 2D image. To do that, you can literally copy and paste this code (https://github.com/raghakot/keras-vis/blob/master/vis/visualization/saliency.py#L79) and comment out the np.max part.
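The three options mentioned above (max, mean, or the full 2D map) can be compared with a short NumPy sketch. This is not the keras-vis implementation: the random grads array is a stand-in for the real raw gradients, and both the absolute value and the normalize helper are assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_size = 400, 60                   # shapes mentioned in this thread
grads = rng.normal(size=(seq_len, emb_size))  # stand-in for the raw gradients

# Assumption: use gradient magnitudes, so positive and negative values both count.
abs_grads = np.abs(grads)

# One score per token: reduce over the embedding axis.
saliency_max = abs_grads.max(axis=-1)    # shape (400,)
saliency_mean = abs_grads.mean(axis=-1)  # shape (400,)


def normalize(x):
    """Rescale to [0, 1] so the values can be passed to a colormap (hypothetical helper)."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


per_token_max = normalize(saliency_max)
per_token_mean = normalize(saliency_mean)

# No reduction at all: keep the full array and plot it as a 2D image.
full_map = normalize(abs_grads)          # shape (400, 60)
```

The max reduction highlights a token if any single embedding dimension has a large gradient, while the mean reflects the overall gradient magnitude for that token, so the two can rank tokens differently.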

Let me know how it goes, and if it seems to make sense at all.