richliao / textClassifier

Text classifier for Hierarchical Attention Networks for Document Classification
Apache License 2.0

Code to Visualize Attention Weights #7

Open ni9elf opened 7 years ago

ni9elf commented 7 years ago

I need some help writing the code to obtain and visualize the attention weights like those in the HAN paper (heat map). To obtain the attention weights, I'm currently thinking of extracting the hidden representations of the GRUs (h_it) and then manually using h_it to compute the attention weights with the equations from the call function of the attention layer.

layer_name = 'GRU'
intermediate_layer_model = Model(input=model.input, output=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(input_variable)
h_it = intermediate_output  # use h_it from above to compute attention weights

If there is a more direct way (a direct function call in Keras or some existing code), that would be helpful.
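For reference, a minimal numpy sketch of that manual step, assuming h_it has shape (num_timesteps, hidden_dim) and the attention layer follows the W/b/u variant posted later in this thread; the layer name is a placeholder:

import numpy as np

# placeholder layer name; W, b, u come back in the order build() creates them
W, b, u = model.get_layer('att_layer_1').get_weights()

uit = np.tanh(np.dot(h_it, W) + b)     # u_it = tanh(W h_it + b)
ait = np.dot(uit, u).squeeze(-1)       # one unnormalized score per timestep
exp_ait = np.exp(ait)
alphas = exp_ait / np.sum(exp_ait)     # softmax over the timestep axis -> attention weights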

spate141 commented 6 years ago

Any update on the visualization part? @ni9elf @richliao I'm trying to get the weights, but they always end up as 1.0

> _dot = np.dot(out[0], att_w[0])
> _tanh = np.tanh(_dot)
> _exp = np.exp(_tanh)
> weights = _exp / np.sum(_exp)
> weights
array([1.], dtype=float32)

Here, out[0] and att_w[0] are my output layer and attention layer weights for the given sentence respectively. Any thoughts?
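A hedged aside on the 1.0 result: if the vector being normalized contains only one element (e.g. the dot product has already collapsed the sequence to a single scalar), the softmax trivially returns 1.0; the normalization has to run across all timesteps. A small numpy illustration:

import numpy as np

# softmax over a single score always yields 1.0
single = np.exp(np.tanh(np.array([0.37])))
print(single / np.sum(single))             # [1.]

# softmax over one score per timestep gives a proper distribution
scores = np.exp(np.tanh(np.array([0.37, -1.2, 0.05])))
print(scores / np.sum(scores))             # roughly [0.49 0.15 0.36]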

deepankar27 commented 6 years ago

@spate141 @ni9elf @richliao Do you have any update on the visualization part? Can you guys help with this?

spate141 commented 6 years ago

@deepankar27 the closest working solution I got is this: https://github.com/cbaziotis/neat-vision

deepankar27 commented 6 years ago

@spate141 Thanks!! It looks like a nice tool, but how do I feed my model's attention values into it along with the predicted model score for each label? And how do I get the attention values in the first place? Where does this att_w[0] come from while predicting a label? It could be a stupid question....

spate141 commented 6 years ago

@deepankar27 If your model has an attention layer, you can easily get the output of that layer. With Keras, you can get it like this: Obtain the output of an intermediate layer with Keras
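A minimal sketch of that intermediate-output trick; the layer name 'att_layer_1' and x_test are placeholders, and note this returns the layer's output (the attention-weighted sum), not the attention weights themselves:

from keras.models import Model

# build a sub-model mapping the original inputs to the attention layer's output
intermediate_model = Model(inputs=model.input,
                           outputs=model.get_layer('att_layer_1').output)
intermediate_output = intermediate_model.predict(x_test)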

deepankar27 commented 6 years ago

@spate141 Awesome!! Thanks a lot!!

arunarn2 commented 5 years ago

@spate141 @deepankar27 @richliao: I am still having issues with capturing the attention weights. I believe I am getting the weights using att_w = model.get_layer('hierarchical_attn_2').get_weights(). Now this is a list of lists of shape [3, 200, 200]. Should this weight matrix be reshaped? Can you provide any assistance on how I translate this into the weights for my incoming text?
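One hedged note on that list: for the AttLayer posted below, get_weights() returns the three variables in the order build() creates them, so a sketch like this (layer name taken from the comment above) splits them apart:

# W: projection matrix, b: bias, u: context vector, in build() order
W, b, u = model.get_layer('hierarchical_attn_2').get_weights()
print(W.shape, b.shape, u.shape)   # e.g. (200, 200), (200,), (200, 1) for attention_dim=200

These are the static layer weights; turning them into per-sentence attention values for a specific input still requires the layer's forward pass (see the numpy sketch above or the _get_attention_weights approach below).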

robin-fusemachines commented 5 years ago

First, make these changes in the Attention class to visualize sentence-level weights. I haven't taken the time to account for word-level attention visualization though.

from keras import backend as K
from keras import initializers
from keras.engine.topology import Layer

class AttLayer(Layer):
    def __init__(self,attention_dim,**kwargs):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) == 3
        # W, b form the one-layer MLP that produces u_it = tanh(x W + b);
        # u is the context vector scored against u_it to get the attention weights
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)), name='Attention_weight')
        self.b = K.variable(self.init((self.attention_dim, )), name='Attention_Bias')
        self.u = K.variable(self.init((self.attention_dim, 1)), name='Attention_power')
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return None

    def call(self, x, mask=None):
        # size of x :[batch_size, sel_len, attention_dim]
        # size of u :[batch_size, attention_dim]
        # uit = tanh(xW+b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)

        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            ait *= K.cast(mask, K.floatx())
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        # ait: [batch_size, seq_len]; expand to [batch_size, seq_len, 1] so it
        # broadcasts over the feature dimension when weighting x
        ait = K.expand_dims(ait)
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)
        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

    def _get_attention_weights(self, X):
        # same computation as call(), but returns the normalized attention
        # weights themselves instead of the attention-weighted sum
        uit = K.tanh(K.bias_add(K.dot(X, self.W), self.b))
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        return ait

I am assuming the code below is a continuation of richliao's code for model fitting.
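One assumption worth making explicit: for the model.get_layer('sentence_attention') lookup below to work, the sentence-level attention layer has to be given that name when the model is built. A hedged sketch of the wiring (the variable name l_lstm_sent is a placeholder for the sentence-level bidirectional GRU output, not necessarily the repo's exact code):

l_att_sent = AttLayer(100, name='sentence_attention')(l_lstm_sent)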

Later, after training the model, you can infer the attention weights for your test data like this:

from keras.layers import Lambda
att_layer = model.get_layer('sentence_attention')
prev_tensor = att_layer.input
# wrap the helper in a Lambda so the attention weights become a tensor in the graph
dummy_layer = Lambda(
          lambda x: att_layer._get_attention_weights(x)
      )(prev_tensor)

from keras.models import Model
attention_weights = Model(model.input, dummy_layer).predict(x_val)

## shape of the above matrix is: (size of validation set, MAX_SENTS, 1)
## i.e. for each document in the validation set we get one attention weight per sentence

Note: I used x_val here, but try dividing the data into train, test, and validation sets, unlike richliao's train and validation split only. Then visualize the sentence weights.
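For the heat-map style visualization asked about at the top of the thread, here is a minimal matplotlib sketch using the attention_weights matrix computed above; the plotting choices are just one option:

import matplotlib.pyplot as plt
import numpy as np

# sentence-level attention for the first validation document: shape (MAX_SENTS,)
doc_weights = attention_weights[0].squeeze(-1)

plt.matshow(doc_weights[np.newaxis, :], cmap='Reds')   # one row, one column per sentence
plt.xlabel('sentence index')
plt.colorbar()
plt.show()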

Mail me at robinnarsingha123@gmail.com and I will share the complete extension of this code to save the model, visualize the weights, and so on :D