keras-team / keras

Deep Learning for humans
http://keras.io/

Visualize LSTM gate activations #1922

Closed santi-pdp closed 7 years ago

santi-pdp commented 8 years ago

Hello, is there any tool (or intention to add one) to visualize LSTM/GRU layer activations (not only the outputs, but also the gates), similar to what they do in: http://research.microsoft.com/pubs/257676/SentenceEmbedding1502.06922v2.pdf ?

philipperemy commented 8 years ago

Not that I know of. The code of interest is in recurrent.py. For the LSTM, you can look at this function and modify it:

recurrent.py, line 433
def step(self, x, states):
    assert len(states) == 2
    h_tm1 = states[0]
    c_tm1 = states[1]

    x_i = K.dot(x, self.W_i) + self.b_i
    x_f = K.dot(x, self.W_f) + self.b_f
    x_c = K.dot(x, self.W_c) + self.b_c
    x_o = K.dot(x, self.W_o) + self.b_o

    i = self.inner_activation(x_i + K.dot(h_tm1, self.U_i))
    f = self.inner_activation(x_f + K.dot(h_tm1, self.U_f))
    c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1, self.U_c))
    o = self.inner_activation(x_o + K.dot(h_tm1, self.U_o))
    h = o * self.activation(c)
    return h, [h, c]

I think if you manually print the values, you should be able to reproduce plots like the ones in the Microsoft Research paper.
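For example, a minimal sketch of one way to print the gate values with the Theano backend (assuming you are editing the step method quoted above, so K is already imported in recurrent.py; theano.printing.Print dumps a tensor's value to stdout every time it is evaluated):

import theano.printing

def step(self, x, states):
    assert len(states) == 2
    h_tm1 = states[0]
    c_tm1 = states[1]

    x_i = K.dot(x, self.W_i) + self.b_i
    x_f = K.dot(x, self.W_f) + self.b_f
    x_c = K.dot(x, self.W_c) + self.b_c
    x_o = K.dot(x, self.W_o) + self.b_o

    # wrap each gate in a Print op so its value is dumped at every timestep
    i = theano.printing.Print('input gate')(
        self.inner_activation(x_i + K.dot(h_tm1, self.U_i)))
    f = theano.printing.Print('forget gate')(
        self.inner_activation(x_f + K.dot(h_tm1, self.U_f)))
    c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1, self.U_c))
    o = theano.printing.Print('output gate')(
        self.inner_activation(x_o + K.dot(h_tm1, self.U_o)))
    h = o * self.activation(c)
    return h, [h, c]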

For your information, you can retrieve the outputs of any layer in your Sequential model by creating a Theano function:

import theano

def get_activations(model, layer, X_batch):
    # compile a Theano function mapping the model input to the given layer's output
    get_activations = theano.function([model.layers[0].input],
                                      model.layers[layer].get_output(train=False),
                                      allow_input_downcast=True)
    activations = get_activations(X_batch)
    return activations
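For instance, a hypothetical call (assuming the LSTM is the model's second layer and X_train is a 3D array of shape (samples, timesteps, features)):

lstm_activations = get_activations(model, 1, X_train[:10])
print(lstm_activations.shape)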

Let me know.

gamehere007 commented 8 years ago

@philipperemy

Hi! Philip,

I tried to use the "get_activations" function for an LSTM layer, but I get the following error:

AttributeError: 'LSTM' object has no attribute 'get_output'

Any solution? Many thanks!

philipperemy commented 8 years ago

@gamehere007 @santi-pdp this should be what you want. I took the stateful_lstm example (cosine) and printed all the contents of the LSTM step by step with a batch size of 1 (the code is not very clean, but I modified only the relevant parts). Have a look at the console output:

Training
Epoch 0 / 25
Epoch 1/1
1000/1000 [==============================] - 12s - loss: 4574.5121    
states[0] = [[ 0.  0.]]
states[1] = [[ 257.03659058    0.        ]]
Input
b_i = [-0.00691508 -0.00629281]
W_i = [[ 1.02970016 -1.34588516]]
U_i = [[-1.03116369  0.36375973]
 [ 0.37897322  1.03080726]]
Forget
b_f = [ 0.99570334  1.00683689]
W_f = [[ 0.82403541 -0.9261207 ]]
U_f = [[-0.10786153  1.09928894]
 [ 1.09896088  0.10886104]]
Cell
b_c = [ 0.00170992  0.00342518]
W_c = [[ 1.36173594  0.7903558 ]]
U_c = [[-0.89767057  0.64024383]
 [ 0.63502508  0.90228355]]
Output
b_o = [-0.01488051 -0.0303614 ]
W_o = [[-0.71250075  0.33641183]]
U_o = [[-0.60409766  0.91694659]
 [ 0.93655741  0.62819171]]
output = [ 0.  0.]

If I remember correctly, b stands for bias, W for the input weights, and U for the recurrent weights (to be checked).

The code snippet is:

'''Example script showing how to use stateful RNNs
to model long sequences efficiently.
'''
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np

from keras.layers import Dense, LSTM
from keras.models import Sequential

# since we are using stateful rnn tsteps can be set to 1
tsteps = 1
batch_size = 1
epochs = 25
# number of elements ahead that are used to make the prediction
lahead = 1

def gen_cosine_amp(amp=100, period=25, x0=0, xn=1000, step=1, k=0.0001):
    """Generates an absolute cosine time series with the amplitude
    exponentially decreasing

    Arguments:
        amp: amplitude of the cosine function
        period: period of the cosine function
        x0: initial x of the time series
        xn: final x of the time series
        step: step of the time series discretization
        k: exponential rate
    """
    cos = np.zeros(((xn - x0) * step, 1, 1))
    for i in range(len(cos)):
        idx = x0 + i * step
        cos[i, 0, 0] = amp * np.cos(idx / (2 * np.pi * period))
        cos[i, 0, 0] = cos[i, 0, 0] * np.exp(-k * idx)
    return cos

print('Generating Data')
cos = gen_cosine_amp()
print('Input shape:', cos.shape)

expected_output = np.zeros((len(cos), 1))
for i in range(len(cos) - lahead):
    expected_output[i, 0] = np.mean(cos[i + 1:i + lahead + 1])

print('Output shape')
print(expected_output.shape)

print('Creating Model')
model = Sequential()
model.add(LSTM(2,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')

# with a Sequential model
import keras.backend as K

get_LSTM_output = K.function([model.layers[0].input],
                             [model.layers[0].output])

print('Training')
for i in range(epochs):
    print('Epoch', i, '/', epochs)
    model.fit(cos,
              expected_output,
              batch_size=batch_size,
              verbose=1,
              nb_epoch=1,
              shuffle=False)

    for layer in model.layers:
        if 'LSTM' in str(layer):
            print('states[0] = {}'.format(K.get_value(layer.states[0])))
            print('states[1] = {}'.format(K.get_value(layer.states[1])))

            print('Input')
            print('b_i = {}'.format(K.get_value(layer.b_i)))
            print('W_i = {}'.format(K.get_value(layer.W_i)))
            print('U_i = {}'.format(K.get_value(layer.U_i)))

            print('Forget')
            print('b_f = {}'.format(K.get_value(layer.b_f)))
            print('W_f = {}'.format(K.get_value(layer.W_f)))
            print('U_f = {}'.format(K.get_value(layer.U_f)))

            print('Cell')
            print('b_c = {}'.format(K.get_value(layer.b_c)))
            print('W_c = {}'.format(K.get_value(layer.W_c)))
            print('U_c = {}'.format(K.get_value(layer.U_c)))

            print('Output')
            print('b_o = {}'.format(K.get_value(layer.b_o)))
            print('W_o = {}'.format(K.get_value(layer.W_o)))
            print('U_o = {}'.format(K.get_value(layer.U_o)))

    # output of the LSTM for the first element of the batch, after this epoch's fit().
    first_batch_element = np.expand_dims(cos[0], axis=1)  # (1, 1) to (1, 1, 1)
    print('output = {}'.format(get_LSTM_output([first_batch_element])[0].flatten()))

    model.reset_states()

print('Predicting')
predicted_output = model.predict(cos, batch_size=batch_size)

print('Plotting Results')
plt.subplot(2, 1, 1)
plt.plot(expected_output)
plt.title('Expected')
plt.subplot(2, 1, 2)
plt.plot(predicted_output)
plt.title('Predicted')
plt.show()

I'm using TensorFlow as backend.

I hope this helps!

gamehere007 commented 8 years ago

@philipperemy Hi Phillip, thank you so much for sharing your code. After running it, I have a question about the dimensions of the U_c, U_f, U_i and U_o matrices. Why are the matrices I got from your code all square? According to my understanding of an LSTM, taking U_i as an example, U_i should have a dimension of (# of hidden units, # of input features). Thank you very much!

philipperemy commented 8 years ago

@gamehere007 Usually W means the input weights, b the bias, and U the recurrent weights.

You can find the shapes of each object here:

            self.W_i = self.init((self.input_dim, self.output_dim),
                                 name='{}_W_i'.format(self.name))
            self.U_i = self.inner_init((self.output_dim, self.output_dim),
                                       name='{}_U_i'.format(self.name))
            self.b_i = K.zeros((self.output_dim,), name='{}_b_i'.format(self.name))

            self.W_f = self.init((self.input_dim, self.output_dim),
                                 name='{}_W_f'.format(self.name))
            self.U_f = self.inner_init((self.output_dim, self.output_dim),
                                       name='{}_U_f'.format(self.name))
            self.b_f = self.forget_bias_init((self.output_dim,),
                                             name='{}_b_f'.format(self.name))

            self.W_c = self.init((self.input_dim, self.output_dim),
                                 name='{}_W_c'.format(self.name))
            self.U_c = self.inner_init((self.output_dim, self.output_dim),
                                       name='{}_U_c'.format(self.name))
            self.b_c = K.zeros((self.output_dim,), name='{}_b_c'.format(self.name))

            self.W_o = self.init((self.input_dim, self.output_dim),
                                 name='{}_W_o'.format(self.name))
            self.U_o = self.inner_init((self.output_dim, self.output_dim),
                                       name='{}_U_o'.format(self.name))
            self.b_o = K.zeros((self.output_dim,), name='{}_b_o'.format(self.name))

where:

output_dim: dimension of the internal projections and of the final output.
self.input_dim = input_shape[2], i.e. the third axis in (nb_samples, time_steps, input_dim).

In your case, hidden units = output_dim and input features = input_dim. The U_* matrices are square because they multiply the previous hidden state (of size output_dim), not the input, so their shape is (output_dim, output_dim).
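For the stateful example above (input_dim = 1, output_dim = 2), you can verify the shapes directly. A minimal sketch, assuming Keras 1.x (where the W_i/U_i/b_i attributes still exist) and the model from the earlier snippet:

lstm = model.layers[0]
print(K.get_value(lstm.W_i).shape)  # (1, 2) -> (input_dim, output_dim)
print(K.get_value(lstm.U_i).shape)  # (2, 2) -> (output_dim, output_dim), hence square
print(K.get_value(lstm.b_i).shape)  # (2,)   -> (output_dim,)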

AziziShekoofeh commented 7 years ago

Hi,

@philipperemy Thanks for the nice explanation. A simple question, what is the difference between states[0] and states[1]?

AziziShekoofeh commented 7 years ago

@philipperemy tracking the step formulation in recurrent.py, they are ht-1 and ct-1 respectively, is that correct? So, basically, we can also get access to them by reconstructing the same network using layer.get_weights() and model.predict() for the given batch.

philipperemy commented 7 years ago

@AziziShekoofeh yes, from what I remember they are your ht-1 and ct-1.

They are not stored in the weights because they are not trainable parameters.

JiyuZhangBH commented 7 years ago

@philipperemy How do I modify recurrent.py? When I run your code, it raises "'LSTM' object has no attribute 'b-i'"

philipperemy commented 7 years ago

It's an underscore, not a hyphen!

JiyuZhangBH commented 7 years ago

@philipperemy Sorry, that was my own carelessness. I rewrote it as b_i, but it still raises "'LSTM' object has no attribute 'b_i'". I saw that your answer to this question is to modify the step function of recurrent.py? How should recurrent.py be modified? Thanks a lot.

philipperemy commented 7 years ago

@JiyuZhangBH they seem to have changed the names of the variables.

Have a look at recurrent.py in keras.

    self.kernel_i = self.kernel[:, :self.units]
    self.kernel_f = self.kernel[:, self.units:self.units * 2]
    self.kernel_c = self.kernel[:, self.units * 2:self.units * 3]
    self.kernel_o = self.kernel[:, self.units * 3:]

    self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
    self.recurrent_kernel_f = self.recurrent_kernel[:, self.units:self.units * 2]
    self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2:self.units * 3]
    self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]

    if self.use_bias:
        self.bias_i = self.bias[:self.units]
        self.bias_f = self.bias[self.units:self.units * 2]
        self.bias_c = self.bias[self.units * 2:self.units * 3]
        self.bias_o = self.bias[self.units * 3:]
    else:
        self.bias_i = None
        self.bias_f = None
        self.bias_c = None
        self.bias_o = None
    self.built = True

JiyuZhangBH commented 7 years ago

Should I change 'bias_i' to 'b_i', or kernel_i to b_i? In addition, does the step function need to be changed?

def step(self, inputs, states):
    h_tm1 = states[0]
    c_tm1 = states[1]
    dp_mask = states[2]
    rec_dp_mask = states[3]

    if self.implementation == 2:
        z = K.dot(inputs * dp_mask[0], self.kernel)
        z += K.dot(h_tm1 * rec_dp_mask[0], self.recurrent_kernel)
        if self.use_bias:
            z = K.bias_add(z, self.bias)

        z0 = z[:, :self.units]
        z1 = z[:, self.units: 2 * self.units]
        z2 = z[:, 2 * self.units: 3 * self.units]
        z3 = z[:, 3 * self.units:]

        i = self.recurrent_activation(z0)
        f = self.recurrent_activation(z1)
        c = f * c_tm1 + i * self.activation(z2)
        o = self.recurrent_activation(z3)
    else:
        if self.implementation == 0:
            x_i = inputs[:, :self.units]
            x_f = inputs[:, self.units: 2 * self.units]
            x_c = inputs[:, 2 * self.units: 3 * self.units]
            x_o = inputs[:, 3 * self.units:]
        elif self.implementation == 1:
            x_i = K.dot(inputs * dp_mask[0], self.kernel_i) + self.bias_i
            x_f = K.dot(inputs * dp_mask[1], self.kernel_f) + self.bias_f
            x_c = K.dot(inputs * dp_mask[2], self.kernel_c) + self.bias_c
            x_o = K.dot(inputs * dp_mask[3], self.kernel_o) + self.bias_o
        else:
            raise ValueError('Unknown `implementation` mode.')

        i = self.recurrent_activation(x_i + K.dot(h_tm1 * rec_dp_mask[0],
                                                  self.recurrent_kernel_i))
        f = self.recurrent_activation(x_f + K.dot(h_tm1 * rec_dp_mask[1],
                                                  self.recurrent_kernel_f))
        c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1 * rec_dp_mask[2],
                                                        self.recurrent_kernel_c))
        o = self.recurrent_activation(x_o + K.dot(h_tm1 * rec_dp_mask[3],
                                                  self.recurrent_kernel_o))
    h = o * self.activation(c)
    if 0 < self.dropout + self.recurrent_dropout:
        h._uses_learning_phase = True
    return h, [h, c]

philipperemy commented 7 years ago

@JiyuZhangBH Do not modify recurrent.py in keras.

Modify the script I provided on May 29, 2016

JiyuZhangBH commented 7 years ago

@philipperemy Thank you very much for your script. Besides bias_i, I also want to print the values of the gates, that is, the values of i, f, o, c and h in the step function of recurrent.py. Do you know how I can print them?

philipperemy commented 7 years ago

@JiyuZhangBH It should not be difficult. I really don't have time now! Sorry about that.
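One way to do it without modifying recurrent.py is to pull the trained weights with layer.get_weights() and replay the step equations in NumPy. A minimal sketch, assuming a Keras 2 LSTM with use_bias=True, the default activations (tanh for the cell, hard_sigmoid for the gates), and the i, f, c, o block ordering shown earlier:

import numpy as np

def hard_sigmoid(x):
    # Keras's default recurrent_activation in 2.x
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_gates(layer, x_seq):
    """Recompute i, f, c, o and h for one sequence of shape (timesteps, input_dim)."""
    kernel, recurrent_kernel, bias = layer.get_weights()
    # the four gate blocks are concatenated along the last axis in the order i, f, c, o
    W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)
    U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=1)
    b_i, b_f, b_c, b_o = np.split(bias, 4)

    h = np.zeros(layer.units)
    c = np.zeros(layer.units)
    gates = []
    for x in x_seq:
        i = hard_sigmoid(x @ W_i + h @ U_i + b_i)
        f = hard_sigmoid(x @ W_f + h @ U_f + b_f)
        c = f * c + i * np.tanh(x @ W_c + h @ U_c + b_c)
        o = hard_sigmoid(x @ W_o + h @ U_o + b_o)
        h = o * np.tanh(c)
        gates.append({'i': i, 'f': f, 'o': o, 'c': c, 'h': h})
    return gates

Each entry of the returned list can then be plotted over time, e.g. plt.plot([g['f'] for g in gates]) to visualize the forget gate per unit.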

JiyuZhangBH commented 7 years ago

@philipperemy Thank you very much for your help!

vinayakumarr commented 7 years ago

from keras import backend as K

def get_activations(model, layer, X_batch):
    get_activations = K.function([model.layers[0].input, K.learning_phase()],
                                 model.layers[layer].output)
    activations = get_activations([X_batch, 0])
    print(activations)
    return activations

my_featuremaps = get_activations(cnn, 1, ([X_train[:10], 0])[0])
np.savetxt('featuremap.txt', my_featuremaps)

The above code generates the error below with TensorFlow as the backend:

TypeError: outputs of a TensorFlow backend function should be a list or tuple.

Actually, this works fine with Theano as the backend.

AziziShekoofeh commented 7 years ago

@vinayakumarr I think it's a version problem between Keras 1.x and 2.x in the network definition. Check the versions of TF and Keras.
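Note that the error message itself also points at a fix: with the TensorFlow backend, the outputs argument of K.function must be a list or tuple, so wrapping the layer output in a list avoids the TypeError. A minimal sketch of just the changed lines:

get_activations = K.function([model.layers[0].input, K.learning_phase()],
                             [model.layers[layer].output])  # outputs wrapped in a list
activations = get_activations([X_batch, 0])[0]               # K.function now returns a list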

elhenceashima commented 6 years ago

@JiyuZhangBH I am facing the same problem as you: 'LSTM' object has no attribute 'b_i', and also no 'bias_i'. Were you able to solve that problem of yours? Kindly let me know how you solved this issue. Thanks in advance.

elhenceashima commented 6 years ago

@philipperemy on checking the latest recurrent.py in keras, no step function definition could be found. Any help in this regard would be appreciated, since in the code you gave above even bias_i is not working anymore.

MirunaPislar commented 6 years ago

@elhenceashima if you still have this problem:

On Keras 2.1.2, this is (what I think is) the updated version of @philipperemy's code above that worked for me:

for layer in model.layers:
    if 'LSTM' in str(layer):
        weights = layer.get_weights()

        print('Previous memory state states[0] = {}'.format(K.get_value(layer.states[0])))
        print('Previous carry state states[1] = {}'.format(K.get_value(layer.states[1])))

        print('Input')
        print('bias_i = {}'.format(K.get_value(layer.cell.bias_i)))
        print('kernel_i = {}'.format(K.get_value(layer.cell.kernel_i)))
        print('recurrent_kernel_i = {}'.format(K.get_value(layer.cell.recurrent_kernel_i)))

        print('Forget')
        print('bias_f = {}'.format(K.get_value(layer.cell.bias_f)))
        print('kernel_f = {}'.format(K.get_value(layer.cell.kernel_f)))
        print('recurrent_kernel_f = {}'.format(K.get_value(layer.cell.recurrent_kernel_f)))

        print('Cell')
        print('bias_c = {}'.format(K.get_value(layer.cell.bias_c)))
        print('kernel_c = {}'.format(K.get_value(layer.cell.kernel_c)))
        print('recurrent_kernel_c = {}'.format(K.get_value(layer.cell.recurrent_kernel_c)))

        print('Output')
        print('bias_o = {}'.format(K.get_value(layer.cell.bias_o)))
        print('kernel_o = {}'.format(K.get_value(layer.cell.kernel_o)))
        print('recurrent_kernel_o = {}'.format(K.get_value(layer.cell.recurrent_kernel_o)))

Check that you have set use_bias=True in the LSTM layer if you want to be able to get the biases.

Alternatively, one can use layer.get_weights() to get the same results, as used in https://github.com/keras-team/keras/issues/3088:

for layer in model.layers:
    if 'LSTM' in str(layer):
        weights = layer.get_weights()

        for e in zip(layer.trainable_weights, layer.get_weights()):
            print('Param\n%s:\n%s' % (e[0], e[1]))

i.e. with the first snippet you get:

Input
bias_i = [-0.05891408 -0.01085878]
kernel_i = [[-0.19997048  0.7560412 ]]
recurrent_kernel_i = [[ 0.03347364  0.69555515]
 [-0.05277003  0.16991724]]
Forget
bias_f = [0.99022233 0.99371547]
kernel_f = [[-0.73273     0.46553144]]
recurrent_kernel_f = [[-0.430623    0.53221387]
 [ 0.69555295  0.36536166]]
Cell
bias_c = [-0.05549829 -0.00565466]
kernel_c = [[ 0.6521949  -0.36162028]]
recurrent_kernel_c = [[ 0.07719237  0.15796313]
 [-0.17077188  0.20027691]]
Output
bias_o = [-0.05292185 -0.02267806]
kernel_o = [[ 0.3093674  -0.61787367]]
recurrent_kernel_o = [[-0.10132004 -0.16078967]
 [-0.3530063   0.39258108]]

while with the second one:

Param <tf.Variable 'lstm_1/kernel:0' shape=(1, 8) dtype=float32_ref>:
[[-0.19997048  0.7560412  -0.73273     0.46553144  0.6521949  -0.36162028
   0.3093674  -0.61787367]]
# these are just kernel_i, kernel_f, kernel_c and kernel_o in that order
Param <tf.Variable 'lstm_1/recurrent_kernel:0' shape=(2, 8) dtype=float32_ref>:
[[ 0.03347364  0.69555515 -0.430623    0.53221387  0.07719237  0.15796313
  -0.10132004 -0.16078967]
 [-0.05277003  0.16991724  0.69555295  0.36536166 -0.17077188  0.20027691
  -0.3530063   0.39258108]]
Param <tf.Variable 'lstm_1/bias:0' shape=(8,) dtype=float32_ref>:
[-0.05891408 -0.01085878  0.99022233  0.99371547 -0.05549829 -0.00565466
 -0.05292185 -0.02267806]

philipperemy commented 6 years ago

@MirunaPislar thanks for updating my code. I'm happy it works for you!

darrahts commented 1 year ago

I know this is closed, but I still get an error when trying to access the weights and biases. I've tried 'b_i' as well as 'bias_i'.

print(type(model.layers[1]))
print(type(model.layers[1].layer))
K.get_value(model.layers[1].layer.bias_i)

<class 'keras.layers.wrappers.Bidirectional'>
<class 'keras.layers.recurrent_v2.LSTM'>

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_206688/2076287561.py in <module>
      1 print(type(model.layers[1]))
      2 print(type(model.layers[1].layer))
----> 3 K.get_value(model.layers[1].layer.bias_i)

AttributeError: 'LSTM' object has no attribute 'bias_i'
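In recent versions of Keras (keras.layers.recurrent_v2.LSTM), the per-gate attributes such as bias_i or b_i no longer exist: the layer only holds a single kernel, recurrent_kernel and bias, each concatenating the i, f, c, o blocks along the last axis, so you have to slice them yourself. A minimal sketch, assuming the Bidirectional model from the snippet above and use_bias=True:

import numpy as np

lstm = model.layers[1].forward_layer  # the forward LSTM inside the Bidirectional wrapper
kernel, recurrent_kernel, bias = lstm.get_weights()

# each weight concatenates the i, f, c, o blocks along its last axis
kernel_i, kernel_f, kernel_c, kernel_o = np.split(kernel, 4, axis=1)
recurrent_i, recurrent_f, recurrent_c, recurrent_o = np.split(recurrent_kernel, 4, axis=1)
bias_i, bias_f, bias_c, bias_o = np.split(bias, 4)

print(bias_i)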