Closed santi-pdp closed 7 years ago
At least not from what I know. The code of interest is in recurrent.py. For the LSTM, you may look at this function and modify it:
recurrent.py, line 433
def step(self, x, states):
    assert len(states) == 2
    h_tm1 = states[0]
    c_tm1 = states[1]
    x_i = K.dot(x, self.W_i) + self.b_i
    x_f = K.dot(x, self.W_f) + self.b_f
    x_c = K.dot(x, self.W_c) + self.b_c
    x_o = K.dot(x, self.W_o) + self.b_o
    i = self.inner_activation(x_i + K.dot(h_tm1, self.U_i))
    f = self.inner_activation(x_f + K.dot(h_tm1, self.U_f))
    c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1, self.U_c))
    o = self.inner_activation(x_o + K.dot(h_tm1, self.U_o))
    h = o * self.activation(c)
    return h, [h, c]
I think that if you print the values manually, you should be able to reproduce a nice plot like the ones they used in the Microsoft Research paper.
For your information, you can retrieve the outputs of any layer in your sequential model by creating a theano function:
import theano

def get_activations(model, layer, X_batch):
    get_activations = theano.function([model.layers[0].input], model.layers[layer].get_output(train=False), allow_input_downcast=True)
    activations = get_activations(X_batch) # same result as above
    return activations
Let me know.
@philipperemy
Hi! Philip,
I try to use "get_activations" function for LSTM layer but I get a feedback as:
AttributeError: 'LSTM' object has no attribute 'get_output'
Any solution? Many thanks!
@gamehere007 @santi-pdp this should be what you want. I took the example stateful_lstm (cosine) and printed all the content of the LSTM step by step with a batch size of 1 (the code is not very clean, but I modified only the relevant parts). Have a look at the console output:
Training
Epoch 0 / 25
Epoch 1/1
1000/1000 [==============================] - 12s - loss: 4574.5121
states[0] = [[ 0. 0.]]
states[1] = [[ 257.03659058 0. ]]
Input
b_i = [-0.00691508 -0.00629281]
W_i = [[ 1.02970016 -1.34588516]]
U_i = [[-1.03116369 0.36375973]
[ 0.37897322 1.03080726]]
Forget
b_f = [ 0.99570334 1.00683689]
W_f = [[ 0.82403541 -0.9261207 ]]
U_f = [[-0.10786153 1.09928894]
[ 1.09896088 0.10886104]]
Cell
b_c = [ 0.00170992 0.00342518]
W_c = [[ 1.36173594 0.7903558 ]]
U_c = [[-0.89767057 0.64024383]
[ 0.63502508 0.90228355]]
Output
b_o = [-0.01488051 -0.0303614 ]
W_o = [[-0.71250075 0.33641183]]
U_o = [[-0.60409766 0.91694659]
[ 0.93655741 0.62819171]]
output = [ 0. 0.]
If I remember well, B stands for bias, W for weights and U means the previous values (to be checked).
The snippet code is:
'''Example script showing how to use stateful RNNs
to model long sequences efficiently.
'''
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential

# since we are using stateful rnn tsteps can be set to 1
tsteps = 1
batch_size = 1
epochs = 25
# number of elements ahead that are used to make the prediction
lahead = 1


def gen_cosine_amp(amp=100, period=25, x0=0, xn=1000, step=1, k=0.0001):
    """Generates an absolute cosine time series with the amplitude
    exponentially decreasing
    Arguments:
        amp: amplitude of the cosine function
        period: period of the cosine function
        x0: initial x of the time series
        xn: final x of the time series
        step: step of the time series discretization
        k: exponential rate
    """
    cos = np.zeros(((xn - x0) * step, 1, 1))
    for i in range(len(cos)):
        idx = x0 + i * step
        cos[i, 0, 0] = amp * np.cos(idx / (2 * np.pi * period))
        cos[i, 0, 0] = cos[i, 0, 0] * np.exp(-k * idx)
    return cos


print('Generating Data')
cos = gen_cosine_amp()
print('Input shape:', cos.shape)

expected_output = np.zeros((len(cos), 1))
for i in range(len(cos) - lahead):
    expected_output[i, 0] = np.mean(cos[i + 1:i + lahead + 1])

print('Output shape')
print(expected_output.shape)

print('Creating Model')
model = Sequential()
model.add(LSTM(2,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')

# with a Sequential model
import keras.backend as K
get_LSTM_output = K.function([model.layers[0].input],
                             [model.layers[0].output])

print('Training')
for i in range(epochs):
    print('Epoch', i, '/', epochs)
    model.fit(cos,
              expected_output,
              batch_size=batch_size,
              verbose=1,
              nb_epoch=1,
              shuffle=False)
    for layer in model.layers:
        if 'LSTM' in str(layer):
            print('states[0] = {}'.format(K.get_value(layer.states[0])))
            print('states[1] = {}'.format(K.get_value(layer.states[1])))
            print('Input')
            print('b_i = {}'.format(K.get_value(layer.b_i)))
            print('W_i = {}'.format(K.get_value(layer.W_i)))
            print('U_i = {}'.format(K.get_value(layer.U_i)))
            print('Forget')
            print('b_f = {}'.format(K.get_value(layer.b_f)))
            print('W_f = {}'.format(K.get_value(layer.W_f)))
            print('U_f = {}'.format(K.get_value(layer.U_f)))
            print('Cell')
            print('b_c = {}'.format(K.get_value(layer.b_c)))
            print('W_c = {}'.format(K.get_value(layer.W_c)))
            print('U_c = {}'.format(K.get_value(layer.U_c)))
            print('Output')
            print('b_o = {}'.format(K.get_value(layer.b_o)))
            print('W_o = {}'.format(K.get_value(layer.W_o)))
            print('U_o = {}'.format(K.get_value(layer.U_o)))
            # output of the first batch value of the batch after the first fit().
            first_batch_element = np.expand_dims(cos[0], axis=1)  # (1, 1) to (1, 1, 1)
            print('output = {}'.format(get_LSTM_output([first_batch_element])[0].flatten()))
    model.reset_states()

print('Predicting')
predicted_output = model.predict(cos, batch_size=batch_size)

print('Ploting Results')
plt.subplot(2, 1, 1)
plt.plot(expected_output)
plt.title('Expected')
plt.subplot(2, 1, 2)
plt.plot(predicted_output)
plt.title('Predicted')
plt.show()
I'm using TensorFlow as backend.
I hope this helps!
@philipperemy Hi! Phillip, thank you so much for sharing your code. After running it, I have a question about the dimensions of the U_c, U_f, U_i and U_o matrices. Why are the matrices I got from your code all square? According to my understanding of LSTM, taking U_i as an example, U_i should have a dimension of (# of hidden units, # of input features). Thank you very much!
@gamehere007 Usually W means weights, b bias and U the recurrent weights.
You can find the shapes of each object here:
self.W_i = self.init((self.input_dim, self.output_dim),
                     name='{}_W_i'.format(self.name))
self.U_i = self.inner_init((self.output_dim, self.output_dim),
                           name='{}_U_i'.format(self.name))
self.b_i = K.zeros((self.output_dim,), name='{}_b_i'.format(self.name))

self.W_f = self.init((self.input_dim, self.output_dim),
                     name='{}_W_f'.format(self.name))
self.U_f = self.inner_init((self.output_dim, self.output_dim),
                           name='{}_U_f'.format(self.name))
self.b_f = self.forget_bias_init((self.output_dim,),
                                 name='{}_b_f'.format(self.name))

self.W_c = self.init((self.input_dim, self.output_dim),
                     name='{}_W_c'.format(self.name))
self.U_c = self.inner_init((self.output_dim, self.output_dim),
                           name='{}_U_c'.format(self.name))
self.b_c = K.zeros((self.output_dim,), name='{}_b_c'.format(self.name))

self.W_o = self.init((self.input_dim, self.output_dim),
                     name='{}_W_o'.format(self.name))
self.U_o = self.inner_init((self.output_dim, self.output_dim),
                           name='{}_U_o'.format(self.name))
self.b_o = K.zeros((self.output_dim,), name='{}_b_o'.format(self.name))
where
output_dim: dimension of the internal projections and the final output.
self.input_dim = input_shape[2] -- your third axis in (nb_samples, time_steps, input_dim)
In your case, hidden units = output_dim, input features = input_dim
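To make the shapes concrete, here is a small NumPy sketch (not Keras code) that mirrors the dimensions listed above for the model in this thread, assuming input_dim = 1 and output_dim = 2. The random values are placeholders chosen for illustration; only the shapes matter.

```python
import numpy as np

input_dim, output_dim = 1, 2   # one input feature, two hidden units, as in LSTM(2)
batch_size = 1

rng = np.random.default_rng(0)
W_i = rng.standard_normal((input_dim, output_dim))    # input weights
U_i = rng.standard_normal((output_dim, output_dim))   # recurrent weights
b_i = np.zeros(output_dim)                            # bias

x = rng.standard_normal((batch_size, input_dim))      # input at one time step
h_tm1 = np.zeros((batch_size, output_dim))            # previous hidden state

# Same expression as the input-gate pre-activation in step()
pre_i = x.dot(W_i) + b_i + h_tm1.dot(U_i)

print(W_i.shape, U_i.shape, b_i.shape, pre_i.shape)
```

U_i is square because it multiplies h_tm1, whose width is output_dim, and the result must again have width output_dim so that it can be added to the input term.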
Hi,
@philipperemy Thanks for the nice explanation. A simple question, what is the difference between states[0] and states[1]?
@philipperemy as I tracked the first formulation in the recurrent.py they are Ct-1 and ht-1, is it correct? So, basically, we also can have access to them by reconstructing the same network using layer.get_weights() and model.predict() for the given batch.
@AziziShekoofeh yes from what I remember it's your Ct-1 and ht-1.
They are not stored in the weights because those are not trainable parameters.
@philipperemy How do I modify recurrent.py? When I run your code, it raises "'LSTM' object has no attribute 'b-i'".
It's an underscore, not a hyphen!
@philipperemy Sorry, that was my own carelessness. But after I changed the code to b_i, it still raises "'LSTM' object has no attribute 'b_i'". I saw that your answer to this question was to modify the step function of recurrent.py. How do I modify recurrent.py? Thanks a lot
@JiyuZhangBH they seem to have changed the names of the variables.
Have a look at recurrent.py in keras.
self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units:self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2:self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units:
                                                self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2:
                                                self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]

if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units:self.units * 2]
    self.bias_c = self.bias[self.units * 2:self.units * 3]
    self.bias_o = self.bias[self.units * 3:]
else:
    self.bias_i = None
    self.bias_f = None
    self.bias_c = None
    self.bias_o = None
self.built = True
Should I change 'bias_i' to 'b_i', or kernel_i to b_i? In addition, does the step function need to be changed?
def step(self, inputs, states):
    h_tm1 = states[0]
    c_tm1 = states[1]
    dp_mask = states[2]
    rec_dp_mask = states[3]

    if self.implementation == 2:
        z = K.dot(inputs * dp_mask[0], self.kernel)
        z += K.dot(h_tm1 * rec_dp_mask[0], self.recurrent_kernel)
        if self.use_bias:
            z = K.bias_add(z, self.bias)
        z0 = z[:, :self.units]
        z1 = z[:, self.units: 2 * self.units]
        z2 = z[:, 2 * self.units: 3 * self.units]
        z3 = z[:, 3 * self.units:]
        i = self.recurrent_activation(z0)
        f = self.recurrent_activation(z1)
        c = f * c_tm1 + i * self.activation(z2)
        o = self.recurrent_activation(z3)
    else:
        if self.implementation == 0:
            x_i = inputs[:, :self.units]
            x_f = inputs[:, self.units: 2 * self.units]
            x_c = inputs[:, 2 * self.units: 3 * self.units]
            x_o = inputs[:, 3 * self.units:]
        elif self.implementation == 1:
            x_i = K.dot(inputs * dp_mask[0], self.kernel_i) + self.bias_i
            x_f = K.dot(inputs * dp_mask[1], self.kernel_f) + self.bias_f
            x_c = K.dot(inputs * dp_mask[2], self.kernel_c) + self.bias_c
            x_o = K.dot(inputs * dp_mask[3], self.kernel_o) + self.bias_o
        else:
            raise ValueError('Unknown `implementation` mode.')
        i = self.recurrent_activation(x_i + K.dot(h_tm1 * rec_dp_mask[0],
                                                  self.recurrent_kernel_i))
        f = self.recurrent_activation(x_f + K.dot(h_tm1 * rec_dp_mask[1],
                                                  self.recurrent_kernel_f))
        c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1 * rec_dp_mask[2],
                                                        self.recurrent_kernel_c))
        o = self.recurrent_activation(x_o + K.dot(h_tm1 * rec_dp_mask[3],
                                                  self.recurrent_kernel_o))
    h = o * self.activation(c)
    if 0 < self.dropout + self.recurrent_dropout:
        h._uses_learning_phase = True
    return h, [h, c]
@JiyuZhangBH Do not modify recurrent.py in keras.
Modify the script I provided on May 29, 2016
@philipperemy Thanks very much for your script. Besides bias_i, I also want to print the gate values, that is, the values of i, f, o, c and h in the step function of recurrent.py. Do you know how I can print them?
@JiyuZhangBH It should not be difficult. I really don't have time now! Sorry about that.
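One way to inspect the gate values without touching recurrent.py is to recompute a single step outside Keras from the printed weights. The following is a minimal NumPy sketch of that idea, not the Keras implementation itself. It assumes hard_sigmoid for the gates and tanh for the cell, and it ignores dropout. The helper names (hard_sigmoid, lstm_step) and the dict layout keyed by gate are hypothetical; the weight values are copied from the console output earlier in this thread, and the input value 100.0 with zero initial states is an illustrative choice.

```python
import numpy as np

def hard_sigmoid(x):
    # Piecewise-linear approximation of the sigmoid: clip(0.2 * x + 0.5, 0, 1)
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def lstm_step(x, h_tm1, c_tm1, W, U, b, gate_activation=hard_sigmoid):
    """One LSTM step; W, U and b are dicts keyed by 'i', 'f', 'c', 'o'."""
    i = gate_activation(x.dot(W['i']) + b['i'] + h_tm1.dot(U['i']))
    f = gate_activation(x.dot(W['f']) + b['f'] + h_tm1.dot(U['f']))
    c = f * c_tm1 + i * np.tanh(x.dot(W['c']) + b['c'] + h_tm1.dot(U['c']))
    o = gate_activation(x.dot(W['o']) + b['o'] + h_tm1.dot(U['o']))
    h = o * np.tanh(c)
    return {'i': i, 'f': f, 'c': c, 'o': o, 'h': h}

# Weights copied from the console output printed earlier in the thread
W = {'i': np.array([[1.02970016, -1.34588516]]),
     'f': np.array([[0.82403541, -0.9261207]]),
     'c': np.array([[1.36173594, 0.7903558]]),
     'o': np.array([[-0.71250075, 0.33641183]])}
U = {'i': np.array([[-1.03116369, 0.36375973], [0.37897322, 1.03080726]]),
     'f': np.array([[-0.10786153, 1.09928894], [1.09896088, 0.10886104]]),
     'c': np.array([[-0.89767057, 0.64024383], [0.63502508, 0.90228355]]),
     'o': np.array([[-0.60409766, 0.91694659], [0.93655741, 0.62819171]])}
b = {'i': np.array([-0.00691508, -0.00629281]),
     'f': np.array([0.99570334, 1.00683689]),
     'c': np.array([0.00170992, 0.00342518]),
     'o': np.array([-0.01488051, -0.0303614])}

x = np.array([[100.0]])        # one sample, one feature (illustrative input)
h_tm1 = np.zeros((1, 2))       # previous hidden state, assumed zero here
c_tm1 = np.zeros((1, 2))       # previous cell state, assumed zero here

gates = lstm_step(x, h_tm1, c_tm1, W, U, b)
for name in 'ifcoh':
    print(name, '=', gates[name])
```

To follow the gates over a sequence, feed the returned h and c back in as h_tm1 and c_tm1 for the next input. If your layer uses a different gate activation, pass it as gate_activation.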
@philipperemy Thanks very much for the help!
from keras import backend as K

def get_activations(model, layer, X_batch):
    get_activations = K.function([model.layers[0].input, K.learning_phase()], model.layers[layer].output)
    activations = get_activations([X_batch,0])
    print(activations)
    return activations

my_featuremaps = get_activations(cnn, 1, ([X_train[:10], 0])[0])
np.savetxt('featuremap.txt', my_featuremaps)
The above code is generating the below error with TensorFlow as backend
TypeError: outputs of a TensorFlow backend function should be a list or tuple.
Actually, this works fine with theano as backend
@vinayakumarr I think it's a version problem between Keras 1.x and 2.x in network definition. Check the version of TF and Keras.
@JiyuZhangBH I am facing the same problem as you: the LSTM object has no attribute 'b_i', and no 'bias_i' either. Were you able to solve it? Kindly let me know how you solved this issue. Thanks in advance.
@philipperemy on checking the latest recurrent.py in keras, no step function definition could be found. Any help in this regard would be appreciated, since with the code you gave above even bias_i no longer works.
@elhenceashima if you still have this problem
Keras 2.1.2. This is (what I think is) the updated version of @philipperemy code above that worked for me:
for layer in model.layers:
    if 'LSTM' in str(layer):
        weights = layer.get_weights()
        print('Previous memory state states[0] = {}'.format(K.get_value(layer.states[0])))
        print('Previous carry state states[1] = {}'.format(K.get_value(layer.states[1])))
        print('Input')
        print('bias_i = {}'.format(K.get_value(layer.cell.bias_i)))
        print('kernel_i = {}'.format(K.get_value(layer.cell.kernel_i)))
        print('recurrent_kernel_i = {}'.format(K.get_value(layer.cell.recurrent_kernel_i)))
        print('Forget')
        print('bias_f = {}'.format(K.get_value(layer.cell.bias_f)))
        print('kernel_f = {}'.format(K.get_value(layer.cell.kernel_f)))
        print('recurrent_kernel_f = {}'.format(K.get_value(layer.cell.recurrent_kernel_f)))
        print('Cell')
        print('bias_c = {}'.format(K.get_value(layer.cell.bias_c)))
        print('kernel_c = {}'.format(K.get_value(layer.cell.kernel_c)))
        print('recurrent_kernel_c = {}'.format(K.get_value(layer.cell.recurrent_kernel_c)))
        print('Output')
        print('bias_o = {}'.format(K.get_value(layer.cell.bias_o)))
        print('kernel_o = {}'.format(K.get_value(layer.cell.kernel_o)))
        print('recurrent_kernel_o = {}'.format(K.get_value(layer.cell.recurrent_kernel_o)))
Check that you have set use_bias=True in the LSTM layer if you want to be able to get the biases.
Alternatively, one can use layer.get_weights() to get the same results, as used in [https://github.com/keras-team/keras/issues/3088]
for layer in model.layers:
    if 'LSTM' in str(layer):
        weights = layer.get_weights()
        for e in zip(layer.trainable_weights, layer.get_weights()):
            print('Param\n%s:\n%s' % (e[0], e[1]))
i.e. With the first implementation you get:
Input
bias_i = [-0.05891408 -0.01085878]
kernel_i = [[-0.19997048 0.7560412 ]]
recurrent_kernel_i = [[ 0.03347364 0.69555515]
[-0.05277003 0.16991724]]
Forget
bias_f = [0.99022233 0.99371547]
kernel_f = [[-0.73273 0.46553144]]
recurrent_kernel_f = [[-0.430623 0.53221387]
[ 0.69555295 0.36536166]]
Cell
bias_c = [-0.05549829 -0.00565466]
kernel_c = [[ 0.6521949 -0.36162028]]
recurrent_kernel_c = [[ 0.07719237 0.15796313]
[-0.17077188 0.20027691]]
Output
bias_o = [-0.05292185 -0.02267806]
kernel_o = [[ 0.3093674 -0.61787367]]
recurrent_kernel_o = [[-0.10132004 -0.16078967]
[-0.3530063 0.39258108]]
while with the second one:
Param
<tf.Variable 'lstm_1/kernel:0' shape=(1, 8) dtype=float32_ref>:
[[-0.19997048 0.7560412 -0.73273 0.46553144 0.6521949 -0.36162028 0.3093674 -0.61787367]]
# these are just kernel_i, kernel_f, kernel_c and kernel_o in order
Param
<tf.Variable 'lstm_1/recurrent_kernel:0' shape=(2, 8) dtype=float32_ref>:
[[ 0.03347364 0.69555515 -0.430623 0.53221387 0.07719237 0.15796313 -0.10132004 -0.16078967]
[-0.05277003 0.16991724 0.69555295 0.36536166 -0.17077188 0.20027691 -0.3530063 0.39258108]]
Param
<tf.Variable 'lstm_1/bias:0' shape=(8,) dtype=float32_ref>:
[-0.05891408 -0.01085878 0.99022233 0.99371547 -0.05549829 -0.00565466 -0.05292185 -0.02267806]
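The relation between the two outputs can be checked by slicing the fused arrays into four blocks of width units, in the order i, f, c, o. Below is a small NumPy sketch of that slicing, using the kernel and bias values printed above. The helper name split_gates is hypothetical and not part of Keras.

```python
import numpy as np

units = 2  # LSTM(2) in the example above

# Values copied from the layer.get_weights() output above
kernel = np.array([[-0.19997048, 0.7560412, -0.73273, 0.46553144,
                    0.6521949, -0.36162028, 0.3093674, -0.61787367]])
bias = np.array([-0.05891408, -0.01085878, 0.99022233, 0.99371547,
                 -0.05549829, -0.00565466, -0.05292185, -0.02267806])

def split_gates(weights, units):
    """Split the last axis into the four gate blocks, in the order i, f, c, o."""
    return {name: weights[..., k * units:(k + 1) * units]
            for k, name in enumerate('ifco')}

kernel_by_gate = split_gates(kernel, units)
bias_by_gate = split_gates(bias, units)

print('kernel_f =', kernel_by_gate['f'])
print('bias_f =', bias_by_gate['f'])
```

The same slicing applies to the recurrent kernel, so the arrays returned by get_weights() are enough to recover every per-gate block without relying on attributes such as bias_i.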
@MirunaPislar thanks for updating my code. I'm happy it works for you!
I know this is closed but I still get an error when trying to access the weights and bias. I've tried 'b_i' as well as 'bias_i'
print(type(model.layers[1]))
print(type(model.layers[1].layer))
K.get_value(model.layers[1].layer.bias_i)
<class 'keras.layers.wrappers.Bidirectional'>
<class 'keras.layers.recurrent_v2.LSTM'>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_206688/2076287561.py in <module>
1 print(type(model.layers[1]))
2 print(type(model.layers[1].layer))
----> 3 K.get_value(model.layers[1].layer.bias_i)
AttributeError: 'LSTM' object has no attribute 'bias_i'
Hello, is there any tool (or intention to add it) to visualize LSTM/GRU layer activations (not only outputs, also gates), similar to what they do in: http://research.microsoft.com/pubs/257676/SentenceEmbedding1502.06922v2.pdf ?