Hello,
the warning happens because you call de.explain() with the input and output tensors of the original model instead of those of fModel.
I would try (not tested) with:
attributions = de.explain('elrp', fModel.outputs[0] * ys, fModel.inputs[0], embedding_out)
However, you also need to change how fModel is defined, because its input layer has to be embedding_tensor. You might need to compile the model as well (call fModel.compile() with the same parameters as model).
One more important thing, unrelated to the issue: you are taking the output of the softmax, not the pre-softmax output. This is because model.layers[-1] is the last layer, Dense(class_number, activation='softmax'), which includes the activation.
If you want the output pre-softmax you need to split your last layer of the model into two layers, then pick the second last:
model.add(Dense(class_number, activation='linear'));
model.add(Activation('softmax'));
...
pre_softmax_tensor = model.layers[-2].output
Please let me know if this helps.
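To illustrate why the pre-softmax output is usually the more useful attribution target, here is a small sketch in plain Python (no Keras required). The logit values are made up for illustration. The point is that the softmax probability of a class changes whenever another class's logit changes, whereas the logit itself does not.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical pre-softmax outputs (logits) of a 4-class model.
logits_a = [2.0, 1.0, 0.5, -1.0]
# Same logit for class 0, but class 1 is now much stronger.
logits_b = [2.0, 4.0, 0.5, -1.0]

probs_a = softmax(logits_a)
probs_b = softmax(logits_b)

# The class-0 logit is identical in both cases ...
print(logits_a[0], logits_b[0])
# ... but the class-0 probability drops, because softmax couples all classes.
print(round(probs_a[0], 3), round(probs_b[0], 3))
```

Attributions computed on the softmax output therefore mix in evidence for and against the other classes, which is one reason to target the linear layer before it.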
Thank you for your help! I changed the model like this:
model = Sequential();
....
#model.add(Dense(4, activation='softmax'));
model.add(Dense(4, activation='linear'));
model.add(Activation('softmax'));
model.compile(....);
and I get the pre softmax reference like this now:
pre_softmax_tensor = model.layers[-2].output;
So far so good. When I create the new model with the embedding tensor though, I get an error that the input I gave to this model is not of an appropriate type:
#fModel = Model(inputs=input_tensor, outputs = model.layers[-2].output);
fModel = Model(inputs=embedding_tensor, outputs = model.layers[-2].output);
TypeError: Input layers to a `Model` must be `InputLayer` objects. Received inputs: Tensor("embedding_6/Gather:0", shape=(?, 100, 32), dtype=float32). Input 0 (0-based) originates from layer type `Embedding`.
and this is what the tensors look like:
input_tensor --- Tensor("embedding_6_input:0", shape=(?, 100), dtype=int32)
embedding_tensor --- Tensor("embedding_6/Gather:0", shape=(?, 100, 32), dtype=float32)
pre_softmax_tensor --- Tensor("dense_10/BiasAdd:0", shape=(?, 4), dtype=float32)
Regarding the explain() call, I changed the tensors according to your suggestions:
new_pre_softmax_tensor = fModel.outputs[0]; # not same with fModel.layers[-2].output;
new_input_tensor = fModel.layers[0].input; # same tensor with fModel.inputs[0]
print("new_pre_softmax_tensor: {}".format(new_pre_softmax_tensor));
print("new_input_tensor: {}".format(new_input_tensor));
new_pre_softmax_tensor: Tensor("dense_10/BiasAdd:0", shape=(?, 4), dtype=float32)
new_input_tensor: Tensor("embedding_6_input:0", shape=(?, 100), dtype=int32)
and unfortunately when I use them, I get a ValueError:
#attributions = de.explain('elrp', pre_softmax_tensor * ys, embedding_tensor, embedding_out)
attributions = de.explain('elrp', new_pre_softmax_tensor * ys, new_input_tensor, embedding_out);
ValueError: Dimensions must be equal, but are 100 and 4 for 'mul_188' (op: 'Mul') with input shapes: [?,100], [96,4].
and these are the shapes that are responsible for the ValueError:
pre_softmax_tensor ys shape --- (96, 4)
new_pre_softmax_tensor ys shape --- (96, 4)
embedding_tensor shape --- (?, 100, 32)
embedding_out shape --- (96, 100, 32)
attributions shape --- (96, 100, 32)
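The ValueError above is a broadcasting failure: an element-wise product requires each pair of trailing dimensions to be equal, or one of them to be 1. The sketch below mimics that rule in plain Python so the two shapes from the error message can be checked by hand. The helper name broadcast_shape is hypothetical and it does not call TensorFlow; which operation inside DeepExplain produced 'mul_188' is not established in this thread.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or raise ValueError.

    Shapes are tuples; None stands for an unknown dimension (printed as '?').
    """
    result = []
    # Compare dimensions from the last axis backwards.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da == 1:
            result.append(db)
        elif db == 1 or da == db:
            result.append(da)
        elif da is None or db is None:
            # One side is unknown until run time; keep the known size.
            result.append(da if db is None else db)
        else:
            raise ValueError(
                "Dimensions must be equal, but are {} and {}".format(da, db))
    return tuple(reversed(result))

# Pre-softmax output (?, 4) times one-hot labels (96, 4): compatible.
print(broadcast_shape((None, 4), (96, 4)))

# Integer input tensor (?, 100) times labels (96, 4): incompatible.
try:
    broadcast_shape((None, 100), (96, 4))
except ValueError as err:
    print(err)
```

A tensor of shape (?, 100) in the product indicates that the integer model input, rather than a (?, 4) output or a (?, 100, 32) embedding, reached the multiplication.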
Right, the input to a Keras model cannot be a TF Tensor. So what about defining fModel as follows:
fModel = Model(inputs= model.inputs, outputs = model.layers[-2].output);
and using
new_input_tensor = fModel.layers[0].output # < notice I take the output of first layer, so now output of the embedding
attributions = de.explain('elrp', new_pre_softmax_tensor * ys, new_input_tensor, embedding_out);
In other words, define a model with the same input as the original model and with the pre-softmax layer as output, then call explain using the output of the first layer (i.e. the embedded representation of the input) as the input tensor for DeepExplain.
If you try this change first, then we can look at the shape mismatch problem.
I just included the changes, and I get a ValueError (None values not supported) when I call explain().
This is the current DeepExplain code:
with DeepExplain(session=current_session) as de:  # <-- init DeepExplain context
    # Get input tensor
    input_tensor = model.layers[0].input;
    print("input_tensor --- {}".format(input_tensor));
    # Get embedding tensor
    embedding_tensor = model.layers[0].output;
    print("embedding_tensor --- {}".format(embedding_tensor));
    # Get tensor before the final activation
    pre_softmax_tensor = model.layers[-2].output;
    print("pre_softmax_tensor --- {} ".format(pre_softmax_tensor));
    # Create model until before softmax
    fModel = Model(inputs=model.inputs, outputs=model.layers[-2].output);
    fModel.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy']);
    print(fModel.summary());
    new_pre_softmax_tensor = fModel.outputs[0];  # not same with fModel.layers[-2].output;
    new_input_tensor = fModel.layers[0].output  # < notice I take the output of first layer, so now output of the embedding
    # Evaluate the embedding tensor on the model input (in other words, perform the lookup)
    embedding_out = current_session.run(embedding_tensor, {input_tensor: X_test});
    xs = X_test;
    ys = y_test;
    # Run DeepExplain with the embedding as input
    attributions = de.explain('elrp', new_pre_softmax_tensor * ys, new_input_tensor, embedding_out);
    print("attributions shape --- {}".format(attributions.shape));
and that's the error (which looks similar to this):
ValueError Traceback (most recent call last)
<ipython-input-11-a809f4474611> in <module>()
55 #attributions = de.explain('elrp', pre_softmax_tensor * ys, embedding_tensor, embedding_out)
56 #attributions = de.explain('elrp', new_pre_softmax_tensor * ys, new_input_tensor, embedding_out);
---> 57 attributions = de.explain('elrp', new_pre_softmax_tensor * ys, new_input_tensor, embedding_out);
58 print("attributions shape --- {}".format(attributions.shape));
59
~/projects/nn-models/deepexplain-copy/deepexplain/tensorflow/methods.py in explain(self, method, T, X, xs, **kwargs)
455 _ENABLED_METHOD_CLASS = method_class
456 method = _ENABLED_METHOD_CLASS(T, X, xs, self.session, self.keras_phase_placeholder, **kwargs)
--> 457 result = method.run()
458 if issubclass(_ENABLED_METHOD_CLASS, GradientBasedMethod) and _GRAD_OVERRIDE_CHECKFLAG == 0:
459 warnings.warn('DeepExplain detected you are trying to use an attribution method that requires '
~/projects/nn-models/deepexplain-copy/deepexplain/tensorflow/methods.py in run(self)
122
123 def run(self):
--> 124 attributions = self.get_symbolic_attribution()
125 results = self.session_run(attributions, self.xs)
126 return results[0] if not self.has_multiple_inputs else results
~/projects/nn-models/deepexplain-copy/deepexplain/tensorflow/methods.py in get_symbolic_attribution(self)
244 return [g * x for g, x in zip(
245 tf.gradients(self.T, self.X),
--> 246 self.X if self.has_multiple_inputs else [self.X])]
247
248 @classmethod
~/projects/nn-models/deepexplain-copy/deepexplain/tensorflow/methods.py in <listcomp>(.0)
242
243 def get_symbolic_attribution(self):
--> 244 return [g * x for g, x in zip(
245 tf.gradients(self.T, self.X),
246 self.X if self.has_multiple_inputs else [self.X])]
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py in r_binary_op_wrapper(y, x)
907 def r_binary_op_wrapper(y, x):
908 with ops.name_scope(None, op_name, [x, y]) as name:
--> 909 x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
910 return func(x, y, name=name)
911
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, preferred_dtype)
834 name=name,
835 preferred_dtype=preferred_dtype,
--> 836 as_ref=False)
837
838
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx)
924
925 if ret is None:
--> 926 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
927
928 if ret is NotImplemented:
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in _constant_tensor_conversion_function(v, dtype, name, as_ref)
227 as_ref=False):
228 _ = as_ref
--> 229 return constant(v, dtype=dtype, name=name)
230
231
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name, verify_shape)
206 tensor_value.tensor.CopyFrom(
207 tensor_util.make_tensor_proto(
--> 208 value, dtype=dtype, shape=shape, verify_shape=verify_shape))
209 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
210 const_tensor = g.create_op(
~/anaconda3/envs/nn-models-env/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py in make_tensor_proto(values, dtype, shape, verify_shape)
369 else:
370 if values is None:
--> 371 raise ValueError("None values not supported.")
372 # if dtype is provided, forces numpy array to be the type
373 # provided if possible.
ValueError: None values not supported.
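A note on reading this traceback: the failure comes from `g * x` in get_symbolic_attribution, and r_binary_op_wrapper is the reflected operator, which is reached when the left operand is not a tensor. That suggests tf.gradients returned None for X, which it does for any input the target does not depend on through differentiable operations. One possible cause, stated here as an assumption and not confirmed in this thread, is that layers[0] of a functional Model is the InputLayer, so fModel.layers[0].output would be the integer index tensor rather than the embedding output. The guard below is a plain-Python sketch with a hypothetical helper name and a simulated gradient list; it only shows how to turn the opaque error into a readable one.

```python
def check_gradients(grads, input_names):
    """Raise a descriptive error if any gradient is None.

    grads: list as returned by a gradient call, one entry per input.
    input_names: matching list of human-readable tensor names.
    """
    missing = [name for g, name in zip(grads, input_names) if g is None]
    if missing:
        raise ValueError(
            "No gradient for input(s) {}: the target does not depend on "
            "them through differentiable ops.".format(", ".join(missing)))
    return grads

# Simulated result: None is what a gradient call yields for an integer
# index tensor, since an embedding lookup is not differentiable in its indices.
simulated_grads = [None]
try:
    check_gradients(simulated_grads, ["embedding_6_input:0"])
except ValueError as err:
    message = str(err)
    print(message)
```

Printing new_input_tensor before calling explain() and confirming that it is a float32 tensor of shape (?, 100, 32) would rule this cause in or out.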
Hello, I feel I need to try this myself, because I cannot immediately see the problem. Could you please send me a minimal working model together with some data to try it out?
I also had some difficulty recreating the graph correctly in the DeepExplain context. I will have to think about it. In the meantime, I suggest using a single model, created and trained within the DeepExplain context:
with DeepExplain(session=current_session) as de:  # <-- init DeepExplain context
    model = Sequential();
    model.add(Embedding(input_dim=4218+1, output_dim=32, input_length=100));  # input_length=29;, input_dim=max_words
    model.add(Flatten());
    model.add(Dense(100, activation='relu'));  # input_shape=(max_words,)
    model.add(Dropout(0.5));
    #model.add(Dense(4, activation='softmax'));
    model.add(Dense(4, activation='linear'));
    model.add(Activation('softmax'));
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy']);
    model.summary();
    model.fit(X_train, y_train,
              batch_size=32,
              epochs=5,
              validation_data=(X_test, y_test),
              verbose=1,
              shuffle=True);
    # predict on test data
    y_pred = model.predict(np.array(X_test));
    y_test = np.array(y_test);
    # Evaluate the embedding tensor on the model input (in other words, perform the lookup)
    embedding_tensor = model.layers[0].output
    input_tensor = model.inputs[0]
    embedding_out = current_session.run(embedding_tensor, {input_tensor: X_test});
    xs = X_test;
    ys = y_test;
    # Run DeepExplain with the embedding as input
    attributions = de.explain('elrp', model.layers[-2].output * ys, model.layers[1].input, embedding_out);
    print("attributions shape --- {}".format(attributions.shape));
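With the embedding as input, attributions has one value per embedding dimension, i.e. shape (samples, sequence length, embedding size), such as the (96, 100, 32) printed earlier in this thread. A common way to obtain one score per token is to sum over the embedding axis; the sketch below shows that reduction and is not something confirmed in this thread. The tiny arrays and the index_to_word mapping are invented for illustration. With NumPy the same reduction is attributions.sum(axis=-1).

```python
# Toy attributions: 1 sample, 3 tokens, embedding size 2 (made-up values).
attributions = [[[0.25, 0.25], [-0.5, 0.0], [0.0, 0.0]]]
# Token indices fed to the model for that sample (0 = padding), hypothetical.
token_ids = [[7, 3, 0]]
# Hypothetical vocabulary lookup, e.g. inverted from a tokenizer's word index.
index_to_word = {0: "<pad>", 3: "bad", 7: "good"}

def per_token_scores(attr):
    """Sum attributions over the embedding axis: (N, T, E) -> (N, T)."""
    return [[sum(vec) for vec in sample] for sample in attr]

scores = per_token_scores(attributions)

# Pair each token with its score, skipping padding positions.
for ids, row in zip(token_ids, scores):
    explained = [(index_to_word[i], s) for i, s in zip(ids, row) if i != 0]
    print(explained)
```

The token indices come from the original integer input (X_test here), so the words can be recovered from whatever vocabulary was used during preprocessing.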
Hi guys, thanks for starting this thread. I have been looking into a similar problem, and the solution above was very helpful in resolving some of the confusion. Here is working code:
from keras.preprocessing import sequence
from keras.models import Sequential, Model, load_model, model_from_yaml
from keras.layers import Dense, Embedding
from keras.layers import LSTM
from keras.layers import Dense, Dropout, Flatten, Activation
from keras import backend as K
from keras.datasets import imdb
import numpy as np
max_features = 20000  # keep only the max_features most common words of the corpus
maxlen = 80  # cut texts after this number of words (among the top max_features most common words)
batch_size = 32
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('Build model...')
model = Sequential()
model.add(Embedding(max_features, output_dim=128))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))  # a single output unit needs sigmoid; softmax over one unit is always 1
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
print('Train...')
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=15,
validation_data=(x_test, y_test))
# evaluate the trained model (loaded_model is not defined until the model is reloaded below)
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test score:', score)
print('Test accuracy:', acc)
# Save and persist the trained model (file names match the ones loaded below)
model_yaml = model.to_yaml()
with open("model_lstm_sigmoid.yaml", "w") as yaml_file:
    yaml_file.write(model_yaml)
# serialize weights to HDF5
model.save_weights("model_lstm_sigmoid.h5")
print("model persisted on disk")
from deepexplain.tensorflow import DeepExplain
with DeepExplain(session=K.get_session()) as de:
    # load YAML and create model
    with open('model_lstm_sigmoid.yaml', 'r') as yaml_file:
        loaded_model_yaml = yaml_file.read()
    loaded_model = model_from_yaml(loaded_model_yaml)
    # load weights into new model
    loaded_model.load_weights("model_lstm_sigmoid.h5")
    print("Loaded model from disk")
    uploaded_model = loaded_model
    xs = np.array([x_test[1]])
    ys = np.array([y_test[1]])
    # predict on the same sample that is explained below
    print('Predicted class : {}'.format(uploaded_model.predict(xs)))
    print('Ground Truth: {}'.format(ys))
    # evaluate the embedding lookup so the embedding output can be used as input
    embedding_tensor = uploaded_model.layers[0].output
    input_tensor = uploaded_model.layers[0].input
    embedding_out = de.session.run(embedding_tensor, {input_tensor: xs})
    print(embedding_out.shape)
    attributions = de.explain('elrp', uploaded_model.layers[-2].output * ys,
                              uploaded_model.layers[1].input, embedding_out)
Thanks for sharing the code. Unfortunately this does not work correctly. As you can see, upon calling de.explain() the following warning is displayed:
UserWarning: DeepExplain detected you are trying to use an attribution method that requires gradient override but the original gradient was used instead. You might have forgot to (re)create your graph within the DeepExlain context. Results are not reliable!
This happens because you did not create (or recreate) the graph within the DeepExplain context. This will work for Gradient*Input, Integrated Gradients and Occlusion, but the results of LRP and DeepLIFT (which require gradient overriding) are just wrong. In any case, this version of LRP cannot be applied to LSTM units, even with a correct implementation, so please consider using another method.
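The distinction drawn above can be written down as a small lookup. This is only a sketch of the rule as stated in this comment: the helper name is hypothetical, and the method keys other than 'elrp' do not appear in this thread and are assumed to be DeepExplain's identifiers, so check them against the project's README.

```python
# Whether a method relies on DeepExplain overriding gradients, following the
# comment above. Keys other than 'elrp' are assumed identifiers.
NEEDS_GRADIENT_OVERRIDE = {
    "grad*input": False,
    "intgrad": False,
    "occlusion": False,
    "elrp": True,
    "deeplift": True,
}

def safe_outside_context(method):
    """True if results stay valid when the graph was built outside the context."""
    if method not in NEEDS_GRADIENT_OVERRIDE:
        raise KeyError("Unknown method: {!r}".format(method))
    return not NEEDS_GRADIENT_OVERRIDE[method]

print(safe_outside_context("intgrad"))
print(safe_outside_context("elrp"))
```

For a model that was built or loaded outside the context, the methods mapped to False are the ones whose results remain usable.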
Oh yes, that's right. Thanks for catching that. Also, agreed that this version of LRP is not the right choice for LSTM. I have updated the code. One can build a model and persist it outside the DeepExplain context; the model can then be loaded into the context and a suitable algorithm (Integrated Gradients, for example) applied.
Hi all and thank you very much Marcoancona for providing your implementation to explain NNs! It's very valuable. I am currently working on text classification and I would like to understand which words contributed to the decision of my classifier. As there is no NLP example in this project, I followed your pseudocode and guidelines and I wrote the following code to classify quotations extracted from 4 UK newspapers into the original news sources. In this sample dataset there are only 500 quotes in total and 4 classes (newspapers). I uploaded the data as numpy arrays here. The data is already preprocessed, tokenized, transformed into vectors and padded.
Given the code I share below, my question is: why do I get the "You might have forgot to (re)create your graph within the DeepExlain context" warning, even though I reconstruct the model in the DeepExplain context? As I am a relative beginner in Keras, I am also unsure whether the code inside the DeepExplain context corresponds to the pseudocode you provided. Lastly, I did not understand how to find the attributions per word as you described. I am not sure what to sum, or how to find the original words (not the vectors) that the attributions correspond to. I appreciate any hint! Thanks a lot.
PS. The model performs poorly, but it's just a toy example to get familiar with DeepExplain and Keras.