marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104
MIT License

Visual Interpretation on Attributions -- text classifications with LRP #20

Open looraL opened 6 years ago

looraL commented 6 years ago

Hi!

Thanks for sharing your code and explanations. They were extremely helpful.

I am a beginner with Keras and sentiment analysis, and I am currently training a CNN model to classify some textual data. As the next step, it would be great to get an idea of how each word in a sample contributes to the final classification. This is very similar to the "LRP application on the IMDB dataset" in your paper and the demo on http://www.heatmapping.org/

At this point, I have the "explain" method working, but I am stuck on the interpretation and visualization. Ideally, I would map the attribution output back to the original text and create a heatmap over the words. My attribution output has size (6, 150, 50), where 6 is the number of samples, 150 is the sequence length, and 50 is the embedding dimension.

Any suggestion would be appreciated!

Here is an outline of my code:

from keras.models import Model, load_model
from keras.layers import Input, Embedding, Conv1D, Activation, Dropout, MaxPooling1D, Flatten, Dense
from keras.regularizers import l2
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import backend as K
from deepexplain.tensorflow import DeepExplain

NUM_CATEGORY = 4
MAX_SEQUENCE_LENGTH = 150
MAX_NB_WORDS = 20000    # number of words in the vocabulary
EMBEDDING_DIM = 50
VALIDATION_SPLIT = 0.333

# feed in preprocessed data
embedding_layer = Embedding(len(token_index) + 1,
                                EMBEDDING_DIM,    # 50
                                weights=[embedding_matrix],  # pretrained embedding matrix
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True, name='embedding')

sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH, ), dtype='int32', name='input_x')
embedded_sequences = embedding_layer(sequence_input)

l_conv = Conv1D(filters=100, kernel_size=3, kernel_regularizer=l2(0.001))(embedded_sequences)
l_actv = Activation('relu')(l_conv)
l_dropout = Dropout(0.5)(l_actv) 
l_pool = MaxPooling1D(5)(l_dropout)
l_flat = Flatten()(l_pool)
l_dense = Dense(50, kernel_regularizer=l2(0.05))(l_flat)
l_actv1 = Activation('relu')(l_dense)
l_dropout2 = Dropout(0.2)(l_actv1)
l_dense2 = Dense(4, kernel_regularizer=l2(0.005), name='dense2')(l_dropout2)
pred = Activation('softmax')(l_dense2)
model = Model(inputs=sequence_input, outputs=pred)

model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])    
callbacks = [
            EarlyStopping(monitor='val_loss', patience=5, verbose=0),
            ModelCheckpoint('model-{epoch:03d}.h5', verbose=1, monitor='val_loss',save_best_only=True, mode='auto') 
            ]
history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                  batch_size=32, epochs = 100, callbacks=callbacks)
score, acc = model.evaluate(x_test, y_test, batch_size=32)

with DeepExplain(session=K.get_session()) as de:

    model = load_model('model-036.h5')

    # references to the relevant tensors
    input_tensor = model.get_layer("input_x").input
    embedding = model.get_layer("embedding").output
    pre_softmax = model.get_layer("dense2").output

    # samples to explain
    x_interpret = x_test[9:15]

    # perform the embedding lookup for these samples
    get_embedding_output = K.function([input_tensor], [embedding])
    embedding_out = get_embedding_output([x_interpret])[0]

    # target the output of the last dense layer (pre-softmax)
    fModel = Model(inputs=input_tensor, outputs=pre_softmax)
    target_tensor = fModel(input_tensor)

    # to target a specific neuron (class), we apply a binary mask
    ys = [1, 0, 0, 0]

    # np.array of size (6, 150, 50)
    attributions = de.explain('elrp', pre_softmax * ys, embedding, embedding_out)
marcoancona commented 6 years ago

Hi there! What is normally done for NLP is to sum the attributions over the embedding dimension. In your case, you would do np.sum(attributions, -1) and end up with an array of size (6, 150). You then have a score for each word, which you can visualize as you prefer.
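
For example, a rough sketch of that post-processing (assuming attributions and x_interpret come from your code above, and a hypothetical index_to_word dict, the inverse of your token_index with 0 reserved for padding, maps indices back to words):

import numpy as np
import matplotlib.pyplot as plt

# collapse the embedding dimension: one relevance score per token position
word_scores = np.sum(attributions, axis=-1)    # shape (6, 150)

# map scores back to words, skipping the padding index 0
for seq, scores in zip(x_interpret, word_scores):
    pairs = [(index_to_word.get(int(idx), '<UNK>'), score)
             for idx, score in zip(seq, scores) if idx != 0]
    print(' '.join('{}({:+.3f})'.format(w, s) for w, s in pairs))

# or a quick heatmap over token positions, one row per sample
plt.imshow(word_scores, cmap='RdBu_r', aspect='auto')
plt.colorbar(label='attribution')
plt.xlabel('token position')
plt.ylabel('sample')
plt.show()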

looraL commented 6 years ago

Hi Marco, thanks for helping out!

I have applied the sum over the embedding dimension to the output attribution matrix. The problem is that the number of nonzero values for each sample does not align with the number of words it contains. Does that make sense?

For example, one pre-embedding word list is cleaned into a 27-word list, with padding to the right. But its attribution vector has 98 nonzero values, which does not correspond to the word list. I was expecting 27 nonzero values in the summed attributions, with zeros for the padding on the right. Is there anything wrong with my understanding/implementation?

Many thanks!

Sample output for the attributes weight (150, 1): [0.0047955 -0.0515381 -0.0528361 -0.195269 -0.0927481 -0.0920452 -0.0133876 0.00504693 -0.0455068 0.0607065 0.0758778 -0.0437094 0.105729 0.160636 -0.0028076 -0.117737 0.00311046 0.194461 0.135671 0.110874 0.0623835 0.117782 0.0749264 0.0127667 0.1243 0.0727738 -0.0178832 -0.000369398 0.000776381 0 0.000122282 0.00010872 -7.24178e-06 0 0 0.000255847 0.000227472 -1.51518e-05 0 0 0.000161644 0.000143716 -9.57289e-06 0 0 0.000129164 0.000114838 -7.64935e-06 0 0 9.28584e-05 8.25596e-05 -5.49928e-06 0 0 3.7087e-05 3.29738e-05 -2.19637e-06 0 0 0.000121541 0.000108061 -7.19789e-06 0 0 -4.44104e-05 -3.94849e-05 2.63008e-06 0 0 -1.30395e-05 -1.15933e-05 7.72226e-07 0 0 2.89512e-05 2.57402e-05 -1.71455e-06 0 0 -4.49442e-05 -3.99595e-05 2.66169e-06 0 0 -1.60106e-05 -1.42349e-05 9.4818e-07 0 0 -1.14543e-06 -1.01839e-06 6.78348e-08 0 0 -8.79873e-07 -7.82288e-07 5.21079e-08 0 0 -2.24294e-05 -1.99418e-05 1.32831e-06 0 0 -4.43908e-05 -3.94675e-05 2.62892e-06 0 0 -3.49119e-05 -3.10399e-05 2.06756e-06 0 0 6.78908e-06 6.03612e-06 -4.02063e-07 0 0 6.4125e-06 5.7013e-06 -3.79762e-07 0 0 6.68977e-06 5.94782e-06 -3.96183e-07 0 0 5.62173e-06 4.99824e-06 -3.32931e-07 0 0 5.20878e-06 4.63108e-06 -3.08475e-07 0 0 6.9628e-06 6.19057e-06 -4.12352e-07 0 0 0 0 0 0 0]

SamuelXie commented 5 years ago

What is normally done for NLP is to sum the attributions over the embedding dimension.

@marcoancona Could you please provide relevant papers? Thx!