A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
I have trained a regression network with multiple output neurons and want to compute attributions for each output neuron separately. I followed the instructions in "Which neuron to target?" to create a mask ys. However, I noticed that the attribution scores for each individual neuron were much smaller (roughly in the range ±0.1) than the attributions I get when running the explainer on the whole output tensor (roughly in the range ±10). This did not make sense to me, as "Which neuron to target?" states:
Tensors in Tensorflow and Keras usually include the activations of all neurons of a layer. If you pass such a tensor to explain you will get the average attribution map for all neurons the Tensor refers to.
If I get the average of the attributions over all neurons, then the magnitude of the attributions for each neuron should be similar to the magnitude across all outputs. To investigate, I summed the attributions over all output neurons: the sum of the per-neuron attributions is in fact equal to the attributions I get when running explain on all outputs.
Below is the code I used:
import numpy as np
from keras import backend as K
from keras.models import Model
from deepexplain.tensorflow import DeepExplain

with DeepExplain(session=K.get_session()) as de:
    # get input and target tensor
    input_tensor = model.layers[0].input
    fModel = Model(inputs=input_tensor, outputs=model.outputs)
    target_tensor = fModel(input_tensor)

    # create explainer object
    explainer = de.get_explainer('grad*input', T=target_tensor, X=input_tensor)

    # [Part 1] calculate attributions based on all neurons in the output layer
    all_attributions = explainer.run(xs=inputs_test)

    # [Part 2] calculate attributions for each output neuron separately
    # ndarray that accumulates the per-neuron attributions computed in the for-loop
    summed_attributions = np.zeros((num_samples, num_inputs))

    # iterate over output neurons
    for index in range(num_outputs):
        # mask: the current output has weight 1, all others have weight 0
        current_ys = np.zeros((num_samples, num_outputs))
        current_ys[:, index] = 1
        # get attributions for the current output neuron
        # inputs_test is an ndarray of shape (num_samples, num_inputs)
        current_attributions = explainer.run(xs=inputs_test, ys=current_ys)
        # accumulate attributions over output neurons
        summed_attributions += current_attributions

print("Attributions for all outputs:\n", all_attributions)
print("Summed attributions per output neuron:\n", summed_attributions)
As you can see, both outputs are identical. To me this indicates that either a) the attributions for all outputs are in fact summed rather than averaged, or b) the per-neuron attributions are averaged for some reason.
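For what it's worth, interpretation a) would be consistent with how gradients behave: differentiation is linear, so the gradient of the sum of all outputs equals the sum of the per-output gradients. A minimal NumPy sketch (independent of DeepExplain; the linear model `W`, `x` and all values here are made-up toy data) illustrating this for grad*input:

```python
# Sketch: for a linear "network" y = W @ x, the grad*input attribution of
# output neuron j w.r.t. x is W[j] * x, and the attribution of the whole
# output tensor (i.e. of sum_j y_j) is (sum_j W[j]) * x. By linearity of
# the gradient, the latter is the SUM of the per-neuron attributions,
# not their average.
import numpy as np

rng = np.random.default_rng(0)
num_inputs, num_outputs = 4, 3
W = rng.standard_normal((num_outputs, num_inputs))  # toy weights
x = rng.standard_normal(num_inputs)                 # toy input sample

# Per-neuron attributions: grad(y_j) * x = W[j] * x  -> shape (num_outputs, num_inputs)
per_neuron = W * x

# Whole-tensor attribution: grad(sum_j y_j) * x = (sum_j W[j]) * x  -> shape (num_inputs,)
whole_tensor = W.sum(axis=0) * x

# The whole-tensor attribution matches the sum, not the mean, of per-neuron ones
assert np.allclose(per_neuron.sum(axis=0), whole_tensor)
assert not np.allclose(per_neuron.mean(axis=0), whole_tensor)
```

If this carries over to the general (nonlinear) case, it would also explain the scale difference I observed: with 111 output neurons, per-neuron attributions around ±0.1 summing to roughly ±10 is exactly what one would expect.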
So my questions are:
Is my code correct or does it cause the issue?
Which attributions are on the correct scale now? The ones combining all neurons, or the ones per neuron?
Thanks a lot for any help!
(Some additional info: my network has 327 input neurons, 111 output neurons and the number of samples I use in xs is 70)