A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
Attributions for multiple neurons in target tensor are not equal to average attributions per neuron #58

LeaE commented 4 years ago

I have trained a regression network with multiple output neurons and want to compute attributions for each output neuron separately. I followed the instructions in "Which neuron to target?" to create a mask ys. However, the attribution scores for each individual neuron were much smaller (roughly within ±0.1) than the attributions I get when running the explainer on the whole output tensor (roughly within ±10). This did not make sense to me, because "Which neuron to target?" states:

Tensors in Tensorflow and Keras usually include the activations of all neurons of a layer. If you pass such a tensor to explain you will get the average attribution map for all neurons the Tensor refers to.

If I get the average of attributions over all neurons, then the magnitude of the attributions for each neuron should be similar to the magnitude for the whole output tensor. To analyze the problem, I summed up the attributions of the individual output neurons. It turns out that this sum is equal to the attributions I get when running the explainer on all outputs. Below is the code I used:

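# imports (assuming standalone Keras on a TensorFlow 1.x backend)
import numpy as np
from keras import backend as K
from keras.models import Model
from deepexplain.tensorflow import DeepExplain

with DeepExplain(session=K.get_session()) as de: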
    # get input and target tensor
    input_tensor = model.layers[0].input
    fModel = Model(inputs=input_tensor, outputs=model.outputs)
    target_tensor = fModel(input_tensor)

    # create explainer object
    explainer = de.get_explainer('grad*input', T=target_tensor, X=input_tensor)

    # [Part 1] calculate attributions based on all neurons in output layer
    all_attributions = explainer.run(inputs_test)

    # [Part 2] calculate attributions for each output neuron separately

    # ndarray that sums up attributions for each output neuron computed in the for-loop
    summed_attributions = np.zeros((num_samples, num_inputs))

    # iterate over output nodes
    for index in range(0, num_outputs):
        # get current mask (one output has weight 1, the others have weight 0)
        current_ys = np.zeros((num_samples, num_outputs))
        current_ys[:, index] = 1

        # get attributions for current output neuron
        # inputs_test is an ndarray of shape (num_samples, num_inputs)
        current_attributions = explainer.run(inputs_test, ys=current_ys)
        # sum up attributions per output neuron
        summed_attributions = np.add(summed_attributions, current_attributions)

    print("Attributions for all outputs:\n", all_attributions)
    print("Summed Attributions per output neuron:\n", summed_attributions)

The printed output I get is:

Attributions for all outputs:
 [[ 1.7566239   5.5694366  -1.3839844  ... -1.6561449  -4.519706
  -1.2594965 ]
 [-0.36041573  0.92507845 -2.6454725  ... -2.864266   -1.2477317
  -3.98654   ]
 [ 1.0229955   2.8411934  -2.0691242  ... -4.0847306  -2.4401922
  -5.1763616 ]
 [ 0.87816465  3.2128146  -2.68291    ... -4.67921    -0.06084266
  -4.535814  ]
 [ 0.78167516  1.8168001  -5.0825305  ... -7.9030166   1.2522012
  -8.19217   ]
 [ 0.3669984   1.9208958  -1.9309239  ... -0.7711489  -2.6113365
  -6.4912357 ]]
Summed Attributions per output neuron:
 [[ 1.75662394  5.56943862 -1.38398494 ... -1.65614253 -4.5197059
 [-0.36041581  0.92507885 -2.64547465 ... -2.86426684 -1.2477314
 [ 1.02299545  2.84119363 -2.06912101 ... -4.08473065 -2.44019283
 [ 0.87816471  3.21281424 -2.68290941 ... -4.67921036 -0.06084232
 [ 0.78167511  1.81679968 -5.082533   ... -7.90301781  1.25220258
 [ 0.36699789  1.92089703 -1.9309225  ... -0.77114941 -2.61133606
  -6.4912362 ]]

As you can see, both outputs are identical (up to floating-point precision). To me, this indicates that either a) the attributions for all outputs are in fact summed rather than averaged, or b) the attributions per neuron are averaged for some reason.

So my questions are:

Thanks a lot for any help!

(Some additional info: my network has 327 input neurons, 111 output neurons and the number of samples I use in xs is 70)

marcoancona commented 4 years ago

Thanks for pointing this out. Indeed, it produces the sum, not the average.
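
The relationship between the two results can be illustrated without DeepExplain. The following is a minimal NumPy sketch, not the library's implementation: it assumes a hypothetical linear model (outputs = xs @ W), for which gradient*input has a closed form, and it assumes that an unmasked target behaves like a mask of ones over all outputs. The names W, grad_times_input and the small shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples, num_inputs, num_outputs = 4, 5, 3

# Hypothetical linear "network": outputs = xs @ W (no bias, no nonlinearity)
W = rng.normal(size=(num_inputs, num_outputs))
xs = rng.normal(size=(num_samples, num_inputs))

def grad_times_input(xs, W, ys):
    """Gradient*input of sum_j ys[:, j] * output_j with respect to the inputs.

    For a linear model, d(sum_j ys_j * output_j) / dx_i = sum_j ys_j * W[i, j].
    """
    grads = ys @ W.T          # shape (num_samples, num_inputs)
    return grads * xs

# Assumption: no mask is equivalent to weight 1 on every output neuron
all_attr = grad_times_input(xs, W, np.ones((num_samples, num_outputs)))

# One-hot masks, one output neuron at a time
per_neuron = []
for j in range(num_outputs):
    mask = np.zeros((num_samples, num_outputs))
    mask[:, j] = 1
    per_neuron.append(grad_times_input(xs, W, mask))
per_neuron = np.stack(per_neuron)   # (num_outputs, num_samples, num_inputs)

summed = per_neuron.sum(axis=0)
averaged = per_neuron.mean(axis=0)

print("matches the sum:    ", np.allclose(all_attr, summed))
print("matches the average:", np.allclose(all_attr, averaged))
```

Under these assumptions the attribution for the whole tensor matches the sum of the per-neuron attributions, so dividing it by the number of output neurons (111 in the network described above) would give the average attribution per neuron.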