marcoancona / DeepExplain

A unified framework of perturbation and gradient-based attribution methods for Deep Neural Networks interpretability. DeepExplain also includes support for Shapley Values sampling. (ICLR 2018)
https://arxiv.org/abs/1711.06104

Attributions for multiple neurons in target tensor are not equal to average attributions per neuron #58

Closed: LeaE closed this issue 4 years ago

LeaE commented 4 years ago

I have trained a regression network with multiple output neurons and want to compute attributions for each output neuron separately. I followed the instructions in Which neuron to target? to create a mask ys. However, I noticed that the attribution scores for a single neuron were much smaller (roughly in the range ±0.1) than the attributions I get when running the explainer on the whole output tensor (roughly in the range ±10). This did not make sense to me, as Which neuron to target? states:

Tensors in Tensorflow and Keras usually include the activations of all neurons of a layer. If you pass such a tensor to explain you will get the average attribution map for all neurons the Tensor refers to.

If explain returns the average of the attributions over all neurons, then the magnitude of the attributions for a single neuron should be similar to the magnitude across all outputs. To analyze the problem, I summed the per-neuron attributions over all output neurons. It turns out that this sum is equal to the attributions I get when running explain on the full output tensor. Below is the code I used:

# imports needed to make this snippet self-contained
import numpy as np
from keras import backend as K
from keras.models import Model
from deepexplain.tensorflow import DeepExplain

with DeepExplain(session=K.get_session()) as de:
    # get input and target tensors
    input_tensor = model.layers[0].input
    fModel = Model(inputs=input_tensor, outputs=model.outputs)
    target_tensor = fModel(input_tensor)

    # create explainer object (reusable across runs with different masks)
    explainer = de.get_explainer('grad*input', T=target_tensor, X=input_tensor)

    # [Part 1] calculate attributions based on all neurons in the output layer
    all_attributions = explainer.run(xs=inputs_test)

    # [Part 2] calculate attributions for each output neuron separately

    # accumulator for the per-neuron attributions computed in the for-loop
    summed_attributions = np.zeros((num_samples, num_inputs))

    # iterate over output neurons
    for index in range(num_outputs):
        # build the current mask (one output has weight 1, the others weight 0)
        current_ys = np.zeros((num_samples, num_outputs))
        current_ys[:, index] = 1

        # get attributions for the current output neuron;
        # inputs_test is an ndarray of shape (num_samples, num_inputs)
        current_attributions = explainer.run(xs=inputs_test, ys=current_ys)
        # accumulate the per-neuron attributions
        summed_attributions += current_attributions

    print("Attributions for all outputs:\n", all_attributions)
    print("Summed Attributions per output neuron:\n", summed_attributions)

The printed output I get is:

Attributions for all outputs:
 [[ 1.7566239   5.5694366  -1.3839844  ... -1.6561449  -4.519706
  -1.2594965 ]
 [-0.36041573  0.92507845 -2.6454725  ... -2.864266   -1.2477317
  -3.98654   ]
 [ 1.0229955   2.8411934  -2.0691242  ... -4.0847306  -2.4401922
  -5.1763616 ]
 ...
 [ 0.87816465  3.2128146  -2.68291    ... -4.67921    -0.06084266
  -4.535814  ]
 [ 0.78167516  1.8168001  -5.0825305  ... -7.9030166   1.2522012
  -8.19217   ]
 [ 0.3669984   1.9208958  -1.9309239  ... -0.7711489  -2.6113365
  -6.4912357 ]]
Summed Attributions per output neuron:
 [[ 1.75662394  5.56943862 -1.38398494 ... -1.65614253 -4.5197059
  -1.25949728]
 [-0.36041581  0.92507885 -2.64547465 ... -2.86426684 -1.2477314
  -3.98653975]
 [ 1.02299545  2.84119363 -2.06912101 ... -4.08473065 -2.44019283
  -5.17636197]
 ...
 [ 0.87816471  3.21281424 -2.68290941 ... -4.67921036 -0.06084232
  -4.53581512]
 [ 0.78167511  1.81679968 -5.082533   ... -7.90301781  1.25220258
  -8.19217148]
 [ 0.36699789  1.92089703 -1.9309225  ... -0.77114941 -2.61133606
  -6.4912362 ]]

As you can see, the two results are identical (up to floating-point error). To me this indicates that either a) the attributions for the whole output tensor are in fact a sum rather than an average, or b) the per-neuron attributions are being averaged for some reason.
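To make this precise, here is a quick numerical check of both hypotheses, reusing all_attributions, summed_attributions and num_outputs from the snippet above; the expected results follow from the printed output:

import numpy as np

# hypothesis a): the whole-tensor result is the sum over output neurons
print(np.allclose(all_attributions, summed_attributions, atol=1e-4))
# -> True, given the printed output above

# hypothesis b) would mean the whole-tensor result is the average,
# i.e. the sum divided by the number of output neurons
print(np.allclose(all_attributions, summed_attributions / num_outputs, atol=1e-4))
# -> False, given the magnitudes above (~±10 vs. ~±0.1)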

So my questions are:

- Does running explain on the full output tensor return the sum or the average of the per-neuron attributions?
- If it is the sum, should the documentation in Which neuron to target? be updated accordingly?

Thanks a lot for any help!

(Some additional info: my network has 327 input neurons and 111 output neurons, and the number of samples I use in xs is 70.)

marcoancona commented 4 years ago

Thanks for pointing this out. Indeed, it produces the sum, not the average.
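If an average over the target neurons is wanted (as the documentation currently describes), it can still be recovered. A minimal sketch, assuming the variables from the snippet above and that the ys mask weights the target neurons linearly:

# option 1: divide the whole-tensor result by the number of target neurons
average_attributions = all_attributions / num_outputs

# option 2: since the ys mask enters the target linearly, a uniform mask of
# 1/num_outputs yields the average directly in a single run
uniform_ys = np.ones((num_samples, num_outputs)) / num_outputs
average_attributions_2 = explainer.run(xs=inputs_test, ys=uniform_ys)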